X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/0533ae6f60432c9f45178df2a58a9b469f4e3464..32458de9a4322bd2e66c525d33720a42df7e0b56:/pod/perlfunc.pod

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 5c778f1..1e32cca 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -1541,10 +1541,9 @@ makes it spring into existence the first time that it is called; see
 L.
 
 Use of L|/defined EXPR> on aggregates (hashes and arrays) is
-deprecated. It
-used to report whether memory for that aggregate had ever been
-allocated. This behavior may disappear in future versions of Perl.
-You should instead use a simple test for size:
+no longer supported. It used to report whether memory for that
+aggregate had ever been allocated. You should instead use a simple
+test for size:
 
     if (@an_array) { print "has array elements\n" }
     if (%a_hash) { print "has hash members\n" }
@@ -1703,8 +1702,8 @@ produce, respectively
     /etc/games is no good, stopped at canasta line 123.
 
 If the output is empty and L|perlvar/$@> already contains a value
-(typically from a previous eval) that value is reused after appending
-C<"\t...propagated">. This is useful for propagating exceptions:
+(typically from a previous L|/eval EXPR>) that value is reused after
+appending C<"\t...propagated">. This is useful for propagating exceptions:
 
     eval { ... };
     die unless $@ =~ /Expected exception/;
@@ -4006,8 +4005,8 @@ Note that L|perlvar/$_> is an alias to the list value, so it can
 be used to modify the elements of the LIST. While this is useful and
 supported, it can cause bizarre results if the elements of LIST are not
 variables. Using a regular C loop for this purpose would be
-clearer in most cases. See also L|/grep BLOCK LIST> for an
-array composed of those items of the original list for which the BLOCK
+clearer in most cases. See also L|/grep BLOCK LIST> for a
+list composed of those items of the original list for which the BLOCK
 or EXPR evaluates to true.
 
 C<{> starts both hash references and blocks, so C could be either
@@ -4350,7 +4349,7 @@ opens the UTF8-encoded file containing Unicode characters;
 see L. Note that if layers are specified in the
 three-argument form, then default layers stored in ${^OPEN} (see L;
 usually set by the L pragma or the switch C<-CioD>) are ignored.
-Those layers will also be ignored if you specifying a colon with no name
+Those layers will also be ignored if you specify a colon with no name
 following it. In that case the default layer for the operating system
 (:raw on Unix, :crlf on Windows) is used.
 
@@ -4406,9 +4405,9 @@ argument being L|/undef EXPR>:
 
     open(my $tmp, "+>", undef) or die ...
 
-opens a filehandle to an anonymous temporary file. Also using C<< +< >>
-works for symmetry, but you really should consider writing something
-to the temporary file first. You will need to
+opens a filehandle to a newly created empty anonymous temporary file.
+(This happens under any mode, which makes C<< +> >> the only useful and
+sensible mode to use.) You will need to
 L|/seek FILEHANDLE,POSITION,WHENCE> to do the reading.
 
 Perl is built using PerlIO by default. Unless you've
@@ -5604,7 +5603,9 @@ X X
 
 Returns the offset of where the last C search left off for the
 variable in question (L|perlvar/$_> is used when the variable is not
-specified). Note that 0 is a valid match offset.
+specified). This offset is in characters unless the
+(no-longer-recommended) L|bytes> pragma is in effect, in
+which case the offset is in bytes. Note that 0 is a valid match offset.
 L|/undef EXPR> indicates
 that the search position is reset (usually due to match failure, but
 can also be because no match has yet been run on the scalar).
@@ -6648,12 +6649,13 @@ C, and C (start of the file, current position, end
 of the file) from the L module. Returns C<1> on success, false
 otherwise.
 
-Note the I: even if the filehandle has been set to
-operate on characters (for example by using the C<:encoding(utf8)> open
-layer), L|/tell FILEHANDLE> will return byte offsets, not
-character offsets (because implementing that would render
-L|/seek FILEHANDLE,POSITION,WHENCE> and
-L|/tell FILEHANDLE> rather slow).
+Note the emphasis on bytes: even if the filehandle has been set to operate
+on characters (for example using the C<:encoding(utf8)> I/O layer), the
+L|/seek FILEHANDLE,POSITION,WHENCE>,
+L|/tell FILEHANDLE>, and
+L|/sysseek FILEHANDLE,POSITION,WHENCE>
+family of functions use byte offsets, not character offsets,
+because seeking to a character offset would be very slow in a UTF-8 file.
 
 If you want to position the file for
 L|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> or
@@ -7403,6 +7405,8 @@ X
 
 Splits the string EXPR into a list of strings and returns the
 list in list context, or the size of the list in scalar context.
+(Prior to Perl 5.11, it also overwrote C<@_> with the list in
+void and scalar context. If you target old perls, beware.)
 
 If only PATTERN is given, EXPR defaults to L|perlvar/$_>.
 
@@ -8171,68 +8175,16 @@ X
 
 =item study
 
-=for Pod::Functions optimize input data for repeated searches
-
-B
-
-May take extra time to study SCALAR (L|perlvar/$_> if unspecified)
-in anticipation
-of doing many pattern matches on the string before it is next modified.
-This may or may not save time, depending on the nature and number of
-patterns you are searching and the distribution of character
-frequencies in the string to be searched; you probably want to compare
-run times with and without it to see which is faster. Those loops
-that scan for many short constant strings (including the constant
-parts of more complex patterns) will benefit most.
+=for Pod::Functions no-op, formerly optimized input data for repeated searches
 
-(The way L|/study SCALAR> used to work is this: a linked list
-of every
-character in the string to be searched is made, so we know, for
-example, where all the C<'k'> characters are. From each search string,
-the rarest character is selected, based on some static frequency tables
-constructed from some C programs and English text. Only those places
-that contain this "rarest" character are examined.)
+At this time, C does nothing. This may change in the future.
 
-For example, here is a loop that inserts index producing entries
-before any line containing a certain pattern:
-
-    while (<>) {
-        study;
-        print ".IX foo\n" if /\bfoo\b/;
-        print ".IX bar\n" if /\bbar\b/;
-        print ".IX blurfl\n" if /\bblurfl\b/;
-        # ...
-        print;
-    }
+Prior to Perl version 5.16, it would create an inverted index of all characters
+that occurred in the given SCALAR (or L|perlvar/$_> if unspecified). When
+matching a pattern, the rarest character from the pattern would be looked up in
+this index. Rarity was based on some static frequency tables constructed from
+some C programs and English text.
 
-In searching for C, only locations in L|perlvar/$_>
-that contain C
-will be looked at, because C is rarer than C. In general, this is
-a big win except in pathological cases. The only question is whether
-it saves you more time than it took to build the linked list in the
-first place.
-
-Note that if you have to look for strings that you don't know till
-runtime, you can build an entire loop as a string and L|/eval
-EXPR> that to avoid recompiling all your patterns all the time.
-Together with undefining L>|perlvar/$E> to input entire
-files as one record, this can be quite
-fast, often faster than specialized programs like L. The following
-scans a list of files (C<@files>) for a list of words (C<@words>), and prints
-out the names of those files that contain a match:
-
-    my $search = 'local $/; while (<>) { study;';
-    foreach my $word (@words) {
-        $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";
-    }
-    $search .= "}";
-    @ARGV = @files;
-    my %seen;
-    eval $search; # this screams
-    foreach my $file (sort keys(%seen)) {
-        print $file, "\n";
-    }
 
 =item sub NAME BLOCK
 X
@@ -8531,17 +8483,19 @@ X X
 
 =for Pod::Functions +5.004 position I/O pointer on handle used with sysread and syswrite
 
-Sets FILEHANDLE's system position in bytes using L. FILEHANDLE may
+Sets FILEHANDLE's system position I using L. FILEHANDLE may
 be an expression whose value gives the name of the filehandle. The values
 for WHENCE are C<0> to set the new position to POSITION; C<1> to set the it
 to the current position plus POSITION; and C<2> to set it to EOF plus
 POSITION, typically negative.
 
-Note the I: even if the filehandle has been set to operate
-on characters (for example by using the C<:encoding(utf8)> I/O layer),
-L|/tell FILEHANDLE> will return byte offsets, not character
-offsets (because implementing that would render
-L|/sysseek FILEHANDLE,POSITION,WHENCE> unacceptably slow).
+Note the emphasis on bytes: even if the filehandle has been set to operate
+on characters (for example using the C<:encoding(utf8)> I/O layer), the
+L|/seek FILEHANDLE,POSITION,WHENCE>,
+L|/tell FILEHANDLE>, and
+L|/sysseek FILEHANDLE,POSITION,WHENCE>
+family of functions use byte offsets, not character offsets,
+because seeking to a character offset would be very slow in a UTF-8 file.
 
 L|/sysseek FILEHANDLE,POSITION,WHENCE> bypasses normal
 buffered IO, so mixing it with reads other than
@@ -8702,19 +8656,21 @@ error. FILEHANDLE may be an expression whose value gives the name of
 the actual filehandle. If FILEHANDLE is omitted, assumes the file
 last read.
 
-Note the I: even if the filehandle has been set to
-operate on characters (for example by using the C<:encoding(utf8)> open
-layer), L|/tell FILEHANDLE> will return byte offsets, not
-character offsets (because that would render
-L|/seek FILEHANDLE,POSITION,WHENCE> and
-L|/tell FILEHANDLE> rather slow).
+Note the emphasis on bytes: even if the filehandle has been set to operate
+on characters (for example using the C<:encoding(utf8)> I/O layer), the
+L|/seek FILEHANDLE,POSITION,WHENCE>,
+L|/tell FILEHANDLE>, and
+L|/sysseek FILEHANDLE,POSITION,WHENCE>
+family of functions use byte offsets, not character offsets,
+because seeking to a character offset would be very slow in a UTF-8 file.
 
 The return value of L|/tell FILEHANDLE> for the standard streams
 like the STDIN depends on the operating system: it may return -1 or
 something else. L|/tell FILEHANDLE> on pipes, fifos, and
 sockets usually returns -1.
 
-There is no C function. Use C for that.
+There is no C function. Use
+L|/sysseek FILEHANDLE,POSITION,WHENCE> for that.
 
 Do not use L|/tell FILEHANDLE> (or other buffered I/O
 operations) on a filehandle that has been manipulated by