X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/c67bbae06fa560d08982cd38476ff29fb39ec78d..4350c9a7a6da0a61235d99723a34e65aefb57ffd:/pod/perlfunc.pod diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod index d62f61a..f10927e 100644 --- a/pod/perlfunc.pod +++ b/pod/perlfunc.pod @@ -457,15 +457,32 @@ binary and text files. If FILEHANDLE is an expression, the value is taken as the name of the filehandle. Returns true on success, otherwise it returns C and sets C<$!> (errno). +On some systems (in general, DOS and Windows-based systems) binmode() +is necessary when you're not working with a text file. For the sake +of portability it is a good idea to always use it when appropriate, +and to never use it when it isn't appropriate. Also, people can +set their I/O to be by default UTF-8 encoded Unicode, not bytes. + +In other words: regardless of platform, use binmode() on binary data, +like for example images. + +If LAYER is present it is a single string, but may contain multiple +directives. The directives alter the behaviour of the file handle. +When LAYER is present using binmode on text file makes sense. + If LAYER is omitted or specified as C<:raw> the filehandle is made suitable for passing binary data. This includes turning off possible CRLF translation and marking it as bytes (as opposed to Unicode characters). -Note that as desipite what may be implied in I<"Programming Perl"> +Note that as despite what may be implied in I<"Programming Perl"> (the Camel) or elsewhere C<:raw> is I the simply inverse of C<:crlf> -- other layers which would affect binary nature of the stream are I disabled. See L, L and the discussion about the PERLIO environment variable. +The C<:bytes>, C<:crlf>, and C<:utf8>, and any other directives of the +form C<:...>, are called I/O I. The C pragma can be used to +establish default I/O layers. See L. + I -On some systems (in general, DOS and Windows-based systems) binmode() -is necessary when you're not working with a text file. 
For the sake -of portability it is a good idea to always use it when appropriate, -and to never use it when it isn't appropriate. - -In other words: regardless of platform, use binmode() on binary files -(like for example images). - -If LAYER is present it is a single string, but may contain -multiple directives. The directives alter the behaviour of the -file handle. When LAYER is present using binmode on text -file makes sense. - To mark FILEHANDLE as UTF-8, use C<:utf8>. -The C<:bytes>, C<:crlf>, and C<:utf8>, and any other directives of the -form C<:...>, are called I/O I. The C pragma can be used to -establish default I/O layers. See L. - In general, binmode() should be called after open() but before any I/O is done on the filehandle. Calling binmode() will normally flush any pending buffered output data (and perhaps pending input data) on the handle. An exception to this is the C<:encoding> layer that changes the default character encoding of the handle, see L. The C<:encoding> layer sometimes needs to be called in -mid-stream, and it doesn't flush the stream. +mid-stream, and it doesn't flush the stream. The C<:encoding> +also implicitly pushes on top of itself the C<:utf8> layer because +internally Perl will operate on UTF-8 encoded Unicode characters. The operating system, device drivers, C libraries, and Perl run-time system all work together to let the programmer treat a single @@ -730,10 +732,14 @@ chr(0x263a) is a Unicode smiley face. Note that characters from 127 to 255 (inclusive) are by default not encoded in Unicode for backward compatibility reasons (but see L). +If NUMBER is omitted, uses C<$_>. + For the reverse, use L. -See L and L for more about Unicode. -If NUMBER is omitted, uses C<$_>. +Note that under the C pragma the NUMBER is masked to +the low eight bits. + +See L and L for more about Unicode. =item chroot FILENAME @@ -870,7 +876,10 @@ different strings. 
When choosing a new salt create a random two character string whose characters come from the set C<[./0-9A-Za-z]> (like C). +'/', 0..9, 'A'..'Z', 'a'..'z')[rand 64, rand 64]>). This set of +characters is just a recommendation; the characters allowed in +the salt depend solely on your system's crypt library, and Perl can't +restrict what salts C accepts. Here's an example that makes sure that whoever runs this program knows their own password: @@ -1116,7 +1125,7 @@ This is useful for propagating exceptions: If LIST is empty and C<$@> contains an object reference that has a C method, that method will be called with additional file and line number parameters. The return value replaces the value in -C<$@>. ie. as if C<<$@ = eval { $@->PROPAGATE(__FILE__, __LINE__) };>> +C<$@>. ie. as if C<< $@ = eval { $@->PROPAGATE(__FILE__, __LINE__) }; >> were called. If C<$@> is empty then the string C<"Died"> is used. @@ -1258,9 +1267,11 @@ it. When called in scalar context, returns only the key for the next element in the hash. Entries are returned in an apparently random order. The actual random -order is subject to change in future versions of perl, but it is guaranteed -to be in the same order as either the C or C function -would produce on the same (unmodified) hash. +order is subject to change in future versions of perl, but it is +guaranteed to be in the same order as either the C or C +function would produce on the same (unmodified) hash. Since Perl +5.8.1 the ordering is different even between different runs of Perl +for security reasons (see L). When the hash is entirely read, a null array is returned in list context (which when assigned produces a false (C<0>) value), and C in @@ -1326,12 +1337,11 @@ last file. 
Examples: # insert dashes just before last line of last file while (<>) { - if (eof()) { # check for end of current file + if (eof()) { # check for end of last file print "--------------\n"; - close(ARGV); # close or last; is needed if we - # are reading from the terminal } print; + last if eof(); # needed if we're reading from a terminal } Practical hint: you almost never need to use C in Perl, because the @@ -2303,13 +2313,19 @@ first argument. Compare L. =item keys HASH -Returns a list consisting of all the keys of the named hash. (In -scalar context, returns the number of keys.) The keys are returned in -an apparently random order. The actual random order is subject to -change in future versions of perl, but it is guaranteed to be the same -order as either the C or C function produces (given -that the hash has not been modified). As a side effect, it resets -HASH's iterator. +Returns a list consisting of all the keys of the named hash. +(In scalar context, returns the number of keys.) + +The keys are returned in an apparently random order. The actual +random order is subject to change in future versions of perl, but it +is guaranteed to be the same order as either the C or C +function produces (given that the hash has not been modified). Since +Perl 5.8.1 the ordering is different even between different runs of +Perl for security reasons (see L). + +As a side effect, calling keys() resets the HASH's internal iterator, +see L. Here is yet another way to print your environment: @@ -2362,7 +2378,7 @@ same as the number actually killed). kill 9, @goners; If SIGNAL is zero, no signal is sent to the process. This is a -useful way to check that the process is alive and hasn't changed +useful way to check that a child process is alive and hasn't changed its UID. See L for notes on the portability of this construct. @@ -2370,7 +2386,9 @@ Unlike in the shell, if SIGNAL is negative, it kills process groups instead of processes. 
(On System V, a negative I number will also kill process groups, but that's not portable.) That means you usually want to use positive not negative signals. You may also -use a signal name in quotes. See L for details. +use a signal name in quotes. + +See L for more details. =item last LABEL @@ -2424,11 +2442,15 @@ If EXPR is omitted, uses C<$_>. =item length -Returns the length in characters of the value of EXPR. If EXPR is +Returns the length in I of the value of EXPR. If EXPR is omitted, returns length of C<$_>. Note that this cannot be used on an entire array or hash to find out how many elements these have. For that, use C and C respectively. +Note the I: if the EXPR is in Unicode, you will get the +number of characters, not the number of bytes. To get the length +in bytes, use C, see L. + =item link OLDFILE,NEWFILE Creates a new filename linked to the old filename. Returns true for @@ -2859,9 +2881,7 @@ argument being C: opens a filehandle to an anonymous temporary file. Also using "+<" works for symmetry, but you really should consider writing something to the temporary file first. You will need to seek() to do the -reading. Starting from Perl 5.8.1 the temporary files are created -using the File::Temp module for greater portability, in Perl 5.8.0 the -mkstemp() system call (which has known bugs in some platforms) was used. +reading. File handles can be opened to "in memory" files held in Perl scalars via: @@ -3793,18 +3813,27 @@ with the wrong number of RANDBITS.) Attempts to read LENGTH I of data into variable SCALAR from the specified FILEHANDLE. Returns the number of characters actually read, C<0> at end of file, or undef if there was an error (in -the latter case C<$!> is also set). SCALAR will be grown or shrunk to -the length actually read. If SCALAR needs growing, the new bytes will -be zero bytes. An OFFSET may be specified to place the read data into -some other place in SCALAR than the beginning. 
The call is actually -implemented in terms of either Perl's or system's fread() call. To -get a true read(2) system call, see C. +the latter case C<$!> is also set). SCALAR will be grown or shrunk +so that the last character actually read is the last character of the +scalar after the read. + +An OFFSET may be specified to place the read data at some place in the +string other than the beginning. A negative OFFSET specifies +placement at that many characters counting backwards from the end of +the string. A positive OFFSET greater than the length of SCALAR +results in the string being padded to the required size with C<"\0"> +bytes before the result of the read is appended. + +The call is actually implemented in terms of either Perl's or system's +fread() call. To get a true read(2) system call, see C. Note the I: depending on the status of the filehandle, either (8-bit) bytes or characters are read. By default all filehandles operate on bytes, but for example if the filehandle has been opened with the C<:utf8> I/O layer (see L, and the C -pragma, L), the I/O will operate on characters, not bytes. +pragma, L), the I/O will operate on UTF-8 encoded Unicode +characters, not bytes. Similarly for the C<:encoding> pragma: +in that case pretty much any characters can be read. =item readdir DIRHANDLE @@ -3891,7 +3920,9 @@ Note the I: depending on the status of the socket, either (8-bit) bytes or characters are received. By default all sockets operate on bytes, but for example if the socket has been changed using binmode() to operate with the C<:utf8> I/O layer (see the C -pragma, L), the I/O will operate on characters, not bytes. +pragma, L), the I/O will operate on UTF-8 encoded Unicode +characters, not bytes. Similarly for the C<:encoding> pragma: +in that case pretty much any characters can be read. =item redo LABEL @@ -4056,6 +4087,15 @@ will complain about not finding "F" there. 
In this case you can do: eval "require $class"; +Now that you understand how C looks for files in the case of +a bareword argument, there is a little extra functionality going on +behind the scenes. Before C looks for a "F<.pm>" extension, +it will first look for a filename with a "F<.pmc>" extension. A file +with this extension is assumed to be Perl bytecode generated by +L. If this file is found, and it's modification +time is newer than a coinciding "F<.pm>" non-compiled file, it will be +loaded in place of that non-compiled file ending in a "F<.pm>" extension. + You can also insert hooks into the import facility, by putting directly Perl code into the @INC array. There are three forms of hooks: subroutine references, array references and blessed objects. @@ -4187,9 +4227,9 @@ last occurrence at or before that position. =item rmdir -Deletes the directory specified by FILENAME if that directory is empty. If it -succeeds it returns true, otherwise it returns false and sets C<$!> (errno). If -FILENAME is omitted, uses C<$_>. +Deletes the directory specified by FILENAME if that directory is +empty. If it succeeds it returns true, otherwise it returns false and +sets C<$!> (errno). If FILENAME is omitted, uses C<$_>. =item s/// @@ -4410,9 +4450,10 @@ L for examples. Note the I: depending on the status of the socket, either (8-bit) bytes or characters are sent. By default all sockets operate on bytes, but for example if the socket has been changed using -binmode() to operate with the C<:utf8> I/O layer (see L, or -the C pragma, L), the I/O will operate on characters, not -bytes. +binmode() to operate with the C<:utf8> I/O layer (see L, or the +C pragma, L), the I/O will operate on UTF-8 encoded +Unicode characters, not bytes. Similarly for the C<:encoding> pragma: +in that case pretty much any characters can be sent. 
=item setpgrp PID,PGRP @@ -4723,6 +4764,15 @@ inconsistent results (sometimes saying C<$x[1]> is less than C<$x[2]> and sometimes saying the opposite, for example) the results are not well-defined. +Because C<< <=> >> returns C when either operand is C +(not-a-number), and because C will trigger a fatal error unless the +result of a comparison is defined, when sorting with a comparison function +like C<< $a <=> $b >>, be careful about lists that might contain a C. +The following example takes advantage of the fact that C to +eliminate any Cs from the input. + + @result = sort { $a <=> $b } grep { $_ == $_ } @input; + =item splice ARRAY,OFFSET,LENGTH,LIST =item splice ARRAY,OFFSET,LENGTH @@ -4821,8 +4871,8 @@ The LIMIT parameter can be used to split a line partially ($login, $passwd, $remainder) = split(/:/, $_, 3); -When assigning to a list, if LIMIT is omitted, Perl supplies a LIMIT -one larger than the number of variables in the list, to avoid +When assigning to a list, if LIMIT is omitted, or zero, Perl supplies +a LIMIT one larger than the number of variables in the list, to avoid unnecessary work. For the list above LIMIT would have been 4 by default. In time critical applications it behooves you not to split into more fields than you really need. @@ -5571,21 +5621,15 @@ See L for a kinder, gentler explanation of opening files. =item sysread FILEHANDLE,SCALAR,LENGTH -Attempts to read LENGTH I of data into variable SCALAR -from the specified FILEHANDLE, using the system call read(2). It -bypasses buffered IO, so mixing this with other kinds of reads, -C, C, C, C, or C can cause confusion -because stdio usually buffers data. Returns the number of characters -actually read, C<0> at end of file, or undef if there was an error (in -the latter case C<$!> is also set). SCALAR will be grown or shrunk so -that the last byte actually read is the last byte of the scalar after -the read. 
- -Note the I<characters>: depending on the status of the filehandle, -either (8-bit) bytes or characters are read. By default all -filehandles operate on bytes, but for example if the filehandle has -been opened with the C<:utf8> I/O layer (see L</open>, and the C<open> -pragma, L<open>), the I/O will operate on characters, not bytes. +Attempts to read LENGTH bytes of data into variable SCALAR from the +specified FILEHANDLE, using the system call read(2). It bypasses +buffered IO, so mixing this with other kinds of reads, C<print>, +C<write>, C<seek>, C<tell>, or C<eof> can cause confusion because the +perlio or stdio layers usually buffers data. Returns the number of +bytes actually read, C<0> at end of file, or undef if there was an +error (in the latter case C<$!> is also set). SCALAR will be grown or +shrunk so that the last byte actually read is the last byte of the +scalar after the read. An OFFSET may be specified to place the read data at some place in the string other than the beginning. A negative OFFSET specifies @@ -5598,9 +5642,15 @@ There is no syseof() function, which is ok, since eof() doesn't work very well on device files (like ttys) anyway. Use sysread() and check for a return value for 0 to decide whether you're done. +Note that if the filehandle has been marked as C<:utf8> Unicode +characters are read instead of bytes (the LENGTH, OFFSET, and the +return value of sysread() are in Unicode characters). +The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer. +See L</binmode>, L</open>, and the C<open> pragma, L<open>. + =item sysseek FILEHANDLE,POSITION,WHENCE -Sets FILEHANDLE's system position I<in bytes> using the system call +Sets FILEHANDLE's system position in bytes using the system call lseek(2). FILEHANDLE may be an expression whose value gives the name of the filehandle. 
The values for WHENCE are C<0> to set the new position to POSITION, C<1> to set the it to the current position plus @@ -5612,7 +5662,7 @@ on characters (for example by using the C<:utf8> I/O layer), tell() will return byte offsets, not character offsets (because implementing that would render sysseek() very slow). -sysseek() bypasses normal buffered io, so mixing this with reads (other +sysseek() bypasses normal buffered IO, so mixing this with reads (other than C, for example >< or read()) C, C, C, C, or C may cause confusion. @@ -5691,27 +5741,27 @@ See L and L for details. =item syswrite FILEHANDLE,SCALAR -Attempts to write LENGTH characters of data from variable SCALAR to -the specified FILEHANDLE, using the system call write(2). If LENGTH -is not specified, writes whole SCALAR. It bypasses buffered IO, so +Attempts to write LENGTH bytes of data from variable SCALAR to the +specified FILEHANDLE, using the system call write(2). If LENGTH is +not specified, writes whole SCALAR. It bypasses buffered IO, so mixing this with reads (other than C, C, C, -C, C, or C may cause confusion because stdio usually -buffers data. Returns the number of characters actually written, or -C if there was an error (in this case the errno variable C<$!> -is also set). If the LENGTH is greater than the available data in the -SCALAR after the OFFSET, only as much data as is available will be -written. +C, C, or C may cause confusion because the perlio and +stdio layers usually buffers data. Returns the number of bytes +actually written, or C if there was an error (in this case the +errno variable C<$!> is also set). If the LENGTH is greater than the +available data in the SCALAR after the OFFSET, only as much data as is +available will be written. An OFFSET may be specified to write the data from some part of the string other than the beginning. A negative OFFSET specifies writing that many characters counting backwards from the end of the string. 
In the case the SCALAR is empty you can use OFFSET but only zero offset. -Note the I: depending on the status of the filehandle, -either (8-bit) bytes or characters are written. By default all -filehandles operate on bytes, but for example if the filehandle has -been opened with the C<:utf8> I/O layer (see L, and the open -pragma, L), the I/O will operate on characters, not bytes. +Note that if the filehandle has been marked as C<:utf8>, Unicode +characters are written instead of bytes (the LENGTH, OFFSET, and the +return value of syswrite() are in UTF-8 encoded Unicode characters). +The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer. +See L, L, and the C pragma, L. =item tell FILEHANDLE @@ -5949,7 +5999,7 @@ string of octal digits. See also L, if all you have is a string. Undefines the value of EXPR, which must be an lvalue. Use only on a scalar value, an array (using C<@>), a hash (using C<%>), a subroutine -(using C<&>), or a typeglob (using <*>). (Saying C +(using C<&>), or a typeglob (using C<*>). (Saying C will probably not do what you expect on most predefined variables or DBM list values, so don't do that; see L.) Always returns the undefined value. You can omit the EXPR, in which case nothing is @@ -6186,12 +6236,18 @@ above.) =item values HASH -Returns a list consisting of all the values of the named hash. (In a -scalar context, returns the number of values.) The values are -returned in an apparently random order. The actual random order is -subject to change in future versions of perl, but it is guaranteed to -be the same order as either the C or C function would -produce on the same (unmodified) hash. +Returns a list consisting of all the values of the named hash. +(In a scalar context, returns the number of values.) + +The values are returned in an apparently random order. 
The actual +random order is subject to change in future versions of perl, but it +is guaranteed to be the same order as either the C or C +function would produce on the same (unmodified) hash. Since Perl +5.8.1 the ordering is different even between different runs of Perl +for security reasons (see L). + +As a side effect, calling values() resets the HASH's internal iterator, +see L. Note that the values are not copied, which means modifying them will modify the contents of the hash: @@ -6199,7 +6255,6 @@ modify the contents of the hash: for (values %hash) { s/foo/bar/g } # modifies %hash values for (@hash{keys %hash}) { s/foo/bar/g } # same -As a side effect, calling values() resets the HASH's internal iterator. See also C, C, and C. =item vec EXPR,OFFSET,BITS