X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/5cc917d61a1b0b6683ece694d00cdb1abdf9c0d9..904028df2142182d347e16fc663545daf1b31fd8:/pod/perlfunc.pod diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod index 2035795..94bc8d7 100644 --- a/pod/perlfunc.pod +++ b/pod/perlfunc.pod @@ -12,36 +12,36 @@ following comma. (See the precedence table in L.) List operators take more than one argument, while unary operators can never take more than one argument. Thus, a comma terminates the argument of a unary operator, but merely separates the arguments of a list -operator. A unary operator generally provides a scalar context to its +operator. A unary operator generally provides scalar context to its argument, while a list operator may provide either scalar or list -contexts for its arguments. If it does both, the scalar arguments will -be first, and the list argument will follow. (Note that there can ever -be only one such list argument.) For instance, splice() has three scalar +contexts for its arguments. If it does both, scalar arguments +come first and list argument follow, and there can only ever +be one such list argument. For instance, splice() has three scalar arguments followed by a list, whereas gethostbyname() has four scalar arguments. In the syntax descriptions that follow, list operators that expect a -list (and provide list context for the elements of the list) are shown +list (and provide list context for elements of the list) are shown with LIST as an argument. Such a list may consist of any combination of scalar arguments or list values; the list values will be included in the list as if each individual element were interpolated at that point in the list, forming a longer single-dimensional list value. -Commas should separate elements of the LIST. +Commas should separate literal elements of the LIST. Any function in the list below may be used either with or without parentheses around its arguments. (The syntax descriptions omit the -parentheses.) 
If you use the parentheses, the simple (but occasionally -surprising) rule is this: It I like a function, therefore it I a +parentheses.) If you use parentheses, the simple but occasionally +surprising rule is this: It I like a function, therefore it I a function, and precedence doesn't matter. Otherwise it's a list -operator or unary operator, and precedence does matter. And whitespace -between the function and left parenthesis doesn't count--so you need to -be careful sometimes: +operator or unary operator, and precedence does matter. Whitespace +between the function and left parenthesis doesn't count, so sometimes +you need to be careful: - print 1+2+4; # Prints 7. - print(1+2) + 4; # Prints 3. - print (1+2)+4; # Also prints 3! - print +(1+2)+4; # Prints 7. - print ((1+2)+4); # Prints 7. + print 1+2+4; # Prints 7. + print(1+2) + 4; # Prints 3. + print (1+2)+4; # Also prints 3! + print +(1+2)+4; # Prints 7. + print ((1+2)+4); # Prints 7. If you run Perl with the B<-w> switch it can warn you about this. For example, the third line above produces: @@ -55,14 +55,14 @@ and C. For example, C always means C. For functions that can be used in either a scalar or list context, -nonabortive failure is generally indicated in a scalar context by -returning the undefined value, and in a list context by returning the -null list. +nonabortive failure is generally indicated in scalar context by +returning the undefined value, and in list context by returning the +empty list. Remember the following important rule: There is B that relates the behavior of an expression in list context to its behavior in scalar context, or vice versa. It might do two totally different things. -Each operator and function decides which sort of value it would be most +Each operator and function decides which sort of value would be most appropriate to return in scalar context. Some operators return the length of the list that would have been returned in list context. 
Some operators return the first value in the list. Some operators return the @@ -78,14 +78,22 @@ the context at compile time. It would generate the scalar comma operator there, not the list construction version of the comma. That means it was never a list to start with. -In general, functions in Perl that serve as wrappers for system calls -of the same name (like chown(2), fork(2), closedir(2), etc.) all return +In general, functions in Perl that serve as wrappers for system calls ("syscalls") +of the same name (like chown(2), fork(2), closedir(2), etc.) return true when they succeed and C otherwise, as is usually mentioned in the descriptions below. This is different from the C interfaces, -which return C<-1> on failure. Exceptions to this rule are C, +which return C<-1> on failure. Exceptions to this rule include C, C, and C. System calls also set the special C<$!> variable on failure. Other functions do not, except accidentally. +Extension modules can also hook into the Perl parser to define new +kinds of keyword-headed expression. These may look like functions, but +may also look completely different. The syntax following the keyword +is defined entirely by the extension. If you are an implementor, see +L for the mechanism. If you are using such +a module, see the module's documentation for details of the syntax that +it defines. + =head2 Perl Functions by Category X @@ -117,7 +125,7 @@ C, C, C =item Functions for real @ARRAYs X -C, C, C, C, C +C, C, C, C, C, C, C, C =item Functions for list data X @@ -138,7 +146,7 @@ C, C, C, C, C, C and low-level POSIX tty-handling operations. If FILEHANDLE is an expression, the value is taken as an indirect filehandle, generally its name. 
@@ -1877,28 +1999,24 @@ You can use this to find out whether two handles refer to the same underlying descriptor: if (fileno(THIS) == fileno(THAT)) { - print "THIS and THAT are dups\n"; + print "THIS and THAT are dups\n"; } -(Filehandles connected to memory objects via new features of C may -return undefined even though they are open.) - - =item flock FILEHANDLE,OPERATION X X X Calls flock(2), or an emulation of it, on FILEHANDLE. Returns true for success, false on failure. Produces a fatal error if used on a machine that doesn't implement flock(2), fcntl(2) locking, or lockf(3). -C is Perl's portable file locking interface, although it locks -only entire files, not records. +C is Perl's portable file-locking interface, although it locks +entire files only, not records. Two potentially non-obvious but traditional C semantics are that it waits indefinitely until the lock is granted, and that its locks -B. Such discretionary locks are more flexible, but offer -fewer guarantees. This means that programs that do not also use C -may modify files locked with C. See L, -your port's specific documentation, or your system-specific local manpages +are B. Such discretionary locks are more flexible, but +offer fewer guarantees. This means that programs that do not also use +C may modify files locked with C. See L, +your port's specific documentation, and your system-specific local manpages for details. It's best to assume traditional behavior if you're writing portable programs. (But if you're not, you should as always feel perfectly free to write for your own system's idiosyncrasies (sometimes called @@ -1907,12 +2025,12 @@ in the way of your getting your job done.) OPERATION is one of LOCK_SH, LOCK_EX, or LOCK_UN, possibly combined with LOCK_NB. These constants are traditionally valued 1, 2, 8 and 4, but -you can use the symbolic names if you import them from the Fcntl module, -either individually, or as a group using the ':flock' tag. 
LOCK_SH +you can use the symbolic names if you import them from the L module, +either individually, or as a group using the C<:flock> tag. LOCK_SH requests a shared lock, LOCK_EX requests an exclusive lock, and LOCK_UN releases a previously requested lock. If LOCK_NB is bitwise-or'ed with -LOCK_SH or LOCK_EX then C will return immediately rather than blocking -waiting for the lock (check the return status to see if you got it). +LOCK_SH or LOCK_EX, then C returns immediately rather than blocking +waiting for the lock; check the return status to see if you got it. To avoid the possibility of miscoordination, Perl now flushes FILEHANDLE before locking or unlocking it. @@ -1932,38 +2050,40 @@ network; you would need to use the more system-specific C for that. If you like you can force Perl to ignore your system's flock(2) function, and so provide its own fcntl(2)-based emulation, by passing the switch C<-Ud_flock> to the F program when you configure -perl. +and build a new Perl. Here's a mailbox appender for BSD systems. use Fcntl qw(:flock SEEK_END); # import LOCK_* and SEEK_END constants sub lock { - my ($fh) = @_; - flock($fh, LOCK_EX) or die "Cannot lock mailbox - $!\n"; + my ($fh) = @_; + flock($fh, LOCK_EX) or die "Cannot lock mailbox - $!\n"; - # and, in case someone appended while we were waiting... - seek($fh, 0, SEEK_END) or die "Cannot seek - $!\n"; + # and, in case someone appended while we were waiting... 
+ seek($fh, 0, SEEK_END) or die "Cannot seek - $!\n"; } sub unlock { - my ($fh) = @_; - flock($fh, LOCK_UN) or die "Cannot unlock mailbox - $!\n"; + my ($fh) = @_; + flock($fh, LOCK_UN) or die "Cannot unlock mailbox - $!\n"; } open(my $mbox, ">>", "/usr/spool/mail/$ENV{'USER'}") - or die "Can't open mailbox: $!"; + or die "Can't open mailbox: $!"; lock($mbox); print $mbox $msg,"\n\n"; unlock($mbox); -On systems that support a real flock(), locks are inherited across fork() -calls, whereas those that must resort to the more capricious fcntl() -function lose the locks, making it harder to write servers. +On systems that support a real flock(2), locks are inherited across fork() +calls, whereas those that must resort to the more capricious fcntl(2) +function lose their locks, making it seriously harder to write servers. See also L for other flock() examples. +Portability issues: L. + =item fork X X X @@ -1976,11 +2096,11 @@ fork(), great care has gone into making it extremely efficient (for example, using copy-on-write technology on data pages), making it the dominant paradigm for multitasking over the last few decades. -Beginning with v5.6.0, Perl will attempt to flush all files opened for +Beginning with v5.6.0, Perl attempts to flush all files opened for output before forking the child process, but this may not be supported on some platforms (see L). To be safe, you may need to set C<$|> ($AUTOFLUSH in English) or call the C method of -C on any open handles in order to avoid duplicate output. +C on any open handles to avoid duplicate output. If you C without ever waiting on your children, you will accumulate zombies. On some systems, you can avoid this by setting @@ -1993,6 +2113,14 @@ if you exit, then the remote server (such as, say, a CGI script or a backgrounded job launched from a remote shell) won't think you're done. You should reopen those to F if it's any issue. 
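As an illustration of the advice above about waiting on children, here is a minimal sketch, assuming a Unix-like platform with a native fork(2). The exit status 3 is an arbitrary value chosen for the example; nothing here is part of the patch itself.

```perl
use strict;
use warnings;

$| = 1;    # autoflush STDOUT so buffered output is not duplicated in the child

my $pid = fork();
die "Cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: do some work, then exit with a status the parent can inspect.
    exit 3;    # arbitrary status, chosen only for this example
}

# Parent: reap the child so that it does not linger as a zombie.
my $reaped = waitpid($pid, 0);
my $status = $? >> 8;    # high byte of $? holds the child's exit status
print "child $reaped exited with status $status\n";
```

A blocking waitpid() is the simplest way to reap a single child; a long-running server would more likely reap from a C<SIGCHLD> handler or poll with C<WNOHANG>.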
+On some platforms such as Windows, where the fork() system call is not available, +Perl can be built to emulate fork() in the Perl interpreter. The emulation is designed to, +at the level of the Perl program, be as compatible as possible with the "Unix" fork(). +However it has limitations that have to be considered in code intended to be portable. +See L for more details. + +Portability issues: L. + =item format X @@ -2000,8 +2128,8 @@ Declare a picture format for use by the C function. For example: format Something = - Test: @<<<<<<<< @||||| @>>>>> - $str, $%, '$' . int($num) + Test: @<<<<<<<< @||||| @>>>>> + $str, $%, '$' . int($num) . $str = "widget"; @@ -2023,40 +2151,44 @@ C<$^A> are written to some filehandle. You could also read C<$^A> and then set C<$^A> back to C<"">. Note that a format typically does one C per line of form, but the C function itself doesn't care how many newlines are embedded in the PICTURE. This means -that the C<~> and C<~~> tokens will treat the entire PICTURE as a single line. +that the C<~> and C<~~> tokens treat the entire PICTURE as a single line. You may therefore need to use multiple formlines to implement a single -record format, just like the format compiler. +record format, just like the C compiler. Be careful if you put double quotes around the picture, because an C<@> character may be taken to mean the beginning of an array name. C always returns true. See L for other examples. +If you are trying to use this instead of C to capture the output, +you may find it easier to open a filehandle to a scalar +(C<< open $fh, ">", \$output >>) and write to that instead. + =item getc FILEHANDLE X X X X =item getc Returns the next character from the input file attached to FILEHANDLE, -or the undefined value at end of file, or if there was an error (in +or the undefined value at end of file or if there was an error (in the latter case C<$!> is set). If FILEHANDLE is omitted, reads from STDIN. This is not particularly efficient. 
However, it cannot be used by itself to fetch single characters without waiting for the user to hit enter. For that, try something more like: if ($BSD_STYLE) { - system "stty cbreak /dev/tty 2>&1"; + system "stty cbreak /dev/tty 2>&1"; } else { - system "stty", '-icanon', 'eol', "\001"; + system "stty", '-icanon', 'eol', "\001"; } $key = getc(STDIN); if ($BSD_STYLE) { - system "stty -cbreak /dev/tty 2>&1"; + system "stty -cbreak /dev/tty 2>&1"; } else { - system "stty", 'icanon', 'eol', '^@'; # ASCII null + system 'stty', 'icanon', 'eol', '^@'; # ASCII NUL } print "\n"; @@ -2065,25 +2197,28 @@ is left as an exercise to the reader. The C function can do this more portably on systems purporting POSIX compliance. See also the C -module from your nearest CPAN site; details on CPAN can be found on +module from your nearest CPAN site; details on CPAN can be found under L. =item getlogin X X This implements the C library function of the same name, which on most -systems returns the current login from F, if any. If null, -use C. +systems returns the current login from F, if any. If it +returns the empty string, use C. $login = getlogin || getpwuid($<) || "Kilroy"; Do not consider C for authentication: it is not as secure as C. +Portability issues: L. + =item getpeername SOCKET X X -Returns the packed sockaddr address of other end of the SOCKET connection. +Returns the packed sockaddr address of the other end of the SOCKET +connection. use Socket; $hersockaddr = getpeername(SOCK); @@ -2097,10 +2232,12 @@ X X Returns the current process group for the specified PID. Use a PID of C<0> to get the current process group for the current process. Will raise an exception if used on a machine that -doesn't implement getpgrp(2). If PID is omitted, returns process -group of current process. Note that the POSIX version of C +doesn't implement getpgrp(2). If PID is omitted, returns the process +group of the current process. 
Note that the POSIX version of C does not accept a PID argument, so only C is truly portable. +Portability issues: L. + =item getppid X X X @@ -2108,18 +2245,22 @@ Returns the process id of the parent process. Note for Linux users: on Linux, the C functions C and C return different values from different threads. In order to -be portable, this behavior is not reflected by the perl-level function +be portable, this behavior is not reflected by the Perl-level function C, that returns a consistent value across threads. If you want to call the underlying C, you may use the CPAN module C. +Portability issues: L. + =item getpriority WHICH,WHO X X X Returns the current priority for a process, a process group, or a user. -(See C.) Will raise a fatal exception if used on a +(See L.) Will raise a fatal exception if used on a machine that doesn't implement getpriority(2). +Portability issues: L. + =item getpwnam NAME X X X X X X X X X X @@ -2186,8 +2327,8 @@ X X X =item endservent -These routines perform the same functions as their counterparts in the -system library. In list context, the return values from the +These routines are the same as their counterparts in the +system C library. In list context, the return values from the various get routines are as follows: ($name,$passwd,$uid,$gid, @@ -2198,7 +2339,7 @@ various get routines are as follows: ($name,$aliases,$proto) = getproto* ($name,$aliases,$port,$proto) = getserv* -(If the entry doesn't exist you get a null list.) +(If the entry doesn't exist you get an empty list.) The exact meaning of the $gcos field varies but it usually contains the real name of the user (as opposed to the login name) and other @@ -2206,7 +2347,7 @@ information pertaining to the user. Beware, however, that in many system users are able to change this information and therefore it cannot be trusted and therefore the $gcos is tainted (see L). 
The $passwd and $shell, user's encrypted password and -login shell, are also tainted, because of the same reason. +login shell, are also tainted, for the same reason. In scalar context, you get the name, unless the function was a lookup by name, in which case you get the other thing, whatever it is. @@ -2221,7 +2362,7 @@ lookup by name, in which case you get the other thing, whatever it is. #etc. In I the fields $quota, $comment, and $expire are special -cases in the sense that in many systems they are unsupported. If the +in that they are unsupported on many systems. If the $quota is unsupported, it is an empty scalar. If it is supported, it usually encodes the disk quota. If the $comment field is unsupported, it is an empty scalar. If it is supported it usually encodes some @@ -2230,26 +2371,26 @@ field may be $change or $age, fields that have to do with password aging. In some systems the $comment field may be $class. The $expire field, if present, encodes the expiration period of the account or the password. For the availability and the exact meaning of these fields -in your system, please consult your getpwnam(3) documentation and your +in your system, please consult getpwnam(3) and your system's F file. You can also find out from within Perl what your $quota and $comment fields mean and whether you have the $expire field by using the C module and the values C, C, C, C, and C. Shadow password -files are only supported if your vendor has implemented them in the +files are supported only if your vendor has implemented them in the intuitive fashion that calling the regular C library routines gets the shadow versions if you're running under privilege or if there exists the shadow(3) functions as found in System V (this includes Solaris -and Linux.) Those systems that implement a proprietary shadow password +and Linux). Those systems that implement a proprietary shadow password facility are unlikely to be supported. 
-The $members value returned by I is a space separated list of +The $members value returned by I is a space-separated list of the login names of the members of the group. For the I functions, if the C variable is supported in C, it will be returned to you via C<$?> if the function call fails. The -C<@addrs> value returned by a successful call is a list of the raw -addresses returned by the corresponding system library call. In the -Internet domain, each address is four bytes long and you can unpack it +C<@addrs> value returned by a successful call is a list of raw +addresses returned by the corresponding library call. In the +Internet domain, each address is four bytes long; you can unpack it by saying something like: ($a,$b,$c,$d) = unpack('W4',$addr[0]); @@ -2287,10 +2428,12 @@ for each field. For example: use User::pwent; $is_his = (stat($filename)->uid == pwent($whoever)->uid); -Even though it looks like they're the same method calls (uid), +Even though it looks as though they're the same method calls (uid), they aren't, because a C object is different from a C object. +Portability issues: L to L. + =item getsockname SOCKET X @@ -2315,27 +2458,52 @@ C module) will exist. To query options at another level the protocol number of the appropriate protocol controlling the option should be supplied. For example, to indicate that an option is to be interpreted by the TCP protocol, LEVEL should be set to the protocol -number of TCP, which you can get using getprotobyname. +number of TCP, which you can get using C. -The call returns a packed string representing the requested socket option, -or C if there is an error (the error reason will be in $!). What -exactly is in the packed string depends in the LEVEL and OPTNAME, consult -your system documentation for details. A very common case however is that -the option is an integer, in which case the result will be a packed -integer which you can decode using unpack with the C (or C) format. 
+The function returns a packed string representing the requested socket +option, or C on error, with the reason for the error placed in +C<$!>. Just what is in the packed string depends on LEVEL and OPTNAME; +consult getsockopt(2) for details. A common case is that the option is an +integer, in which case the result is a packed integer, which you can decode +using C with the C (or C) format. -An example testing if Nagle's algorithm is turned on on a socket: +Here's an example to test whether Nagle's algorithm is enabled on a socket: use Socket qw(:all); defined(my $tcp = getprotobyname("tcp")) - or die "Could not determine the protocol number for tcp"; + or die "Could not determine the protocol number for tcp"; # my $tcp = IPPROTO_TCP; # Alternative my $packed = getsockopt($socket, $tcp, TCP_NODELAY) - or die "Could not query TCP_NODELAY socket option: $!"; + or die "getsockopt TCP_NODELAY: $!"; my $nodelay = unpack("I", $packed); print "Nagle's algorithm is turned ", $nodelay ? "off\n" : "on\n"; +Portability issues: L. + +=item given EXPR BLOCK +X + +=item given BLOCK + +C is analogous to the C keyword in other languages. C +and C are used in Perl to implement C/C like statements. +Only available after Perl 5.10. For example: + + use v5.10; + given ($fruit) { + when (/apples?/) { + print "I like apples." + } + when (/oranges?/) { + print "I don't like oranges." + } + default { + print "I don't like anything" + } + } + +See L for detailed information. =item glob EXPR X X X X @@ -2350,28 +2518,37 @@ implementing the C<< <*.c> >> operator, but you can use it directly. If EXPR is omitted, C<$_> is used. The C<< <*.c> >> operator is discussed in more detail in L. -Note that C will split its arguments on whitespace, treating -each segment as separate pattern. As such, C would -match all files with a F<.c> or F<.h> extension. The expression -C would match all files in the current working directory. 
+Note that C splits its arguments on whitespace and treats +each segment as separate pattern. As such, C +matches all files with a F<.c> or F<.h> extension. The expression +C matches all files in the current working directory. + +If non-empty braces are the only wildcard characters used in the +C, no filenames are matched, but potentially many strings +are returned. For example, this produces nine strings, one for +each pairing of fruits and colors: + + @many = glob "{apple,tomato,cherry}={green,yellow,red}"; Beginning with v5.6.0, this operator is implemented using the standard C extension. See L for details, including C which does not treat whitespace as a pattern separator. +Portability issues: L. + =item gmtime EXPR X X X =item gmtime -Works just like L but the returned values are +Works just like L but the returned values are localized for the standard Greenwich time zone. -Note: when called in list context, $isdst, the last value -returned by gmtime is always C<0>. There is no +Note: When called in list context, $isdst, the last value +returned by gmtime, is always C<0>. There is no Daylight Saving Time in GMT. -See L for portability concerns. +Portability issues: L. =item goto LABEL X X X @@ -2380,18 +2557,15 @@ X X X =item goto &NAME -The C form finds the statement labeled with LABEL and resumes -execution there. It may not be used to go into any construct that -requires initialization, such as a subroutine or a C loop. It -also can't be used to go into a construct that is optimized away, -or to get out of a block or subroutine given to C. -It can be used to go almost anywhere else within the dynamic scope, -including out of subroutines, but it's usually better to use some other -construct such as C or C. The author of Perl has never felt the -need to use this form of C (in Perl, that is--C is another matter). -(The difference being that C does not offer named loops combined with -loop control. 
Perl does, and this replaces most structured uses of C -in other languages.) +The C form finds the statement labeled with LABEL and +resumes execution there. It can't be used to get out of a block or +subroutine given to C. It can be used to go almost anywhere +else within the dynamic scope, including out of subroutines, but it's +usually better to use some other construct such as C or C. +The author of Perl has never felt the need to use this form of C +(in Perl, that is; C is another matter). (The difference is that C +does not offer named loops combined with loop control. Perl does, and +this replaces most structured uses of C in other languages.) The C form expects a label name, whose scope will be resolved dynamically. This allows for computed Cs per FORTRAN, but isn't @@ -2399,6 +2573,16 @@ necessarily recommended if you're optimizing for maintainability: goto ("FOO", "BAR", "GLARCH")[$i]; +As shown in this example, C is exempt from the "looks like a +function" rule. A pair of parentheses following it does not (necessarily) +delimit its argument. C is equivalent to C. + +Use of C or C to jump into a construct is +deprecated and will issue a warning. Even then, it may not be used to +go into any construct that requires initialization, such as a +subroutine or a C loop. It also can't be used to go into a +construct that is optimized away. + The C form is quite different from the other forms of C. In fact, it isn't a goto in the normal sense at all, and doesn't have the stigma associated with other gotos. Instead, it @@ -2412,7 +2596,7 @@ After the C, not even C will be able to tell that this routine was called first. NAME needn't be the name of a subroutine; it can be a scalar variable -containing a code reference, or a block that evaluates to a code +containing a code reference or a block that evaluates to a code reference. =item grep BLOCK LIST @@ -2445,7 +2629,7 @@ This is usually something to be avoided when writing clear code. 
If C<$_> is lexical in the scope where the C appears (because it has been declared with C) then, in addition to being locally aliased to -the list elements, C<$_> keeps being lexical inside the block; i.e. it +the list elements, C<$_> keeps being lexical inside the block; i.e., it can't be seen from the outside, avoiding any potential side-effects. See also L for a list composed of the results of the BLOCK or EXPR. @@ -2465,7 +2649,7 @@ L.) If EXPR is omitted, uses C<$_>. Hex strings may only represent integers. Strings that would cause integer overflow trigger a warning. Leading whitespace is not stripped, unlike oct(). To present something as hex, look into L, -L, or L. +L, and L. =item import LIST X @@ -2497,7 +2681,7 @@ X X X X X Returns the integer portion of EXPR. If EXPR is omitted, uses C<$_>. You should not use this function for rounding: one because it truncates -towards C<0>, and two because machine representations of floating point +towards C<0>, and two because machine representations of floating-point numbers can sometimes produce counterintuitive results. For example, C produces -268 rather than the correct -269; that's because it's really more like -268.99999999999994315658 instead. Usually, @@ -2509,14 +2693,14 @@ X Implements the ioctl(2) function. You'll probably first have to say - require "sys/ioctl.ph"; # probably in $Config{archlib}/sys/ioctl.ph + require "sys/ioctl.ph"; # probably in $Config{archlib}/sys/ioctl.ph to get the correct function definitions. If F doesn't exist or doesn't have the correct definitions you'll have to roll your own, based on your C header files such as F<< >>. (There is a Perl script called B that comes with the Perl kit that may help you in this, but it's nontrivial.) SCALAR will be read and/or -written depending on the FUNCTION--a pointer to the string value of SCALAR +written depending on the FUNCTION; a C pointer to the string value of SCALAR will be passed as the third argument of the actual C call. 
(If SCALAR has no string value but does have a numeric value, that value will be passed rather than a pointer to the string value. To guarantee this to be @@ -2526,10 +2710,10 @@ C. The return value of C (and C) is as follows: - if OS returns: then Perl returns: - -1 undefined value - 0 string "0 but true" - anything else that number + if OS returns: then Perl returns: + -1 undefined value + 0 string "0 but true" + anything else that number Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating @@ -2541,6 +2725,8 @@ system: The special string C<"0 but true"> is exempt from B<-w> complaints about improper numeric conversions. +Portability issues: L. + =item join EXPR,LIST X @@ -2557,18 +2743,20 @@ X X =item keys ARRAY +=item keys EXPR + Returns a list consisting of all the keys of the named hash, or the indices of an array. (In scalar context, returns the number of keys or indices.) The keys of a hash are returned in an apparently random order. The actual -random order is subject to change in future versions of perl, but it +random order is subject to change in future versions of Perl, but it is guaranteed to be the same order as either the C or C function produces (given that the hash has not been modified). Since -Perl 5.8.1 the ordering is different even between different runs of +Perl 5.8.1 the ordering can be different even between different runs of Perl for security reasons (see L). -As a side effect, calling keys() resets the HASH or ARRAY's internal iterator +As a side effect, calling keys() resets the internal iterator of the HASH or ARRAY (see L). In particular, calling keys() in void context resets the iterator with no other overhead.
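The iterator reset described above can be seen in a short sketch. The hash contents and variable names are made up for the illustration.

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

# Advance the hash's internal iterator by one pair, then abandon the walk.
my ($first) = each %h;

# keys() in void context resets the iterator; no list is built.
keys %h;

# each() now starts over, so a full pass visits every pair again.
my $seen = 0;
while (my ($k, $v) = each %h) {
    $seen++;
}
print "seen $seen of ", scalar(keys %h), " keys\n";    # prints "seen 3 of 3 keys"
```

Without the C<keys %h;> line, the loop would resume after the pair already consumed and visit only two of the three entries.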
@@ -2577,13 +2765,13 @@ Here is yet another way to print your environment: @keys = keys %ENV; @values = values %ENV; while (@keys) { - print pop(@keys), '=', pop(@values), "\n"; + print pop(@keys), '=', pop(@values), "\n"; } or how about sorted by key: foreach $key (sort(keys %ENV)) { - print $key, '=', $ENV{$key}, "\n"; + print $key, '=', $ENV{$key}, "\n"; } The returned values are copies of the original keys in the hash, so @@ -2593,10 +2781,10 @@ To sort a hash by value, you'll need to use a C function. Here's a descending numeric sort of a hash by its values: foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) { - printf "%4d %s\n", $hash{$key}, $key; + printf "%4d %s\n", $hash{$key}, $key; } -As an lvalue C allows you to increase the number of hash buckets +Used as an lvalue, C allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big. (This is similar to pre-extending an array by assigning a larger number to $#array.) If you say @@ -2612,7 +2800,15 @@ C in this way (but you needn't worry about doing this by accident, as trying has no effect). C in an lvalue context is a syntax error. -See also C, C and C. +Starting with Perl 5.14, C can take a scalar EXPR, which must contain +a reference to an unblessed hash or array. The argument will be +dereferenced automatically. This aspect of C is considered highly +experimental. The exact behaviour may change in a future version of Perl. + + for (keys $hashref) { ... } + for (keys $obj->get_arrayref) { ... } + +See also C, C, and C. =item kill SIGNAL, LIST X X @@ -2624,21 +2820,32 @@ same as the number actually killed). 
$cnt = kill 1, $child1, $child2; kill 9, @goners; -If SIGNAL is zero, no signal is sent to the process, but the kill(2) -system call will check whether it's possible to send a signal to it (that +If SIGNAL is zero, no signal is sent to the process, but C +checks whether it's I to send a signal to it (that means, to be brief, that the process is owned by the same user, or we are -the super-user). This is a useful way to check that a child process is +the super-user). This is useful to check that a child process is still alive (even if only as a zombie) and hasn't changed its UID. See L for notes on the portability of this construct. -Unlike in the shell, if SIGNAL is negative, it kills -process groups instead of processes. (On System V, a negative I -number will also kill process groups, but that's not portable.) That -means you usually want to use positive not negative signals. You may also -use a signal name in quotes. +Unlike in the shell, if SIGNAL is negative, it kills process groups instead +of processes. That means you usually want to use positive not negative signals. +You may also use a signal name in quotes. + +The behavior of kill when a I number is zero or negative depends on +the operating system. For example, on POSIX-conforming systems, zero will +signal the current process group and -1 will signal all processes. See L for more details. +On some platforms such as Windows where the fork() system call is not available, +Perl can be built to emulate fork() at the interpreter level. +This emulation has limitations related to kill that have to be considered +in code running on Windows and in code intended to be portable. + +See L for more details. + +Portability issues: L. + =item last LABEL X X @@ -2650,12 +2857,12 @@ omitted, the command refers to the innermost enclosing loop. The C block, if any, is not executed: LINE: while () { - last LINE if /^$/; # exit when done with header - #... + last LINE if /^$/; # exit when done with header + #...
} -C cannot be used to exit a block which returns a value such as -C, C or C, and should not be used to exit +C cannot be used to exit a block that returns a value such as +C, C, or C, and should not be used to exit a grep() or map() operation. Note that a block by itself is semantically identical to a loop @@ -2671,12 +2878,62 @@ X X =item lc Returns a lowercased version of EXPR. This is the internal function -implementing the C<\L> escape in double-quoted strings. Respects -current LC_CTYPE locale if C in force. See L -and L for more details about locale and Unicode support. +implementing the C<\L> escape in double-quoted strings. If EXPR is omitted, uses C<$_>. +What gets returned depends on several factors: + +=over + +=item If C is in effect: + +=over + +=item On EBCDIC platforms + +The results are what the C language system call C returns. + +=item On ASCII platforms + +The results follow ASCII semantics. Only characters C change, to C +respectively. + +=back + +=item Otherwise, if EXPR has the UTF8 flag set + +If the current package has a subroutine named C, it will be used to +change the case. +(See L.) +Otherwise Unicode semantics are used for the case change. + +=item Otherwise, if C is in effect + +Respects current LC_CTYPE locale. See L. + +=item Otherwise, if C is in effect: + +Unicode semantics are used for the case change. Any subroutine named +C will be ignored. + +=item Otherwise: + +=over + +=item On EBCDIC platforms + +The results are what the C language system call C returns. + +=item On ASCII platforms + +ASCII semantics are used for the case change. The lowercase of any character +outside the ASCII range is the character itself. + +=back + +=back + =item lcfirst EXPR X X @@ -2684,30 +2941,30 @@ X X Returns the value of EXPR with the first character lowercased. This is the internal function implementing the C<\l> escape in -double-quoted strings. Respects current LC_CTYPE locale if C in force.
See L and L for more -details about locale and Unicode support. +double-quoted strings. If EXPR is omitted, uses C<$_>. +This function behaves the same way under various pragmata, such as in a locale, +as L does. + =item length EXPR X X =item length Returns the length in I of the value of EXPR. If EXPR is -omitted, returns length of C<$_>. If EXPR is undefined, returns C. -Note that this cannot be used on an entire array or hash to find out how -many elements these have. For that, use C and C respectively. - -Note the I: if the EXPR is in Unicode, you will get the -number of characters, not the number of bytes. To get the length -of the internal string in bytes, use C, see -L. Note that the internal encoding is variable, and the number -of bytes usually meaningless. To get the number of bytes that the -string would have when encoded as UTF-8, use -C. +omitted, returns the length of C<$_>. If EXPR is undefined, returns +C. + +This function cannot be used on an entire array or hash to find out how +many elements these have. For that, use C and C, respectively. + +Like all Perl character operations, length() normally deals in logical +characters, not physical bytes. For how many bytes a string encoded as +UTF-8 would take up, use C (you'll have +to C first). See L and L. =item link OLDFILE,NEWFILE X @@ -2715,10 +2972,12 @@ X Creates a new filename linked to the old filename. Returns true for success, false otherwise. +Portability issues: L. + =item listen SOCKET,QUEUESIZE X -Does the same thing that the listen system call does. Returns true if +Does the same thing that the listen(2) system call does. Returns true if it succeeded, false otherwise. See the example in L. @@ -2734,6 +2993,10 @@ block, file, or eval. If more than one value is listed, the list must be placed in parentheses. See L for details, including issues with tied arrays and hashes. +The C construct can also be used to localize the deletion +of array/hash elements to the current block. +See L. 
+ =item localtime EXPR X X @@ -2747,28 +3010,28 @@ follows: ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time); -All list elements are numeric, and come straight out of the C `struct +All list elements are numeric and come straight out of the C `struct tm'. C<$sec>, C<$min>, and C<$hour> are the seconds, minutes, and hours of the specified time. -C<$mday> is the day of the month, and C<$mon> is the month itself, in -the range C<0..11> with 0 indicating January and 11 indicating December. +C<$mday> is the day of the month and C<$mon> the month in +the range C<0..11>, with 0 indicating January and 11 indicating December. This makes it easy to get a month name from a list: my @abbr = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ); print "$abbr[$mon] $mday"; # $mon=9, $mday=18 gives "Oct 18" -C<$year> is the number of years since 1900, not just the last two digits +C<$year> is the number of years since 1900, B just the last two digits of the year. That is, C<$year> is C<123> in year 2023. The proper way -to get a complete 4-digit year is simply: +to get a 4-digit year is simply: $year += 1900; Otherwise you create non-Y2K-compliant programs--and you wouldn't want to do that, would you? -To get the last two digits of the year (e.g., '01' in 2001) do: +To get the last two digits of the year (e.g., "01" in 2001) do: $year = sprintf("%02d", $year % 100); @@ -2786,13 +3049,13 @@ In scalar context, C returns the ctime(3) value: $now_string = localtime; # e.g., "Thu Oct 13 04:54:34 1994" -This scalar value is B locale dependent but is a Perl builtin. For GMT +This scalar value is B locale-dependent but is a Perl builtin. For GMT instead of local time use the L builtin. See also the -C module (to convert the second, minutes, hours, ... back to +C module (for converting seconds, minutes, hours, and such back to the integer value returned by time()), and the L module's strftime(3) and mktime(3) functions. 
-To get somewhat similar but locale dependent date strings, set up your +To get somewhat similar but locale-dependent date strings, set up your locale environment variables appropriately (please see L) and try for example: @@ -2804,25 +3067,28 @@ try for example: Note that the C<%a> and C<%b>, the short forms of the day of the week and the month of the year, may not necessarily be three characters wide. -See L for portability concerns. - -The L and L modules provides a convenient, +The L and L modules provide a convenient, by-name access mechanism to the gmtime() and localtime() functions, respectively. For a comprehensive date and time representation look at the L module on CPAN. +Portability issues: L. + =item lock THING X -This function places an advisory lock on a shared variable, or referenced +This function places an advisory lock on a shared variable or referenced object contained in I until the lock goes out of scope. +The value returned is the scalar itself, if the argument is a scalar, or a +reference, if the argument is a hash or array. + lock() is a "weak keyword" : this means that if you've defined a function by this name (before any calls to it), that function will be called -instead. (However, if you've said C, lock() is always a -keyword.) See L. +instead. If you are not under C this does nothing. +See L. =item log EXPR X X X X X @@ -2830,13 +3096,14 @@ X X X X X =item log Returns the natural logarithm (base I) of EXPR. If EXPR is omitted, -returns log of C<$_>. To get the log of another base, use basic algebra: +returns the log of C<$_>. To get the +log of another base, use basic algebra: The base-N log of a number is equal to the natural log of that number divided by the natural log of N. For example: sub log10 { - my $n = shift; - return log($n)/log(10); + my $n = shift; + return log($n)/log(10); } See also L for the inverse operation. @@ -2854,6 +3121,8 @@ information, please see the documentation for C. If EXPR is omitted, stats C<$_>. 
+Portability issues: L. + =item m// The match operator. See L. @@ -2870,9 +3139,27 @@ total number of elements so generated. Evaluates BLOCK or EXPR in list context, so each element of LIST may produce zero, one, or more elements in the returned value. - @chars = map(chr, @nums); + @chars = map(chr, @numbers); + +translates a list of numbers to the corresponding characters. + + my @squares = map { $_ * $_ } @numbers; + +translates a list of numbers to their squared values. -translates a list of numbers to the corresponding characters. And + my @squares = map { $_ > 5 ? ($_ * $_) : () } @numbers; + +shows that the number of returned elements can differ from the number of +input elements. To omit an element, return an empty list (). +This could also be achieved by writing + + my @squares = map { $_ * $_ } grep { $_ > 5 } @numbers; + +which makes the intention clearer. + +Map always returns a list, which can be +assigned to a hash such that the elements +become key/value pairs. See L for more details. %hash = map { get_a_key_for($_) => $_ } @array; @@ -2880,7 +3167,7 @@ is just a funny way to write %hash = (); foreach (@array) { - $hash{get_a_key_for($_)} = $_; + $hash{get_a_key_for($_)} = $_; } Note that C<$_> is an alias to the list value, so it can be used to @@ -2896,27 +3183,27 @@ the list elements, C<$_> keeps being lexical inside the block; that is, it can't be seen from the outside, avoiding any potential side-effects. C<{> starts both hash references and blocks, so C could be either -the start of map BLOCK LIST or map EXPR, LIST. Because perl doesn't look -ahead for the closing C<}> it has to take a guess at which its dealing with -based what it finds just after the C<{>. Usually it gets it right, but if it doesn't it won't realize something is wrong until it gets to the C<}> and encounters the missing (or unexpected) comma. The syntax error will be +the start of map BLOCK LIST or map EXPR, LIST. Because Perl doesn't look +ahead for the closing C<}> it has to take a guess at which it's dealing with +based on what it finds just after the C<{>.
Usually it gets it right, but if it doesn't it won't realize something is wrong until it gets to the C<}> and encounters the missing (or unexpected) comma. The syntax error will be -reported close to the C<}> but you'll need to change something near the C<{> -such as using a unary C<+> to give perl some help: +reported close to the C<}>, but you'll need to change something near the C<{> +such as using a unary C<+> to give Perl some help: - %hash = map { "\L$_", 1 } @array # perl guesses EXPR. wrong - %hash = map { +"\L$_", 1 } @array # perl guesses BLOCK. right - %hash = map { ("\L$_", 1) } @array # this also works - %hash = map { lc($_), 1 } @array # as does this. - %hash = map +( lc($_), 1 ), @array # this is EXPR and works! + %hash = map { "\L$_" => 1 } @array # perl guesses EXPR. wrong + %hash = map { +"\L$_" => 1 } @array # perl guesses BLOCK. right + %hash = map { ("\L$_" => 1) } @array # this also works + %hash = map { lc($_) => 1 } @array # as does this. + %hash = map +( lc($_) => 1 ), @array # this is EXPR and works! - %hash = map ( lc($_), 1 ), @array # evaluates to (1, @array) + %hash = map ( lc($_), 1 ), @array # evaluates to (1, @array) or to force an anon hash constructor use C<+{>: - @hashes = map +{ lc($_), 1 }, @array # EXPR, so needs , at end + @hashes = map +{ lc($_) => 1 }, @array # EXPR, so needs comma at end -and you get list of anonymous hashes each with only 1 entry. +to get a list of anonymous hashes each with only one entry apiece. =item mkdir FILENAME,MASK X X X @@ -2927,12 +3214,12 @@ X X X Creates the directory specified by FILENAME, with permissions specified by MASK (as modified by C). If it succeeds it -returns true, otherwise it returns false and sets C<$!> (errno). -If omitted, MASK defaults to 0777. If omitted, FILENAME defaults -to C<$_>. +returns true; otherwise it returns false and sets C<$!> (errno). +MASK defaults to 0777 if omitted, and FILENAME defaults +to C<$_> if omitted. 
-In general, it is better to create directories with permissive MASK, -and let the user modify that with their C, than it is to supply +In general, it is better to create directories with a permissive MASK +and let the user modify that with their C than it is to supply a restrictive MASK and give the user no way to be more permissive. The exceptions to this rule are when the file or directory should be kept private (mail files, for instance). The perlfunc(1) entry on @@ -2943,7 +3230,7 @@ number of trailing slashes. Some operating and filesystems do not get this right, so Perl automatically removes all trailing slashes to keep everyone happy. -In order to recursively create a directory structure look at +To recursively create a directory structure, look at the C function of the L module. =item msgctl ID,CMD,ARG @@ -2957,14 +3244,20 @@ first to get the correct constant definitions. If CMD is C, then ARG must be a variable that will hold the returned C structure. Returns like C: the undefined value for error, C<"0 but true"> for zero, or the actual return value otherwise. See also -L, C, and C documentation. +L and the documentation for C and +C. + +Portability issues: L. =item msgget KEY,FLAGS X Calls the System V IPC function msgget(2). Returns the message queue -id, or the undefined value if there is an error. See also -L and C and C documentation. +id, or C on error. See also +L and the documentation for C and +C. + +Portability issues: L. =item msgrcv ID,VAR,SIZE,TYPE,FLAGS X @@ -2974,21 +3267,25 @@ message queue ID into variable VAR with a maximum message size of SIZE. Note that when a message is received, the message type as a native long integer will be the first thing in VAR, followed by the actual message. This packing may be opened with C. -Taints the variable. Returns true if successful, or false if there is -an error. See also L, C, and -C documentation. +Taints the variable. Returns true if successful, false +on error. 
See also L and the documentation for +C and C. + +Portability issues: L. =item msgsnd ID,MSG,FLAGS X Calls the System V IPC function msgsnd to send the message MSG to the message queue ID. MSG must begin with the native long integer message -type, and be followed by the length of the actual message, and finally +type, be followed by the length of the actual message, and then finally the message itself. This kind of packing can be achieved with C. Returns true if successful, -or false if there is an error. See also C +false on error. See also the C and C documentation. +Portability issues: L. + =item my EXPR X @@ -3003,7 +3300,7 @@ enclosing block, file, or C. If more than one value is listed, the list must be placed in parentheses. The exact semantics and interface of TYPE and ATTRS are still -evolving. TYPE is currently bound to the use of C pragma, +evolving. TYPE is currently bound to the use of the C pragma, and attributes are handled using the C pragma, or starting from Perl 5.8.0 also via the C module. See L for details, and L, @@ -3018,16 +3315,16 @@ The C command is like the C statement in C; it starts the next iteration of the loop: LINE: while () { - next LINE if /^#/; # discard comments - #... + next LINE if /^#/; # discard comments + #... } Note that if there were a C block on the above, it would get -executed even on discarded lines. If the LABEL is omitted, the command +executed even on discarded lines. If LABEL is omitted, the command refers to the innermost enclosing loop. C cannot be used to exit a block which returns a value such as -C, C or C, and should not be used to exit +C, C, or C, and should not be used to exit a grep() or map() operation. Note that a block by itself is semantically identical to a loop @@ -3036,14 +3333,15 @@ that executes once. Thus C will exit such a block early. See also L for an illustration of how C, C, and C work. 
-=item no Module VERSION LIST -X +=item no MODULE VERSION LIST +X +X -=item no Module VERSION +=item no MODULE VERSION -=item no Module LIST +=item no MODULE LIST -=item no Module +=item no MODULE =item no VERSION @@ -3058,21 +3356,25 @@ Interprets EXPR as an octal string and returns the corresponding value. (If EXPR happens to start off with C<0x>, interprets it as a hex string. If EXPR starts off with C<0b>, it is interpreted as a binary string. Leading whitespace is ignored in all three cases.) -The following will handle decimal, binary, octal, and hex in the standard -Perl or C notation: +The following will handle decimal, binary, octal, and hex in standard +Perl notation: $val = oct($val) if $val =~ /^0/; If EXPR is omitted, uses C<$_>. To go the other way (produce a number in octal), use sprintf() or printf(): - $perms = (stat("filename"))[2] & 07777; - $oct_perms = sprintf "%lo", $perms; + $dec_perms = (stat("filename"))[2] & 07777; + $oct_perm_str = sprintf "%o", $dec_perms; The oct() function is commonly used when a string such as C<644> needs -to be converted into a file mode, for example. (Although perl will -automatically convert strings into numbers as needed, this automatic -conversion assumes base 10.) +to be converted into a file mode, for example. Although Perl +automatically converts strings into numbers as needed, this automatic +conversion assumes base 10. + +Leading white space is ignored without warning, as are any trailing +non-digits, such as a decimal point (C only handles non-negative +integers, not negative integers or floating point). =item open FILEHANDLE,EXPR X X X X @@ -3090,91 +3392,91 @@ FILEHANDLE.
Simple examples to open a file for reading: - open(my $fh, '<', "input.txt") or die $!; + open(my $fh, "<", "input.txt") + or die "cannot open < input.txt: $!"; and for writing: - open(my $fh, '>', "output.txt") or die $!; + open(my $fh, ">", "output.txt") + or die "cannot open > output.txt: $!"; (The following is a comprehensive reference to open(): for a gentler introduction you may consider L.) -If FILEHANDLE is an undefined scalar variable (or array or hash element) -the variable is assigned a reference to a new anonymous filehandle, -otherwise if FILEHANDLE is an expression, its value is used as the name of -the real filehandle wanted. (This is considered a symbolic reference, so -C should I be in effect.) - -If EXPR is omitted, the scalar variable of the same name as the -FILEHANDLE contains the filename. (Note that lexical variables--those -declared with C--will not work for this purpose; so if you're -using C, specify EXPR in your call to open.) - -If three or more arguments are specified then the mode of opening and -the file name are separate. If MODE is C<< '<' >> or nothing, the file -is opened for input. If MODE is C<< '>' >>, the file is truncated and -opened for output, being created if necessary. If MODE is C<<< '>>' >>>, -the file is opened for appending, again being created if necessary. - -You can put a C<'+'> in front of the C<< '>' >> or C<< '<' >> to +If FILEHANDLE is an undefined scalar variable (or array or hash element), a +new filehandle is autovivified, meaning that the variable is assigned a +reference to a newly allocated anonymous filehandle. Otherwise if +FILEHANDLE is an expression, its value is the real filehandle. (This is +considered a symbolic reference, so C should I be +in effect.) + +If EXPR is omitted, the global (package) scalar variable of the same +name as the FILEHANDLE contains the filename. 
(Note that lexical +variables--those declared with C or C--will not work for this +purpose; so if you're using C or C, specify EXPR in your +call to open.) + +If three (or more) arguments are specified, the open mode (including +optional encoding) in the second argument is distinct from the filename in +the third. If MODE is C<< < >> or nothing, the file is opened for input. +If MODE is C<< > >>, the file is opened for output, with existing files +first being truncated ("clobbered") and nonexisting files newly created. +If MODE is C<<< >> >>>, the file is opened for appending, again being +created if necessary. + +You can put a C<+> in front of the C<< > >> or C<< < >> to indicate that you want both read and write access to the file; thus -C<< '+<' >> is almost always preferred for read/write updates--the C<< -'+>' >> mode would clobber the file first. You can't usually use +C<< +< >> is almost always preferred for read/write updates--the +C<< +> >> mode would clobber the file first. You can't usually use either read-write mode for updating textfiles, since they have -variable length records. See the B<-i> switch in L for a +variable-length records. See the B<-i> switch in L for a better approach. The file is created with permissions of C<0666> -modified by the process' C value. - -These various prefixes correspond to the fopen(3) modes of C<'r'>, -C<'r+'>, C<'w'>, C<'w+'>, C<'a'>, and C<'a+'>. - -In the 2-arguments (and 1-argument) form of the call the mode and -filename should be concatenated (in this order), possibly separated by -spaces. It is possible to omit the mode in these forms if the mode is -C<< '<' >>. - -If the filename begins with C<'|'>, the filename is interpreted as a -command to which output is to be piped, and if the filename ends with a -C<'|'>, the filename is interpreted as a command which pipes output to -us. See L -for more examples of this.
(You are not allowed to C to a command -that pipes both in I out, but see L, L, -and L -for alternatives.) - -For three or more arguments if MODE is C<'|-'>, the filename is +modified by the process's C value. + +These various prefixes correspond to the fopen(3) modes of C, +C, C, C, C, and C. + +In the one- and two-argument forms of the call, the mode and filename +should be concatenated (in that order), preferably separated by white +space. You can--but shouldn't--omit the mode in these forms when that mode +is C<< < >>. It is always safe to use the two-argument form of C if +the filename argument is a known literal. + +For three or more arguments if MODE is C<|->, the filename is interpreted as a command to which output is to be piped, and if MODE -is C<'-|'>, the filename is interpreted as a command which pipes -output to us. In the 2-arguments (and 1-argument) form one should -replace dash (C<'-'>) with the command. +is C<-|>, the filename is interpreted as a command that pipes +output to us. In the two-argument (and one-argument) form, one should +replace dash (C<->) with the command. See L for more examples of this. (You are not allowed to C to a command that pipes both in I out, but see L, L, and -L for alternatives.) +L for +alternatives.) -In the three-or-more argument form of pipe opens, if LIST is specified +In the form of pipe opens taking three or more arguments, if LIST is specified (extra arguments after the command name) then LIST becomes arguments to the command invoked if the platform supports it. The meaning of C with more than three arguments for non-pipe modes is not yet -specified. Experimental "layers" may give extra LIST arguments +defined, but experimental "layers" may give extra LIST arguments meaning. -In the 2-arguments (and 1-argument) form opening C<'-'> opens STDIN -and opening C<< '>-' >> opens STDOUT. +In the two-argument (and one-argument) form, opening C<< <- >> +or C<-> opens STDIN and opening C<< >- >> opens STDOUT. 
-You may use the three-argument form of open to specify IO "layers" -(sometimes also referred to as "disciplines") to be applied to the handle +You may (and usually should) use the three-argument form of open to specify +I/O layers (sometimes referred to as "disciplines") to apply to the handle that affect how the input and output are processed (see L and -L for more details). For example +L for more details). For example: - open(my $fh, "<:encoding(UTF-8)", "file") + open(my $fh, "<:encoding(UTF-8)", "filename") + || die "can't open UTF-8 encoded filename: $!"; -will open the UTF-8 encoded file containing Unicode characters, +opens the UTF8-encoded file containing Unicode characters; see L. Note that if layers are specified in the -three-arg form then default layers stored in ${^OPEN} (see L; +three-argument form, then default layers stored in ${^OPEN} (see L; usually set by the B pragma or the switch B<-CioD>) are ignored. -Open returns nonzero upon success, the undefined value otherwise. If +Open returns nonzero on success, the undefined value otherwise. If the C involved a pipe, the return value happens to be the pid of the subprocess. @@ -3182,123 +3484,122 @@ If you're running Perl on a system that distinguishes between text files and binary files, then you should check out L for tips for dealing with this. The key distinction between systems that need C and those that don't is their text file formats. Systems -like Unix, Mac OS, and Plan 9, which delimit lines with a single -character, and which encode that character in C as C<"\n">, do not +like Unix, Mac OS, and Plan 9, that end lines with a single +character and encode that character in C as C<"\n"> do not need C. The rest need it. -When opening a file, it's usually a bad idea to continue normal execution -if the request failed, so C is frequently used in connection with +When opening a file, it's seldom a good idea to continue +if the request failed, so C is frequently used with C. 
Even if C won't do what you want (say, in a CGI script, -where you want to make a nicely formatted error message (but there are -modules that can help with that problem)) you should always check -the return value from opening a file. The infrequent exception is when -working with an unopened filehandle is actually what you want to do. +where you want to format a suitable error message (but there are +modules that can help with that problem)) always check +the return value from opening a file. -As a special case the 3-arg form with a read/write mode and the third +As a special case the three-argument form with a read/write mode and the third argument being C: open(my $tmp, "+>", undef) or die ... -opens a filehandle to an anonymous temporary file. Also using "+<" +opens a filehandle to an anonymous temporary file. Also using C<< +< >> works for symmetry, but you really should consider writing something to the temporary file first. You will need to seek() to do the reading. -Since v5.8.0, perl has built using PerlIO by default. Unless you've -changed this (i.e. Configure -Uuseperlio), you can open file handles to -"in memory" files held in Perl scalars via: +Since v5.8.0, Perl has built using PerlIO by default. Unless you've +changed this (such as building Perl with C), you can +open filehandles directly to Perl scalars via: - open($fh, '>', \$variable) || .. + open($fh, ">", \$variable) || .. -Though if you try to re-open C or C as an "in memory" -file, you have to close it first: +To (re)open C or C as an in-memory file, close it first: close STDOUT; - open STDOUT, '>', \$variable or die "Can't open STDOUT: $!"; + open(STDOUT, ">", \$variable) + or die "Can't open STDOUT: $!"; -Examples: +General examples: $ARTICLE = 100; - open ARTICLE or die "Can't find article $ARTICLE: $!\n"; + open(ARTICLE) or die "Can't find article $ARTICLE: $!\n"; while (
) {... - open(LOG, '>>/usr/spool/news/twitlog'); # (log is reserved) + open(LOG, ">>/usr/spool/news/twitlog"); # (log is reserved) # if the open fails, output is discarded - open(my $dbase, '+<', 'dbase.mine') # open for update - or die "Can't open 'dbase.mine' for update: $!"; + open(my $dbase, "+<", "dbase.mine") # open for update + or die "Can't open 'dbase.mine' for update: $!"; - open(my $dbase, '+Tmp$$") # $$ is our process id - or die "Can't start sort: $!"; + open(EXTRACT, "|sort >Tmp$$") # $$ is our process id + or die "Can't start sort: $!"; - # in memory files - open(MEMORY,'>', \$var) - or die "Can't open memory file: $!"; - print MEMORY "foo!\n"; # output will end up in $var + # in-memory files + open(MEMORY, ">", \$var) + or die "Can't open memory file: $!"; + print MEMORY "foo!\n"; # output will appear in $var # process argument list of files along with any includes foreach $file (@ARGV) { - process($file, 'fh00'); + process($file, "fh00"); } sub process { - my($filename, $input) = @_; - $input++; # this is a string increment - unless (open($input, $filename)) { - print STDERR "Can't open $filename: $!\n"; - return; - } - - local $_; - while (<$input>) { # note use of indirection - if (/^#include "(.*)"/) { - process($1, $input); - next; - } - #... # whatever - } + my($filename, $input) = @_; + $input++; # this is a string increment + unless (open($input, "<", $filename)) { + print STDERR "Can't open $filename: $!\n"; + return; + } + + local $_; + while (<$input>) { # note use of indirection + if (/^#include "(.*)"/) { + process($1, $input); + next; + } + #... # whatever + } } See L for detailed info on PerlIO. You may also, in the Bourne shell tradition, specify an EXPR beginning -with C<< '>&' >>, in which case the rest of the string is interpreted +with C<< >& >>, in which case the rest of the string is interpreted as the name of a filehandle (or file descriptor, if numeric) to be duped (as C) and opened. 
You may use C<&> after C<< > >>, C<<< >> >>>, C<< < >>, C<< +> >>, C<<< +>> >>>, and C<< +< >>. The mode you specify should match the mode of the original filehandle. (Duping a filehandle does not take into account any existing contents -of IO buffers.) If you use the 3-arg form then you can pass either a -number, the name of a filehandle or the normal "reference to a glob". +of IO buffers.) If you use the three-argument form, then you can pass either a +number, the name of a filehandle, or the normal "reference to a glob". Here is a script that saves, redirects, and restores C and C using various methods: #!/usr/bin/perl - open my $oldout, ">&STDOUT" or die "Can't dup STDOUT: $!"; - open OLDERR, ">&", \*STDERR or die "Can't dup STDERR: $!"; + open(my $oldout, ">&STDOUT") or die "Can't dup STDOUT: $!"; + open(OLDERR, ">&", \*STDERR) or die "Can't dup STDERR: $!"; - open STDOUT, '>', "foo.out" or die "Can't redirect STDOUT: $!"; - open STDERR, ">&STDOUT" or die "Can't dup STDOUT: $!"; + open(STDOUT, '>', "foo.out") or die "Can't redirect STDOUT: $!"; + open(STDERR, ">&STDOUT") or die "Can't dup STDOUT: $!"; - select STDERR; $| = 1; # make unbuffered - select STDOUT; $| = 1; # make unbuffered + select STDERR; $| = 1; # make unbuffered + select STDOUT; $| = 1; # make unbuffered - print STDOUT "stdout 1\n"; # this works for - print STDERR "stderr 1\n"; # subprocesses too + print STDOUT "stdout 1\n"; # this works for + print STDERR "stderr 1\n"; # subprocesses too - open STDOUT, ">&", $oldout or die "Can't dup \$oldout: $!"; - open STDERR, ">&OLDERR" or die "Can't dup OLDERR: $!"; + open(STDOUT, ">&", $oldout) or die "Can't dup \$oldout: $!"; + open(STDERR, ">&OLDERR") or die "Can't dup OLDERR: $!"; print STDOUT "stdout 2\n"; print STDERR "stderr 2\n"; @@ -3327,49 +3628,78 @@ or Being parsimonious on filehandles is also useful (besides being parsimonious) for example when something is dependent on file descriptors, like for example locking using flock(). 
If you do just -C<< open(A, '>>&B') >>, the filehandle A will not have the same file -descriptor as B, and therefore flock(A) will not flock(B), and vice -versa. But with C<< open(A, '>>&=B') >> the filehandles will share -the same file descriptor. - -Note that if you are using Perls older than 5.8.0, Perl will be using -the standard C libraries' fdopen() to implement the "=" functionality. -On many UNIX systems fdopen() fails when file descriptors exceed a -certain value, typically 255. For Perls 5.8.0 and later, PerlIO is -most often the default. - -You can see whether Perl has been compiled with PerlIO or not by -running C and looking for C line. If C -is C, you have PerlIO, otherwise you don't. - -If you open a pipe on the command C<'-'>, i.e., either C<'|-'> or C<'-|'> -with 2-arguments (or 1-argument) form of open(), then -there is an implicit fork done, and the return value of open is the pid -of the child within the parent process, and C<0> within the child -process. (Use C to determine whether the open was successful.) -The filehandle behaves normally for the parent, but i/o to that +C<< open(A, ">>&B") >>, the filehandle A will not have the same file +descriptor as B, and therefore flock(A) will not flock(B) nor vice +versa. But with C<< open(A, ">>&=B") >>, the filehandles will share +the same underlying system file descriptor. + +Note that under Perls older than 5.8.0, Perl uses the standard C library's +fdopen() to implement the C<=> functionality. On many Unix systems, +fdopen() fails when file descriptors exceed a certain value, typically 255. +For Perls 5.8.0 and later, PerlIO is (most often) the default. + +You can see whether your Perl was built with PerlIO by running C +and looking for the C line. If C is C, you +have PerlIO; otherwise you don't.
+ +If you open a pipe on the command C<-> (that is, specify either C<|-> or C<-|> +with the one- or two-argument forms of C<open>), +an implicit C<fork> is done, so C<open> returns twice: in the parent +process it returns the pid +of the child process, and in the child process it returns (a defined) C<0>. +Use C<defined($pid)> or C<//> to determine whether the open was successful. + +For example, use either + + $child_pid = open(FROM_KID, "-|") // die "can't fork: $!"; + +or + + $child_pid = open(TO_KID, "|-") // die "can't fork: $!"; + +followed by + + if ($child_pid) { + # am the parent: + # either write TO_KID or else read FROM_KID + ... + waitpid $child_pid, 0; + } else { + # am the child; use STDIN/STDOUT normally + ... + exit; + } + +The filehandle behaves normally for the parent, but I/O to that filehandle is piped from/to the STDOUT/STDIN of the child process. -In the child process the filehandle isn't opened--i/o happens from/to -the new STDOUT or STDIN. Typically this is used like the normal +In the child process, the filehandle isn't opened--I/O happens from/to +the new STDOUT/STDIN. Typically this is used like the normal piped open when you want to exercise more control over just how the -pipe command gets executed, such as when running setuid, and -don't want to have to scan shell commands for metacharacters. -The following triples are more or less equivalent: +pipe command gets executed, such as when running setuid and +you don't want to have to scan shell commands for metacharacters. 
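A complete round trip through a forking open might look like the sketch below. It assumes a platform with a real fork() and Perl 5.10 or later (for C<//>); the lexical handle C<$from_kid> and the message text are made up for illustration.

```perl
use strict;
use warnings;

# "-|" forks: open() returns the child's pid in the parent and 0 in the child.
my $pid = open(my $from_kid, "-|") // die "can't fork: $!";

if ($pid == 0) {
    # Child: its STDOUT is the write end of the pipe the parent reads.
    print "hello from child $$\n";
    exit 0;
}

# Parent: read what the child printed.
my $line = <$from_kid>;

# Closing a piped handle waits for the child and sets $?.
close($from_kid) or warn "child exited with status $?";
print "parent got: $line";
```

Closing the handle in the parent reaps the child here, so no separate wait call is needed in this sketch.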
+ +The following blocks are more or less equivalent: open(FOO, "|tr '[a-z]' '[A-Z]'"); - open(FOO, '|-', "tr '[a-z]' '[A-Z]'"); - open(FOO, '|-') || exec 'tr', '[a-z]', '[A-Z]'; - open(FOO, '|-', "tr", '[a-z]', '[A-Z]'); + open(FOO, "|-", "tr '[a-z]' '[A-Z]'"); + open(FOO, "|-") || exec 'tr', '[a-z]', '[A-Z]'; + open(FOO, "|-", "tr", '[a-z]', '[A-Z]'); open(FOO, "cat -n '$file'|"); - open(FOO, '-|', "cat -n '$file'"); - open(FOO, '-|') || exec 'cat', '-n', $file; - open(FOO, '-|', "cat", '-n', $file); + open(FOO, "-|", "cat -n '$file'"); + open(FOO, "-|") || exec "cat", "-n", $file; + open(FOO, "-|", "cat", "-n", $file); -The last example in each block shows the pipe as "list form", which is +The last two examples in each block show the pipe as "list form", which is not yet supported on all platforms. A good rule of thumb is that if -your platform has true C (in other words, if your platform is -UNIX) you can use the list form. +your platform has a real C (in other words, if your platform is +Unix, including Linux and MacOS X), you can use the list form. You would +want to use the list form of the pipe so you can pass literal arguments +to the command without risk of the shell interpreting any shell metacharacters +in them. However, this also bars you from opening pipes to commands +that intentionally contain shell metacharacters, such as: + + open(FOO, "|cat -n | expand -4 | lpr") + // die "Can't open pipeline to lpr: $!"; See L for more examples of this. @@ -3381,14 +3711,14 @@ of C on any open handles. On systems that support a close-on-exec flag on files, the flag will be set for the newly opened file descriptor as determined by the value -of $^F. See L. +of C<$^F>. See L. Closing any piped filehandle causes the parent process to wait for the -child to finish, and returns the status value in C<$?> and +child to finish, then returns the status value in C<$?> and C<${^CHILD_ERROR_NATIVE}>. 
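To see that the list form hands arguments to the command without shell interpretation, one can pass a string full of metacharacters and read it back. This is a hedged sketch, assuming a platform where the list form is supported; it runs the current interpreter (C<$^X>) as the child so that no external command is assumed, and the argument string is invented for the example.

```perl
use strict;
use warnings;

# An argument full of shell metacharacters; no shell ever sees it.
my $arg = "two  words; echo oops | cat > /dev/null";

# List form: $^X (the running perl) receives its arguments verbatim.
open(my $kid, "-|", $^X, "-e", 'print $ARGV[0]', $arg)
    or die "can't fork: $!";
my $got = do { local $/; <$kid> };

# close() waits for the child and leaves its exit status in $?.
close($kid) or warn "child exited with status $?";

print $got eq $arg ? "argument arrived intact\n" : "argument was altered\n";
```

With the one-string form, the same text would go through the shell, where the semicolon, pipe, and redirection would take effect.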
-The filename passed to 2-argument (or 1-argument) form of open() will -have leading and trailing whitespace deleted, and the normal +The filename passed to the one- and two-argument forms of open() will +have leading and trailing whitespace deleted and normal redirection characters honored. This property, known as "magic open", can often be used to good effect. A user could specify a filename of F<"rsh cat file |">, or you could change certain filenames as needed: @@ -3396,37 +3726,40 @@ F<"rsh cat file |">, or you could change certain filenames as needed: $filename =~ s/(.*\.gz)\s*$/gzip -dc < $1|/; open(FH, $filename) or die "Can't open $filename: $!"; -Use 3-argument form to open a file with arbitrary weird characters in it, +Use the three-argument form to open a file with arbitrary weird characters in it, - open(FOO, '<', $file); + open(FOO, "<", $file) + || die "can't open < $file: $!"; otherwise it's necessary to protect any leading and trailing whitespace: $file =~ s#^(\s)#./$1#; - open(FOO, "< $file\0"); + open(FOO, "< $file\0") + || die "open failed: $!"; (this may not work on some bizarre filesystems). One should -conscientiously choose between the I and 3-arguments form +conscientiously choose between the I and I form of open(): - open IN, $ARGV[0]; + open(IN, $ARGV[0]) || die "can't open $ARGV[0]: $!"; will allow the user to specify an argument of the form C<"rsh cat file |">, -but will not work on a filename which happens to have a trailing space, while +but will not work on a filename that happens to have a trailing space, while - open IN, '<', $ARGV[0]; + open(IN, "<", $ARGV[0]) + || die "can't open < $ARGV[0]: $!"; will have exactly the opposite restrictions. -If you want a "real" C C (see C on your system), then you -should use the C function, which involves no such magic (but -may use subtly different filemodes than Perl open(), which is mapped -to C fopen()). This is -another way to protect your filenames from interpretation. 
For example: +If you want a "real" C C<open> (see L<open(2)> on your system), then you +should use the C<sysopen> function, which involves no such magic (but may +use subtly different filemodes than Perl open(), which is mapped to C +fopen()). This is another way to protect your filenames from +interpretation. For example: use IO::Handle; sysopen(HANDLE, $path, O_RDWR|O_CREAT|O_EXCL) - or die "sysopen $path: $!"; + or die "sysopen $path: $!"; $oldfh = select(HANDLE); $| = 1; select($oldfh); print HANDLE "stuff $$\n"; seek(HANDLE, 0, 0); @@ -3434,24 +3767,36 @@ another way to protect your filenames from interpretation. For example: Using the constructor from the C<IO::Handle> package (or one of its subclasses, such as C<IO::File> or C<IO::Socket>), you can generate anonymous -filehandles that have the scope of whatever variables hold references to -them, and automatically close whenever and however you leave that scope: +filehandles that have the scope of the variables used to hold them, then +automatically (but silently) close once their reference counts become +zero, typically at scope exit: use IO::File; #... sub read_myfile_munged { - my $ALL = shift; - my $handle = IO::File->new; - open($handle, "myfile") or die "myfile: $!"; - $first = <$handle> - or return (); # Automatically closed here. - mung $first or die "mung failed"; # Or here. - return $first, <$handle> if $ALL; # Or here. - $first; # Or here. + my $ALL = shift; + # or just leave it undef to autoviv + my $handle = IO::File->new; + open($handle, "<", "myfile") or die "myfile: $!"; + $first = <$handle> + or return (); # Automatically closed here. + mung($first) or die "mung failed"; # Or here. + return ($first, <$handle>) if $ALL; # Or here. + return $first; # Or here. } +B<WARNING:> The previous example has a bug because the automatic +close that happens when the refcount on C<$handle> reaches zero does not +properly detect and report failures. I<Always> close the handle +yourself and inspect the return value. 
+ + close($handle) + || warn "close failed: $!"; + See L for some details about mixing reading and writing. +Portability issues: L. + =item opendir DIRHANDLE,EXPR X @@ -3460,10 +3805,10 @@ C, C, and C. Returns true if successful. DIRHANDLE may be an expression whose value can be used as an indirect dirhandle, usually the real dirhandle name. If DIRHANDLE is an undefined scalar variable (or array or hash element), the variable is assigned a -reference to a new anonymous dirhandle. +reference to a new anonymous dirhandle; that is, it's autovivified. DIRHANDLEs have their own namespace separate from FILEHANDLEs. -See example at C. +See the example at C. =item ord EXPR X X @@ -3471,8 +3816,9 @@ X X =item ord Returns the numeric (the native 8-bit encoding, like ASCII or EBCDIC, -or Unicode) value of the first character of EXPR. If EXPR is omitted, -uses C<$_>. +or Unicode) value of the first character of EXPR. +If EXPR is an empty string, returns 0. If EXPR is omitted, uses C<$_>. +(Note I, not byte.) For the reverse, see L. See L for more about Unicode. @@ -3490,14 +3836,14 @@ C associates a simple name with a package variable in the current package for use within the current scope. When C is in effect, C lets you use declared global variables without qualifying them with package names, within the lexical scope of the C declaration. -In this way C differs from C, which is package scoped. +In this way C differs from C, which is package-scoped. -Unlike C, which both allocates storage for a variable and associates -a simple name with that storage for use within the current scope, C -associates a simple name with a package variable in the current package, -for use within the current scope. In other words, C has the same -scoping rules as C, but does not necessarily create a -variable. 
+Unlike C or C, which allocates storage for a variable and +associates a simple name with that storage for use within the current +scope, C associates a simple name with a package (read: global) +variable in the current package, for use within the current lexical scope. +In other words, C has the same scoping rules as C or C, but +does not necessarily create a variable. If more than one value is listed, the list must be placed in parentheses. @@ -3512,11 +3858,11 @@ of the declaration, not at the point of use. This means the following behavior holds: package Foo; - our $bar; # declares $Foo::bar for rest of lexical scope + our $bar; # declares $Foo::bar for rest of lexical scope $bar = 20; package Bar; - print $bar; # prints 20, as it refers to $Foo::bar + print $bar; # prints 20, as it refers to $Foo::bar Multiple C declarations with the same name in the same lexical scope are allowed if they are in different packages. If they happen @@ -3528,15 +3874,15 @@ merely redundant. use warnings; package Foo; - our $bar; # declares $Foo::bar for rest of lexical scope + our $bar; # declares $Foo::bar for rest of lexical scope $bar = 20; package Bar; - our $bar = 30; # declares $Bar::bar for rest of lexical scope - print $bar; # prints 30 + our $bar = 30; # declares $Bar::bar for rest of lexical scope + print $bar; # prints 30 - our $bar; # emits warning but has no other effect - print $bar; # still prints 30 + our $bar; # emits warning but has no other effect + print $bar; # still prints 30 An C declaration may also have a list of attributes associated with it. @@ -3555,81 +3901,82 @@ Takes a LIST of values and converts it into a string using the rules given by the TEMPLATE. The resulting string is the concatenation of the converted values. Typically, each converted value looks like its machine-level representation. For example, on 32-bit machines -an integer may be represented by a sequence of 4 bytes that will be -converted to a sequence of 4 characters. 
+an integer may be represented by a sequence of 4 bytes, which will in +Perl be presented as a string that's 4 characters long. + +See L for an introduction to this function. The TEMPLATE is a sequence of characters that give the order and type of values, as follows: - a A string with arbitrary binary data, will be null padded. - A A text (ASCII) string, will be space padded. - Z A null terminated (ASCIZ) string, will be null padded. + a A string with arbitrary binary data, will be null padded. + A A text (ASCII) string, will be space padded. + Z A null-terminated (ASCIZ) string, will be null padded. - b A bit string (ascending bit order inside each byte, like vec()). - B A bit string (descending bit order inside each byte). - h A hex string (low nybble first). - H A hex string (high nybble first). + b A bit string (ascending bit order inside each byte, like vec()). + B A bit string (descending bit order inside each byte). + h A hex string (low nybble first). + H A hex string (high nybble first). - c A signed char (8-bit) value. - C An unsigned char (octet) value. - W An unsigned char value (can be greater than 255). + c A signed char (8-bit) value. + C An unsigned char (octet) value. + W An unsigned char value (can be greater than 255). - s A signed short (16-bit) value. - S An unsigned short value. + s A signed short (16-bit) value. + S An unsigned short value. - l A signed long (32-bit) value. - L An unsigned long value. + l A signed long (32-bit) value. + L An unsigned long value. - q A signed quad (64-bit) value. - Q An unsigned quad value. - (Quads are available only if your system supports 64-bit - integer values _and_ if Perl has been compiled to support those. - Causes a fatal error otherwise.) + q A signed quad (64-bit) value. + Q An unsigned quad value. + (Quads are available only if your system supports 64-bit + integer values _and_ if Perl has been compiled to support those. + Raises an exception otherwise.) - i A signed integer value. 
- I A unsigned integer value. - (This 'integer' is _at_least_ 32 bits wide. Its exact + i A signed integer value. + I An unsigned integer value. + (This 'integer' is _at_least_ 32 bits wide. Its exact size depends on what a local C compiler calls 'int'.) - n An unsigned short (16-bit) in "network" (big-endian) order. - N An unsigned long (32-bit) in "network" (big-endian) order. - v An unsigned short (16-bit) in "VAX" (little-endian) order. - V An unsigned long (32-bit) in "VAX" (little-endian) order. + n An unsigned short (16-bit) in "network" (big-endian) order. + N An unsigned long (32-bit) in "network" (big-endian) order. + v An unsigned short (16-bit) in "VAX" (little-endian) order. + V An unsigned long (32-bit) in "VAX" (little-endian) order. j A Perl internal signed integer value (IV). J A Perl internal unsigned integer value (UV). - f A single-precision float in the native format. - d A double-precision float in the native format. + f A single-precision float in native format. + d A double-precision float in native format. - F A Perl internal floating point value (NV) in the native format - D A long double-precision float in the native format. - (Long doubles are available only if your system supports long - double values _and_ if Perl has been compiled to support those. - Causes a fatal error otherwise.) + F A Perl internal floating-point value (NV) in native format + D A float of long-double precision in native format. + (Long doubles are available only if your system supports long + double values _and_ if Perl has been compiled to support those. + Raises an exception otherwise.) - p A pointer to a null-terminated string. - P A pointer to a structure (fixed-length string). + p A pointer to a null-terminated string. + P A pointer to a structure (fixed-length string). - u A uuencoded string. - U A Unicode character number. + u A uuencoded string. + U A Unicode character number. 
Encodes to a character in character mode and UTF-8 (or UTF-EBCDIC in EBCDIC platforms) in byte mode. - w A BER compressed integer (not an ASN.1 BER, see perlpacktut for - details). Its bytes represent an unsigned integer in base 128, - most significant digit first, with as few digits as possible. Bit - eight (the high bit) is set on each byte except the last. + w A BER compressed integer (not an ASN.1 BER, see perlpacktut for + details). Its bytes represent an unsigned integer in base 128, + most significant digit first, with as few digits as possible. Bit + eight (the high bit) is set on each byte except the last. - x A null byte. - X Back up a byte. - @ Null fill or truncate to absolute position, counted from the - start of the innermost ()-group. - . Null fill or truncate to absolute position specified by value. - ( Start of a ()-group. + x A null byte (a.k.a ASCII NUL, "\000", chr(0)) + X Back up a byte. + @ Null-fill or truncate to absolute position, counted from the + start of the innermost ()-group. + . Null-fill or truncate to absolute position specified by the value. + ( Start of a ()-group. -One or more of the modifiers below may optionally follow some letters in the -TEMPLATE (the second column lists the letters for which the modifier is -valid): +One or more modifiers below may optionally follow certain letters in the +TEMPLATE (the second column lists letters for which the modifier is valid): ! sSlLiI Forces native (short, long, int) sizes instead of fixed (16-/32-bit) sizes. @@ -3648,48 +3995,78 @@ valid): < sSiIlLqQ Force little-endian byte-order on the type. jJfFdDpP (The "little end" touches the construct.) -The C> and C> modifiers can also be used on C<()>-groups, -in which case they force a certain byte-order on all components of -that group, including subgroups. +The C<< > >> and C<< < >> modifiers can also be used on C<()> groups +to force a particular byte-order on all components in that group, +including all its subgroups. 
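As a small illustration of a byte-order modifier applied to a whole group, the sketch below compares a grouped template with its spelled-out equivalent. It assumes Perl 5.10 or later, where these modifiers are available; the values 1 and 2 are arbitrary.

```perl
use strict;
use warnings;

# A group modifier applies to every component inside the parentheses.
my $grouped = pack("(s l)>", 1, 2);
my $spelled = pack("s> l>", 1, 2);

print unpack("H*", $grouped), "\n";    # 000100000002
print $grouped eq $spelled ? "same bytes\n" : "different bytes\n";

# The little-endian counterpart reverses the bytes within each value.
my $little = pack("(s l)<", 1, 2);
print unpack("H*", $little), "\n";     # 010002000000
```

Because C<s> and C<l> have fixed 16- and 32-bit sizes, these byte strings do not depend on the machine that runs the code.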
The following rules apply: -=over 8 +=over =item * -Each letter may optionally be followed by a number giving a repeat -count. With all types except C, C, C, C, C, C, -C, C<@>, C<.>, C, C and C

, where it means +something else, described below. Supplying a C<*> for the repeat count +instead of a number means to use however many items are left, except for: + +=over + +=item * + +C<@>, C<x>, and C<X>, where it is equivalent to C<0>. + +=item * + +C<.>, where it means relative to the start of the string. + +=item * + +C<u>, where it is equivalent to 1 (or 45, which here is equivalent). + +=back + +One can replace a numeric repeat count with a template letter enclosed in +brackets to use the packed byte length of the bracketed template for the +repeat count. + +For example, the template C<x[L]> skips as many bytes as in a packed long, +and the template C<"$t X[$t] $t"> unpacks twice whatever $t (when +variable-expanded) unpacks. If the template in brackets contains alignment +commands (such as C<x![d]>), its packed length is calculated as if the +start of the template had the maximal possible alignment. + +When used with C<Z>, a C<*> as the repeat count is guaranteed to add a +trailing null byte, so the resulting string is always one byte longer than +the byte length of the item itself. When used with C<@>, the repeat count represents an offset from the start -of the innermost () group. +of the innermost C<()> group. + +When used with C<.>, the repeat count determines the starting position to +calculate the value offset as follows: + +=over + +=item * -When used with C<.>, the repeat count is used to determine the starting -position from where the value offset is calculated. If the repeat count -is 0, it's relative to the current position. If the repeat count is C<*>, -the offset is relative to the start of the packed string. And if its an -integer C<n> the offset is relative to the start of the n-th innermost -() group (or the start of the string if C<n> is bigger then the group -level). +If the repeat count is C<0>, it's relative to the current position. + +=item * + +If the repeat count is C<*>, the offset is relative to the start of the +packed string. 
+ +=item * + +And if it's an integer I<n>, the offset is relative to the start of the +I<n>th innermost C<( )> group, or to the start of the string if I<n> is +bigger than the group level. + +=back The repeat count for C<u> is interpreted as the maximal number of bytes to encode per line of output, with 0, 1 and 2 replaced by 45. The repeat @@ -3698,139 +4075,156 @@ count should not be more than 65. =item * The C<a>, C<A>, and C<Z> types gobble just one value, but pack it as a -string of length count, padding with nulls or spaces as necessary. When +string of length count, padding with nulls or spaces as needed. When unpacking, C<A> strips trailing whitespace and nulls, C<Z> strips everything -after the first null, and C<a> returns data verbatim. +after the first null, and C<a> returns data with no stripping at all. -If the value-to-pack is too long, it is truncated. If too long and an -explicit count is provided, C<Z> packs only C<$count-1> bytes, followed -by a null byte. Thus C<Z> always packs a trailing null (except when the -count is 0). +If the value to pack is too long, the result is truncated. If it's too +long and an explicit count is provided, C<Z> packs only C<$count-1> bytes, +followed by a null byte. Thus C<Z> always packs a trailing null, except +when the count is 0. =item * -Likewise, the C<b> and C<B> fields pack a string that many bits long. -Each character of the input field of pack() generates 1 bit of the result. +Likewise, the C<b> and C<B> formats pack a string that's that many bits long. +Each such format generates 1 bit of the result. These are typically followed +by a repeat count like C<B8> or C<B64>. + Each result bit is based on the least-significant bit of the corresponding input character, i.e., on C<ord($char)%2>. In particular, characters C<"0"> -and C<"1"> generate bits 0 and 1, as do characters C<"\0"> and C<"\1">. +and C<"1"> generate bits 0 and 1, as do characters C<"\000"> and C<"\001">. 
-Starting from the beginning of the input string of pack(), each 8-tuple -of characters is converted to 1 character of output. With format C +Starting from the beginning of the input string, each 8-tuple +of characters is converted to 1 character of output. With format C, the first character of the 8-tuple determines the least-significant bit of a -character, and with format C it determines the most-significant bit of +character; with format C, it determines the most-significant bit of a character. -If the length of the input string is not exactly divisible by 8, the +If the length of the input string is not evenly divisible by 8, the remainder is packed as if the input string were padded by null characters -at the end. Similarly, during unpack()ing the "extra" bits are ignored. +at the end. Similarly during unpacking, "extra" bits are ignored. -If the input string of pack() is longer than needed, extra characters are -ignored. A C<*> for the repeat count of pack() means to use all the -characters of the input field. On unpack()ing the bits are converted to a -string of C<"0">s and C<"1">s. +If the input string is longer than needed, remaining characters are ignored. + +A C<*> for the repeat count uses all characters of the input field. +On unpacking, bits are converted to a string of C<0>s and C<1>s. =item * -The C and C fields pack a string that many nybbles (4-bit groups, -representable as hexadecimal digits, 0-9a-f) long. +The C and C formats pack a string that many nybbles (4-bit groups, +representable as hexadecimal digits, C<"0".."9"> C<"a".."f">) long. -Each character of the input field of pack() generates 4 bits of the result. -For non-alphabetical characters the result is based on the 4 least-significant +For each such format, pack() generates 4 bits of result. +With non-alphabetical characters, the result is based on the 4 least-significant bits of the input character, i.e., on C. 
In particular, characters C<"0"> and C<"1"> generate nybbles 0 and 1, as do bytes -C<"\0"> and C<"\1">. For characters C<"a".."f"> and C<"A".."F"> the result +C<"\000"> and C<"\001">. For characters C<"a".."f"> and C<"A".."F">, the result is compatible with the usual hexadecimal digits, so that C<"a"> and -C<"A"> both generate the nybble C<0xa==10>. The result for characters -C<"g".."z"> and C<"G".."Z"> is not well-defined. +C<"A"> both generate the nybble C<0xA==10>. Use only these specific hex +characters with this format. -Starting from the beginning of the input string of pack(), each pair -of characters is converted to 1 character of output. With format C the +Starting from the beginning of the template to pack(), each pair +of characters is converted to 1 character of output. With format C, the first character of the pair determines the least-significant nybble of the -output character, and with format C it determines the most-significant +output character; with format C, it determines the most-significant nybble. -If the length of the input string is not even, it behaves as if padded -by a null character at the end. Similarly, during unpack()ing the "extra" -nybbles are ignored. +If the length of the input string is not even, it behaves as if padded by +a null character at the end. Similarly, "extra" nybbles are ignored during +unpacking. + +If the input string is longer than needed, extra characters are ignored. -If the input string of pack() is longer than needed, extra characters are -ignored. -A C<*> for the repeat count of pack() means to use all the characters of -the input field. On unpack()ing the nybbles are converted to a string -of hexadecimal digits. +A C<*> for the repeat count uses all characters of the input field. For +unpack(), nybbles are converted to a string of hexadecimal digits. =item * -The C

type packs a pointer to a null-terminated string. You are -responsible for ensuring the string is not a temporary value (which can -potentially get deallocated before you get around to using the packed result). -The C

type packs a pointer to a structure of the size indicated by the -length. A NULL pointer is created if the corresponding value for C

or -C

is C, similarly for unpack(). +The C

format packs a pointer to a null-terminated string. You are +responsible for ensuring that the string is not a temporary value, as that +could potentially get deallocated before you got around to using the packed +result. The C

format packs a pointer to a structure of the size indicated +by the length. A null pointer is created if the corresponding value for +C

or C

is C; similarly with unpack(), where a null pointer +unpacks into C. -If your system has a strange pointer size (i.e. a pointer is neither as -big as an int nor as big as a long), it may not be possible to pack or +If your system has a strange pointer size--meaning a pointer is neither as +big as an int nor as big as a long--it may not be possible to pack or unpack pointers in big- or little-endian byte order. Attempting to do -so will result in a fatal error. +so raises an exception. =item * The C template character allows packing and unpacking of a sequence of -items where the packed structure contains a packed item count followed by -the packed items themselves. - -For C you write ICI and the -I describes how the length value is packed. The ones likely -to be of most use are integer-packing ones like C (for Java strings), -C (for ASN.1 or SNMP) and C (for Sun XDR). - -For C, the I may have a repeat count, in which case -the minimum of that and the number of available items is used as argument -for the I. If it has no repeat count or uses a '*', the number +items where the packed structure contains a packed item count followed by +the packed items themselves. This is useful when the structure you're +unpacking has encoded the sizes or repeat counts for some of its fields +within the structure itself as separate fields. + +For C, you write ICI, and the +I describes how the length value is packed. Formats likely +to be of most use are integer-packing ones like C for Java strings, +C for ASN.1 or SNMP, and C for Sun XDR. + +For C, I may have a repeat count, in which case +the minimum of that and the number of available items is used as the argument +for I. If it has no repeat count or uses a '*', the number of available items is used. -For C an internal stack of integer arguments unpacked so far is +For C, an internal stack of integer arguments unpacked so far is used. You write CI and the repeat count is obtained by popping off the last element from the stack. 
The I must not have a repeat count. -If the I refers to a string type (C<"A">, C<"a"> or C<"Z">), -the I is a string length, not a number of strings. If there is -an explicit repeat count for pack, the packed string will be adjusted to that -given length. +If I refers to a string type (C<"A">, C<"a">, or C<"Z">), +the I is the string length, not the number of strings. With +an explicit repeat count for pack, the packed string is adjusted to that +length. For example: + + unpack("W/a", "\004Gurusamy") gives ("Guru") + unpack("a3/A A*", "007 Bond J ") gives (" Bond", "J") + unpack("a3 x2 /A A*", "007: Bond, J.") gives ("Bond, J", ".") - unpack 'W/a', "\04Gurusamy"; gives ('Guru') - unpack 'a3/A A*', '007 Bond J '; gives (' Bond', 'J') - unpack 'a3 x2 /A A*', '007: Bond, J.'; gives ('Bond, J', '.') - pack 'n/a* w/a','hello,','world'; gives "\000\006hello,\005world" - pack 'a/W2', ord('a') .. ord('z'); gives '2ab' + pack("n/a* w/a","hello,","world") gives "\000\006hello,\005world" + pack("a/W2", ord("a") .. ord("z")) gives "2ab" The I is not returned explicitly from C. -Adding a count to the I letter is unlikely to do anything -useful, unless that letter is C, C or C. Packing with a -I of C or C may introduce C<"\000"> characters, -which Perl does not regard as legal in numeric strings. +Supplying a count to the I format letter is only useful with +C, C, or C. Packing with a I of C or C may +introduce C<"\000"> characters, which Perl does not regard as legal in +numeric strings. =item * The integer types C, C, C, and C may be -followed by a C modifier to signify native shorts or -longs--as you can see from above for example a bare C does mean -exactly 32 bits, the native C (as seen by the local C compiler) -may be larger. This is an issue mainly in 64-bit platforms. You can -see whether using C makes any difference by +followed by a C modifier to specify native shorts or +longs. 
As shown in the example above, a bare C means +exactly 32 bits, although the native C as seen by the local C compiler +may be larger. This is mainly an issue on 64-bit platforms. You can +see whether using C makes any difference this way: - print length(pack("s")), " ", length(pack("s!")), "\n"; - print length(pack("l")), " ", length(pack("l!")), "\n"; + printf "format s is %d, s! is %d\n", + length pack("s"), length pack("s!"); -C and C also work but only because of completeness; + printf "format l is %d, l! is %d\n", + length pack("l"), length pack("l!"); + + +C and C are also allowed, but only for completeness' sake: they are identical to C and C. The actual sizes (in bytes) of native shorts, ints, longs, and long -longs on the platform where Perl was built are also available via -L: +longs on the platform where Perl was built are also available from +the command line: + + $ perl -V:{short,int,long{,long}}size + shortsize='2'; + intsize='4'; + longsize='4'; + longlongsize='8'; + +or programmatically via the C module: use Config; print $Config{shortsize}, "\n"; @@ -3838,165 +4232,208 @@ L: print $Config{longsize}, "\n"; print $Config{longlongsize}, "\n"; -(The C<$Config{longlongsize}> will be undefined if your system does -not support long longs.) +C<$Config{longlongsize}> is undefined on systems without +long long support. =item * -The integer formats C, C, C, C, C, C, C, and C -are inherently non-portable between processors and operating systems -because they obey the native byteorder and endianness. For example a -4-byte integer 0x12345678 (305419896 decimal) would be ordered natively -(arranged in and handled by the CPU registers) into bytes as +The integer formats C, C, C, C, C, C, C, and C are +inherently non-portable between processors and operating systems because +they obey native byteorder and endianness. 
For example, a 4-byte integer +0x12345678 (305419896 decimal) would be ordered natively (arranged in and +handled by the CPU registers) into bytes as - 0x12 0x34 0x56 0x78 # big-endian - 0x78 0x56 0x34 0x12 # little-endian + 0x12 0x34 0x56 0x78 # big-endian + 0x78 0x56 0x34 0x12 # little-endian -Basically, the Intel and VAX CPUs are little-endian, while everybody -else, for example Motorola m68k/88k, PPC, Sparc, HP PA, Power, and -Cray are big-endian. Alpha and MIPS can be either: Digital/Compaq -used/uses them in little-endian mode; SGI/Cray uses them in big-endian -mode. +Basically, Intel and VAX CPUs are little-endian, while everybody else, +including Motorola m68k/88k, PPC, Sparc, HP PA, Power, and Cray, are +big-endian. Alpha and MIPS can be either: Digital/Compaq uses (well, used) +them in little-endian mode, but SGI/Cray uses them in big-endian mode. -The names `big-endian' and `little-endian' are comic references to -the classic "Gulliver's Travels" (via the paper "On Holy Wars and a -Plea for Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980) and -the egg-eating habits of the Lilliputians. +The names I and I are comic references to the +egg-eating habits of the little-endian Lilliputians and the big-endian +Blefuscudians from the classic Jonathan Swift satire, I. +This entered computer lingo via the paper "On Holy Wars and a Plea for +Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980. 
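The two byte layouts shown for 0x12345678 can be reproduced with the fixed-order formats C<N> and C<V>. The following is a minimal sketch; the variable names are illustrative, and the native C<L> line is printed without a stated result because it depends on the machine.

```perl
use strict;
use warnings;

my $n = 0x12345678;

# Split each packed string into two-hex-digit bytes for display.
my $big    = join " ", unpack("(H2)*", pack("N", $n));
my $little = join " ", unpack("(H2)*", pack("V", $n));
my $native = join " ", unpack("(H2)*", pack("L", $n));

print "N (big-endian):    $big\n";      # 12 34 56 78
print "V (little-endian): $little\n";   # 78 56 34 12
print "L (native):        $native\n";   # matches one of the above on most machines
```

Comparing the native line with the other two is a quick way to tell which order the local machine uses.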
Some systems may have even weirder byte orders such as - 0x56 0x78 0x12 0x34 - 0x34 0x12 0x78 0x56 + 0x56 0x78 0x12 0x34 + 0x34 0x12 0x78 0x56 -You can see your system's preference with +You can determine your system endianness with this incantation: - print join(" ", map { sprintf "%#02x", $_ } - unpack("W*",pack("L",0x12345678))), "\n"; + printf("%#02x ", $_) for unpack("W*", pack L=>0x12345678); The byteorder on the platform where Perl was built is also available via L: - use Config; - print $Config{byteorder}, "\n"; + use Config; + print "$Config{byteorder}\n"; + +or from the command line: + + $ perl -V:byteorder -Byteorders C<'1234'> and C<'12345678'> are little-endian, C<'4321'> -and C<'87654321'> are big-endian. +Byteorders C<"1234"> and C<"12345678"> are little-endian; C<"4321"> +and C<"87654321"> are big-endian. -If you want portable packed integers you can either use the formats -C, C, C, and C, or you can use the C> and C> -modifiers. These modifiers are only available as of perl 5.9.2. -See also L. +For portably packed integers, either use the formats C, C, C, +and C or else use the C<< > >> and C<< < >> modifiers described +immediately below. See also L. =item * -All integer and floating point formats as well as C
and C
and -C<()>-groups may be followed by the C> or C> modifiers -to force big- or little- endian byte-order, respectively. -This is especially useful, since C, C, C and C don't cover -signed integers, 64-bit integers and floating point values. However, -there are some things to keep in mind. +Starting with Perl 5.9.2, integer and floating-point formats, along with +the C
and C
formats and C<()> groups, may all be followed by the +C<< > >> or C<< < >> endianness modifiers to respectively enforce big- +or little-endian byte-order. These modifiers are especially useful +given how C, C, C, and C don't cover signed integers, +64-bit integers, or floating-point values. + +Here are some concerns to keep in mind when using an endianness modifier: + +=over + +=item * + +Exchanging signed integers between different platforms works only +when all platforms store them in the same format. Most platforms store +signed integers in two's-complement notation, so usually this is not an issue. -Exchanging signed integers between different platforms only works -if all platforms store them in the same format. Most platforms store -signed integers in two's complement, so usually this is not an issue. +=item * -The C> or C> modifiers can only be used on floating point +The C<< > >> or C<< < >> modifiers can only be used on floating-point formats on big- or little-endian machines. Otherwise, attempting to -do so will result in a fatal error. - -Forcing big- or little-endian byte-order on floating point values for -data exchange can only work if all platforms are using the same -binary representation (e.g. IEEE floating point format). Even if all -platforms are using IEEE, there may be subtle differences. Being able -to use C> or C> on floating point values can be very useful, -but also very dangerous if you don't know exactly what you're doing. -It is definitely not a general way to portably store floating point -values. - -When using C> or C> on an C<()>-group, this will affect -all types inside the group that accept the byte-order modifiers, -including all subgroups. It will silently be ignored for all other +use them raises an exception. + +=item * + +Forcing big- or little-endian byte-order on floating-point values for +data exchange can work only if all platforms use the same +binary representation such as IEEE floating-point. 
Even if all +platforms are using IEEE, there may still be subtle differences. Being able +to use C<< > >> or C<< < >> on floating-point values can be useful, +but also dangerous if you don't know exactly what you're doing. +It is not a general way to portably store floating-point values. + +=item * + +When using C<< > >> or C<< < >> on a C<()> group, this affects +all types inside the group that accept byte-order modifiers, +including all subgroups. It is silently ignored for all other types. You are not allowed to override the byte-order within a group that already has a byte-order modifier suffix. +=back + =item * -Real numbers (floats and doubles) are in the native machine format only; -due to the multiplicity of floating formats around, and the lack of a -standard "network" representation, no facility for interchange has been -made. This means that packed floating point data written on one machine -may not be readable on another - even if both use IEEE floating point -arithmetic (as the endian-ness of the memory representation is not part +Real numbers (floats and doubles) are in native machine format only. +Due to the multiplicity of floating-point formats and the lack of a +standard "network" representation for them, no facility for interchange has been +made. This means that packed floating-point data written on one machine +may not be readable on another, even if both use IEEE floating-point +arithmetic (because the endianness of the memory representation is not part of the IEEE spec). See also L. -If you know exactly what you're doing, you can use the C> or C> -modifiers to force big- or little-endian byte-order on floating point values. +If you know I what you're doing, you can use the C<< > >> or C<< < >> +modifiers to force big- or little-endian byte-order on floating-point values. 
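As a rough illustration of the byte-order modifiers on a floating-point format, the sketch below packs one double in both orders. It assumes a Perl new enough to support the modifiers, running on a big- or little-endian machine (elsewhere the modifiers raise an exception, as noted above). The variable names are illustrative only.

```perl
use strict;
use warnings;

my $x = 1.5;

my $be = pack "d>", $x;   # native double, forced big-endian
my $le = pack "d<", $x;   # native double, forced little-endian

# The two strings hold the same bytes in opposite order.
my $reversed = $be eq scalar reverse($le);
print "byte-reversed: ", ($reversed ? "yes" : "no"), "\n";

# Each string round-trips when unpacked with the matching modifier.
my $round_trip = unpack("d>", $be) == $x && unpack("d<", $le) == $x;
print "round trip: ", ($round_trip ? "ok" : "failed"), "\n";
```

This only shows that the byte order is under your control on one machine. It does not make the data portable to a platform with a different floating-point representation.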
-Note that Perl uses doubles (or long doubles, if configured) internally for -all numeric calculation, and converting from double into float and thence back -to double again will lose precision (i.e., C) -will not in general equal $foo). +Because Perl uses doubles (or long doubles, if configured) internally for +all numeric calculation, converting from double into float and thence +to double again loses precision, so C) +will not in general equal $foo. =item * -Pack and unpack can operate in two modes, character mode (C mode) where -the packed string is processed per character and UTF-8 mode (C mode) +Pack and unpack can operate in two modes: character mode (C mode) where +the packed string is processed per character, and UTF-8 mode (C mode) where the packed string is processed in its UTF-8-encoded Unicode form on -a byte by byte basis. Character mode is the default unless the format string -starts with an C. You can switch mode at any moment with an explicit -C or C in the format. A mode is in effect until the next mode switch -or until the end of the ()-group in which it was entered. +a byte-by-byte basis. Character mode is the default unless the format string +starts with C. You can always switch mode mid-format with an explicit +C or C in the format. This mode remains in effect until the next +mode change, or until the end of the C<()> group it (directly) applies to. + +Using C to get Unicode characters while using C to get I-Unicode +bytes is not necessarily obvious. 
Probably only the first of these +is what you want: + + $ perl -CS -E 'say "\x{3B1}\x{3C9}"' | + perl -CS -ne 'printf "%v04X\n", $_ for unpack("C0A*", $_)' + 03B1.03C9 + $ perl -CS -E 'say "\x{3B1}\x{3C9}"' | + perl -CS -ne 'printf "%v02X\n", $_ for unpack("U0A*", $_)' + CE.B1.CF.89 + $ perl -CS -E 'say "\x{3B1}\x{3C9}"' | + perl -C0 -ne 'printf "%v02X\n", $_ for unpack("C0A*", $_)' + CE.B1.CF.89 + $ perl -CS -E 'say "\x{3B1}\x{3C9}"' | + perl -C0 -ne 'printf "%v02X\n", $_ for unpack("U0A*", $_)' + C3.8E.C2.B1.C3.8F.C2.89 + +Those examples also illustrate that you should not try to use +C/C as a substitute for the L module. =item * -You must yourself do any alignment or padding by inserting for example -enough C<'x'>es while packing. There is no way to pack() and unpack() -could know where the characters are going to or coming from. Therefore -C (and C) handle their output and input as flat -sequences of characters. +You must yourself do any alignment or padding by inserting, for example, +enough C<"x">es while packing. There is no way for pack() and unpack() +to know where characters are going to or coming from, so they +handle their output and input as flat sequences of characters. =item * -A ()-group is a sub-TEMPLATE enclosed in parentheses. A group may -take a repeat count, both as postfix, and for unpack() also via the C -template character. Within each repetition of a group, positioning with -C<@> starts again at 0. Therefore, the result of +A C<()> group is a sub-TEMPLATE enclosed in parentheses. A group may +take a repeat count either as postfix, or for unpack(), also via the C +template character. Within each repetition of a group, positioning with +C<@> starts over at 0. Therefore, the result of - pack( '@1A((@2A)@3A)', 'a', 'b', 'c' ) + pack("@1A((@2A)@3A)", qw[X Y Z]) -is the string "\0a\0\0bc". +is the string C<"\0X\0\0YZ">. =item * -C and C accept C modifier. 
In this case they act as -alignment commands: they jump forward/back to the closest position -aligned at a multiple of C characters. For example, to pack() or -unpack() C's C one may need to -use the template C; this assumes that doubles must be -aligned on the double's size. +C and C accept the C modifier to act as alignment commands: they +jump forward or back to the closest position aligned at a multiple of C +characters. For example, to pack() or unpack() a C structure like + + struct { + char c; /* one signed, 8-bit character */ + double d; + char cc[2]; + } + +one may need to use the template C. This assumes that +doubles must be aligned to the size of double. -For alignment commands C of 0 is equivalent to C of 1; -both result in no-ops. +For alignment commands, a C of 0 is equivalent to a C of 1; +both are no-ops. =item * -C, C, C and C accept the C modifier. In this case they -will represent signed 16-/32-bit integers in big-/little-endian order. -This is only portable if all platforms sharing the packed data use the -same binary representation for signed integers (e.g. all platforms are -using two's complement representation). +C, C, C and C accept the C modifier to +represent signed 16-/32-bit integers in big-/little-endian order. +This is portable only when all platforms sharing packed data use the +same binary representation for signed integers; for example, when all +platforms use two's-complement representation. =item * -A comment in a TEMPLATE starts with C<#> and goes to the end of line. -White space may be used to separate pack codes from each other, but -modifiers and a repeat count must follow immediately. +Comments can be embedded in a TEMPLATE using C<#> through the end of line. +White space can separate pack codes from each other, but modifiers and +repeat counts must follow immediately. 
Breaking complex templates into +individual line-by-line components, suitably annotated, can do as much to +improve legibility and maintainability of pack/unpack formats as C can +for complicated pattern matches. =item * -If TEMPLATE requires more arguments to pack() than actually given, pack() +If TEMPLATE requires more arguments than pack() is given, pack() assumes additional C<""> arguments. If TEMPLATE requires fewer arguments -to pack() than actually given, extra arguments are ignored. +than given, extra arguments are ignored. =back @@ -4019,14 +4456,14 @@ Examples: $foo = pack("ccxxcc",65,66,67,68); # foo eq "AB\0\0CD" - # note: the above examples featuring "W" and "c" are true + # NOTE: The examples above featuring "W" and "c" are true # only on ASCII and ASCII-derived systems such as ISO Latin 1 - # and UTF-8. In EBCDIC the first example would be - # $foo = pack("WWWW",193,194,195,196); + # and UTF-8. On EBCDIC systems, the first example would be + # $foo = pack("WWWW",193,194,195,196); $foo = pack("s2",1,2); - # "\1\0\2\0" on little-endian - # "\0\1\0\2" on big-endian + # "\001\000\002\000" on little-endian + # "\000\001\000\002" on big-endian $foo = pack("a4","abcd","x","y","z"); # "abcd" @@ -4048,7 +4485,7 @@ Examples: # "@utmp1" eq "@utmp2" sub bintodec { - unpack("N", pack("B32", substr("0" x 32 . shift, -32))); + unpack("N", pack("B32", substr("0" x 32 . shift, -32))); } $foo = pack('sx2l', 12, 34); @@ -4071,25 +4508,45 @@ Examples: The same template may generally also be used in unpack(). =item package NAMESPACE -X X X - -=item package - -Declares the compilation unit as being in the given namespace. The scope -of the package declaration is from the declaration itself through the end -of the enclosing block, file, or eval (the same as the C operator). -All further unqualified dynamic identifiers will be in this namespace. 
-A package statement affects only dynamic variables--including those -you've used C on--but I lexical variables, which are created -with C. Typically it would be the first declaration in a file to -be included by the C or C operator. You can switch into a -package in more than one place; it merely influences which symbol table -is used by the compiler for the rest of that block. You can refer to -variables and filehandles in other packages by prefixing the identifier -with the package name and a double colon: C<$Package::Variable>. -If the package name is null, the C
package as assumed. That is, -C<$::sail> is equivalent to C<$main::sail> (as well as to C<$main'sail>, -still seen in older code). + +=item package NAMESPACE VERSION +X X X X + +=item package NAMESPACE BLOCK + +=item package NAMESPACE VERSION BLOCK +X X X X + +Declares the BLOCK or the rest of the compilation unit as being in the +given namespace. The scope of the package declaration is either the +supplied code BLOCK or, in the absence of a BLOCK, from the declaration +itself through the end of current scope (the enclosing block, file, or +C). That is, the forms without a BLOCK are operative through the end +of the current scope, just like the C, C, and C operators. +All unqualified dynamic identifiers in this scope will be in the given +namespace, except where overridden by another C declaration or +when they're one of the special identifiers that qualify into C, +like C, C, C, and the punctuation variables. + +A package statement affects dynamic variables only, including those +you've used C on, but I lexical variables, which are created +with C, C, or C. Typically it would be the first +declaration in a file included by C or C. You can switch into a +package in more than one place, since this only determines which default +symbol table the compiler uses for the rest of that block. You can refer to +identifiers in other packages than the current one by prefixing the identifier +with the package name and a double colon, as in C<$SomePack::var> +or C. If package name is omitted, the C
+package is assumed. That is, C<$::sail> is equivalent to +C<$main::sail> (as well as to C<$main'sail>, still seen in ancient +code, mostly from Perl 4). + +If VERSION is provided, C sets the C<$VERSION> variable in the given +namespace to a L object with the VERSION provided. VERSION must be a +"strict" style version number as defined by the L module: a positive +decimal number (integer or decimal-fraction) without exponentiation or else a +dotted-decimal v-string with a leading 'v' character and at least three +components. You should set C<$VERSION> only once per package. See L for more information about packages, modules, and classes. See L for other scoping issues. @@ -4103,89 +4560,119 @@ unless you are very careful. In addition, note that Perl's pipes use IO buffering, so you may need to set C<$|> to flush your WRITEHANDLE after each command, depending on the application. -See L, L, and L +See L, L, and +L for examples of such things. -On systems that support a close-on-exec flag on files, the flag will be set -for the newly opened file descriptors as determined by the value of $^F. -See L. +On systems that support a close-on-exec flag on files, that flag is set +on all newly opened file descriptors whose Cs are I than +the current value of $^F (by default 2 for C). See L. =item pop ARRAY X X +=item pop EXPR + =item pop Pops and returns the last value of the array, shortening the array by one element. -If there are no elements in the array, returns the undefined value -(although this may happen at other times as well). If ARRAY is -omitted, pops the C<@ARGV> array in the main program, and the C<@_> -array in subroutines, just like C. +Returns the undefined value if the array is empty, although this may also +happen at other times. If ARRAY is omitted, pops the C<@ARGV> array in the +main program, but the C<@_> array in subroutines, just like C. + +Starting with Perl 5.14, C can take a scalar EXPR, which must hold a +reference to an unblessed array.
The argument will be dereferenced +automatically. This aspect of C is considered highly experimental. +The exact behaviour may change in a future version of Perl. =item pos SCALAR X X =item pos -Returns the offset of where the last C search left off for the variable -in question (C<$_> is used when the variable is not specified). Note that -0 is a valid match offset. C indicates that the search position -is reset (usually due to match failure, but can also be because no match has -yet been performed on the scalar). C directly accesses the location used -by the regexp engine to store the offset, so assigning to C will change -that offset, and so will also influence the C<\G> zero-width assertion in -regular expressions. Because a failed C match doesn't reset the offset, -the return from C won't change either in this case. See L and +Returns the offset of where the last C search left off for the +variable in question (C<$_> is used when the variable is not +specified). Note that 0 is a valid match offset. C indicates +that the search position is reset (usually due to match failure, but +can also be because no match has yet been run on the scalar). + +C directly accesses the location used by the regexp engine to +store the offset, so assigning to C will change that offset, and +so will also influence the C<\G> zero-width assertion in regular +expressions. Both of these effects take place for the next match, so +you can't affect the position with C during the current match, +such as in C<(?{pos() = 5})> or C. + +Setting C also resets the I flag, described +under L. + +Because a failed C match doesn't reset the offset, the return +from C won't change either in this case. See L and L. =item print FILEHANDLE LIST X +=item print FILEHANDLE + =item print LIST =item print Prints a string or a list of strings. Returns true if successful. 
-FILEHANDLE may be a scalar variable name, in which case the variable -contains the name of or a reference to the filehandle, thus introducing -one level of indirection. (NOTE: If FILEHANDLE is a variable and -the next token is a term, it may be misinterpreted as an operator -unless you interpose a C<+> or put parentheses around the arguments.) -If FILEHANDLE is omitted, prints by default to standard output (or -to the last selected output channel--see L). If LIST is -also omitted, prints C<$_> to the currently selected output channel. -To set the default output channel to something other than STDOUT -use the select operation. The current value of C<$,> (if any) is -printed between each LIST item. The current value of C<$\> (if -any) is printed after the entire LIST has been printed. Because -print takes a LIST, anything in the LIST is evaluated in list -context, and any subroutine that you call will have one or more of -its expressions evaluated in list context. Also be careful not to -follow the print keyword with a left parenthesis unless you want -the corresponding right parenthesis to terminate the arguments to -the print--interpose a C<+> or put parentheses around all the -arguments. - -Note that if you're storing FILEHANDLEs in an array, or if you're using -any other expression more complex than a scalar variable to retrieve it, -you will have to use a block returning the filehandle value instead: +FILEHANDLE may be a scalar variable containing the name of or a reference +to the filehandle, thus introducing one level of indirection. (NOTE: If +FILEHANDLE is a variable and the next token is a term, it may be +misinterpreted as an operator unless you interpose a C<+> or put +parentheses around the arguments.) If FILEHANDLE is omitted, prints to the +last selected (see L) output handle. If LIST is omitted, prints +C<$_> to the currently selected output handle. 
To use FILEHANDLE alone to +print the content of C<$_> to it, you must use a real filehandle like +C, not an indirect one like C<$fh>. To set the default output handle +to something other than STDOUT, use the select operation. + +The current value of C<$,> (if any) is printed between each LIST item. The +current value of C<$\> (if any) is printed after the entire LIST has been +printed. Because print takes a LIST, anything in the LIST is evaluated in +list context, including any subroutines whose return lists you pass to +C. Be careful not to follow the print keyword with a left +parenthesis unless you want the corresponding right parenthesis to +terminate the arguments to the print; put parentheses around all arguments +(or interpose a C<+>, but that doesn't look as good). + +If you're storing handles in an array or hash, or in general whenever +you're using any expression more complex than a bareword handle or a plain, +unsubscripted scalar variable to retrieve it, you will have to use a block +returning the filehandle value instead, in which case the LIST may not be +omitted: print { $files[$i] } "stuff\n"; print { $OK ? STDOUT : STDERR } "stuff\n"; +Printing to a closed pipe or socket will generate a SIGPIPE signal. See +L for more on signal handling. + =item printf FILEHANDLE FORMAT, LIST X +=item printf FILEHANDLE + =item printf FORMAT, LIST +=item printf + Equivalent to C, except that C<$\> -(the output record separator) is not appended. The first argument -of the list will be interpreted as the C format. See C -for an explanation of the format argument. If C is in effect, -and POSIX::setlocale() has been called, the character used for the decimal -separator in formatted floating point numbers is affected by the LC_NUMERIC -locale. See L and L. +(the output record separator) is not appended. The first argument of the +list will be interpreted as the C format. See +L for an +explanation of the format argument. 
If you omit the LIST, C<$_> is used; +to use FILEHANDLE without a LIST, you must use a real filehandle like +C, not an indirect one like C<$fh>. If C is in effect and +POSIX::setlocale() has been called, the character used for the decimal +separator in formatted floating-point numbers is affected by the LC_NUMERIC +locale setting. See L and L. Don't fall into the trap of using a C when a simple C would do. The C is more efficient and less @@ -4199,7 +4686,7 @@ function has no prototype). FUNCTION is a reference to, or the name of, the function whose prototype you want to retrieve. If FUNCTION is a string starting with C, the rest is taken as a -name for Perl builtin. If the builtin is not I (such as +name for a Perl builtin. If the builtin is not I (such as C) or if its arguments cannot be adequately expressed by a prototype (such as C), prototype() returns C, because the builtin does not really behave like a Perl function. Otherwise, the string @@ -4208,17 +4695,24 @@ describing the equivalent prototype is returned. =item push ARRAY,LIST X X -Treats ARRAY as a stack, and pushes the values of LIST -onto the end of ARRAY. The length of ARRAY increases by the length of -LIST. Has the same effect as +=item push EXPR,LIST + +Treats ARRAY as a stack by appending the values of LIST to the end of +ARRAY. The length of ARRAY increases by the length of LIST. Has the same +effect as for $value (LIST) { - $ARRAY[++$#ARRAY] = $value; + $ARRAY[++$#ARRAY] = $value; } but is more efficient. Returns the number of elements in the array following the completed C. +Starting with Perl 5.14, C can take a scalar EXPR, which must hold a +reference to an unblessed array. The argument will be dereferenced +automatically. This aspect of C is considered highly experimental. +The exact behaviour may change in a future version of Perl. + =item q/STRING/ =item qq/STRING/ @@ -4247,6 +4741,37 @@ the C<\Q> escape in double-quoted strings. If EXPR is omitted, uses C<$_>. +quotemeta (and C<\Q> ... 
C<\E>) are useful when interpolating strings into +regular expressions, because by default an interpolated variable will be +considered a mini-regular expression. For example: + + my $sentence = 'The quick brown fox jumped over the lazy dog'; + my $substring = 'quick.*?fox'; + $sentence =~ s{$substring}{big bad wolf}; + +Will cause C<$sentence> to become C<'The big bad wolf jumped over...'>. + +On the other hand: + + my $sentence = 'The quick brown fox jumped over the lazy dog'; + my $substring = 'quick.*?fox'; + $sentence =~ s{\Q$substring\E}{big bad wolf}; + +Or: + + my $sentence = 'The quick brown fox jumped over the lazy dog'; + my $substring = 'quick.*?fox'; + my $quoted_substring = quotemeta($substring); + $sentence =~ s{$quoted_substring}{big bad wolf}; + +Will both leave the sentence as is. Normally, when accepting literal string +input from the user, quotemeta() or C<\Q> must be used. + +In Perl 5.14, all characters whose code points are above 127 are not +quoted in UTF8-encoded strings, but all are quoted in UTF-8 strings. +It is planned to change this behavior in 5.16, but the exact rules +haven't been determined yet. + =item rand EXPR X X @@ -4255,8 +4780,8 @@ X X Returns a random fractional number greater than or equal to C<0> and less than the value of EXPR. (EXPR should be positive.) If EXPR is omitted, the value C<1> is used. Currently EXPR with the value C<0> is -also special-cased as C<1> - this has not been documented before perl 5.8.0 -and is subject to change in future versions of perl. Automatically calls +also special-cased as C<1> (this was undocumented before Perl 5.8.0 +and is subject to change in future versions of Perl). Automatically calls C unless C has already been called. See also C. Apply C to the value returned by C if you want random @@ -4270,6 +4795,13 @@ returns a random integer between C<0> and C<9>, inclusive. large or too small, then your version of Perl was probably compiled with the wrong number of RANDBITS.) 
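For instance, the int(rand(N)) idiom described above can pick a random array element or simulate a die roll. This is a small sketch only; the @colors list is sample data invented for the illustration.

```perl
use strict;
use warnings;

my @colors = qw(red green blue);   # sample data, not from the docs

# rand(@colors) evaluates @colors in scalar context (its length, 3),
# so int() yields an index from 0 through 2.
my $pick = $colors[ int rand @colors ];

# A six-sided die: int(rand(6)) is 0 through 5, so add 1.
my $roll = 1 + int rand 6;

print "color: $pick, roll: $roll\n";
```

The results differ from run to run, so no particular output should be expected.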
+B is not cryptographically secure. You should not rely +on it in security-sensitive situations.> As of this writing, a +number of third-party CPAN modules offer random number generators +intended by their authors to be cryptographically secure, +including: L, L, and +L. + =item read FILEHANDLE,SCALAR,LENGTH,OFFSET X X @@ -4289,14 +4821,15 @@ the string. A positive OFFSET greater than the length of SCALAR results in the string being padded to the required size with C<"\0"> bytes before the result of the read is appended. -The call is actually implemented in terms of either Perl's or system's -fread() call. To get a true read(2) system call, see C. +The call is implemented in terms of either Perl's or your system's native +fread(3) library function. To get a true read(2) system call, see +L. Note the I: depending on the status of the filehandle, -either (8-bit) bytes or characters are read. By default all +either (8-bit) bytes or characters are read. By default, all filehandles operate on bytes, but for example if the filehandle has been opened with the C<:utf8> I/O layer (see L, and the C -pragma, L), the I/O will operate on UTF-8 encoded Unicode +pragma, L), the I/O will operate on UTF8-encoded Unicode characters, not bytes. Similarly for the C<:encoding> pragma: in that case pretty much any characters can be read. @@ -4305,8 +4838,8 @@ X Returns the next directory entry for a directory opened by C. If used in list context, returns all the rest of the entries in the -directory. If there are no more entries, returns an undefined value in -scalar context or a null list in list context. +directory. If there are no more entries, returns the undefined value in +scalar context and the empty list in list context. If you're planning to filetest the return values out of a C, you'd better prepend the directory in question. Otherwise, because we didn't @@ -4316,21 +4849,30 @@ C there, it would have been testing the wrong file. 
@dots = grep { /^\./ && -f "$some_dir/$_" } readdir($dh); closedir $dh; +As of Perl 5.11.2 you can use a bare C in a C loop, +which will set C<$_> on every iteration. + + opendir(my $dh, $some_dir) || die; + while(readdir $dh) { + print "$some_dir/$_\n"; + } + closedir $dh; + =item readline EXPR =item readline X X X Reads from the filehandle whose typeglob is contained in EXPR (or from -*ARGV if EXPR is not provided). In scalar context, each call reads and -returns the next line, until end-of-file is reached, whereupon the -subsequent call returns undef. In list context, reads until end-of-file +C<*ARGV> if EXPR is not provided). In scalar context, each call reads and +returns the next line until end-of-file is reached, whereupon the +subsequent call returns C. In list context, reads until end-of-file is reached and returns a list of lines. Note that the notion of "line" -used here is however you may have defined it with C<$/> or +used here is whatever you may have defined with C<$/> or C<$INPUT_RECORD_SEPARATOR>). See L. -When C<$/> is set to C, when readline() is in scalar -context (i.e. file slurp mode), and when an empty file is read, it +When C<$/> is set to C, when C is in scalar +context (i.e., file slurp mode), and when an empty file is read, it returns C<''> the first time, followed by C subsequently. This is the internal function implementing the C<< >> @@ -4338,21 +4880,31 @@ operator, but you can use it directly. The C<< >> operator is discussed in more detail in L. $line = ; - $line = readline(*STDIN); # same thing + $line = readline(*STDIN); # same thing -If readline encounters an operating system error, C<$!> will be set with the -corresponding error message. It can be helpful to check C<$!> when you are -reading from filehandles you don't trust, such as a tty or a socket. The -following example uses the operator form of C, and takes the necessary -steps to ensure that C was successful. 
+If C encounters an operating system error, C<$!> will be set +with the corresponding error message. It can be helpful to check +C<$!> when you are reading from filehandles you don't trust, such as a +tty or a socket. The following example uses the operator form of +C and dies if the result is not defined. - for (;;) { - undef $!; - unless (defined( $line = <> )) { - last if eof; - die $! if $!; + while ( ! eof($fh) ) { + defined( $_ = <$fh> ) or die "readline failed: $!"; + ... + } + +Note that you can't handle C errors that way with the +C filehandle. In that case, you have to open each element of +C<@ARGV> yourself since C handles C differently. + + foreach my $arg (@ARGV) { + open(my $fh, $arg) or warn "Can't open $arg: $!"; + + while ( ! eof($fh) ) { + defined( $_ = <$fh> ) + or die "readline failed for $arg: $!"; + ... } - # ... } =item readlink EXPR @@ -4361,10 +4913,12 @@ X =item readlink Returns the value of a symbolic link, if symbolic links are -implemented. If not, gives a fatal error. If there is some system +implemented. If not, raises an exception. If there is a system error, returns the undefined value and sets C<$!> (errno). If EXPR is omitted, uses C<$_>. +Portability issues: L. + =item readpipe EXPR =item readpipe @@ -4396,7 +4950,7 @@ Note the I: depending on the status of the socket, either (8-bit) bytes or characters are received. By default all sockets operate on bytes, but for example if the socket has been changed using binmode() to operate with the C<:encoding(utf8)> I/O layer (see the -C pragma, L), the I/O will operate on UTF-8 encoded Unicode +C pragma, L), the I/O will operate on UTF8-encoded Unicode characters, not bytes. Similarly for the C<:encoding> pragma: in that case pretty much any characters can be read.
@@ -4414,22 +4968,22 @@ normally use this command: # a simpleminded Pascal comment stripper # (warning: assumes no { or } in strings) LINE: while () { - while (s|({.*}.*){.*}|$1 |) {} - s|{.*}| |; - if (s|{.*| |) { - $front = $_; - while () { - if (/}/) { # end of comment? - s|^|$front\{|; - redo LINE; - } - } - } - print; + while (s|({.*}.*){.*}|$1 |) {} + s|{.*}| |; + if (s|{.*| |) { + $front = $_; + while () { + if (/}/) { # end of comment? + s|^|$front\{|; + redo LINE; + } + } + } + print; } -C cannot be used to retry a block which returns a value such as -C, C or C, and should not be used to exit +C cannot be used to retry a block that returns a value such as +C, C, or C, and should not be used to exit a grep() or map() operation. Note that a block by itself is semantically identical to a loop @@ -4466,10 +5020,10 @@ If the referenced object has been blessed into a package, then that package name is returned instead. You can think of C as a C operator. if (ref($r) eq "HASH") { - print "r is a reference to a hash.\n"; + print "r is a reference to a hash.\n"; } unless (ref($r)) { - print "r is not a reference at all.\n"; + print "r is not a reference at all.\n"; } The return value C indicates a reference to an lvalue that is not @@ -4498,6 +5052,8 @@ rename(2) manpage or equivalent system documentation for details. For a platform independent C function look at the L module. +Portability issues: L. + =item require VERSION X @@ -4510,7 +5066,7 @@ specified by EXPR or by C<$_> if EXPR is not supplied. VERSION may be either a numeric argument such as 5.006, which will be compared to C<$]>, or a literal of the form v5.6.1, which will be compared -to C<$^V> (aka $PERL_VERSION). A fatal error is produced at run time if +to C<$^V> (aka $PERL_VERSION). An exception is raised if VERSION is greater than the version of the current Perl interpreter. Compare with L, which can do a similar check at compile time. 
@@ -4519,9 +5075,9 @@ avoided, because it leads to misleading error messages under earlier versions of Perl that do not support this syntax. The equivalent numeric version should be used instead. - require v5.6.1; # run time version check - require 5.6.1; # ditto - require 5.006_001; # ditto; preferred for backwards compatibility + require v5.6.1; # run time version check + require 5.6.1; # ditto + require 5.006_001; # ditto; preferred for backwards compatibility Otherwise, C demands that a library file be included if it hasn't already been included. The file is included via the do-FILE @@ -4574,7 +5130,7 @@ modules does not risk altering your namespace. In other words, if you try this: - require Foo::Bar; # a splendid bareword + require Foo::Bar; # a splendid bareword The require function will actually look for the "F" file in the directories specified in the C<@INC> array. @@ -4582,32 +5138,32 @@ directories specified in the C<@INC> array. But if you try this: $class = 'Foo::Bar'; - require $class; # $class is not a bareword + require $class; # $class is not a bareword #or - require "Foo::Bar"; # not a bareword because of the "" + require "Foo::Bar"; # not a bareword because of the "" The require function will look for the "F" file in the @INC array and will complain about not finding "F" there. In this case you can do: eval "require $class"; -Now that you understand how C looks for files in the case of a +Now that you understand how C looks for files with a bareword argument, there is a little extra functionality going on behind the scenes. Before C looks for a "F<.pm>" extension, it will first look for a similar filename with a "F<.pmc>" extension. If this file is found, it will be loaded in place of any file ending in a "F<.pm>" extension. -You can also insert hooks into the import facility, by putting directly -Perl code into the @INC array. There are three forms of hooks: subroutine -references, array references and blessed objects. 
+You can also insert hooks into the import facility by putting Perl code +directly into the @INC array. There are three forms of hooks: subroutine +references, array references, and blessed objects. Subroutine references are the simplest case. When the inclusion system walks through @INC and encounters a subroutine, this subroutine gets -called with two parameters, the first being a reference to itself, and the -second the name of the file to be included (e.g. "F"). The -subroutine should return nothing, or a list of up to three values in the -following order: +called with two parameters, the first a reference to itself, and the +second the name of the file to be included (e.g., "F"). The +subroutine should return either nothing or else a list of up to three +values in the following order: =over @@ -4619,8 +5175,8 @@ A filehandle, from which the file will be read. A reference to a subroutine. If there is no filehandle (previous item), then this subroutine is expected to generate one line of source code per -call, writing the line into C<$_> and returning 1, then returning 0 at -"end of file". If there is a filehandle, then the subroutine will be +call, writing the line into C<$_> and returning 1, then finally at end of +file returning 0. If there is a filehandle, then the subroutine will be called to act as a simple source filter, with the line as read in C<$_>. Again, return 1 for each valid line, and 0 after all lines have been returned. @@ -4633,32 +5189,32 @@ reference to the subroutine itself is passed in as C<$_[0]>. =back If an empty list, C, or nothing that matches the first 3 values above -is returned then C will look at the remaining elements of @INC. -Note that this file handle must be a real file handle (strictly a typeglob, -or reference to a typeglob, blessed or unblessed) - tied file handles will be -ignored and return value processing will stop there. +is returned, then C looks at the remaining elements of @INC. 
+Note that this filehandle must be a real filehandle (strictly a typeglob +or reference to a typeglob, whether blessed or unblessed); tied filehandles +will be ignored and processing will stop there. If the hook is an array reference, its first element must be a subroutine reference. This subroutine is called as above, but the first parameter is -the array reference. This enables to pass indirectly some arguments to +the array reference. This lets you indirectly pass arguments to the subroutine. In other words, you can write: push @INC, \&my_sub; sub my_sub { - my ($coderef, $filename) = @_; # $coderef is \&my_sub - ... + my ($coderef, $filename) = @_; # $coderef is \&my_sub + ... } or: push @INC, [ \&my_sub, $x, $y, ... ]; sub my_sub { - my ($arrayref, $filename) = @_; - # Retrieve $x, $y, ... - my @parameters = @$arrayref[1..$#$arrayref]; - ... + my ($arrayref, $filename) = @_; + # Retrieve $x, $y, ... + my @parameters = @$arrayref[1..$#$arrayref]; + ... } If the hook is an object, it must provide an INC method that will be @@ -4670,14 +5226,14 @@ into package C
.) Here is a typical code layout: package Foo; sub new { ... } sub Foo::INC { - my ($self, $filename) = @_; - ... + my ($self, $filename) = @_; + ... } # In the main program push @INC, Foo->new(...); -Note that these hooks are also permitted to set the %INC entry +These hooks are also permitted to set the %INC entry corresponding to the files they have loaded. See L. For a yet-more-powerful import facility, see L and L. @@ -4692,17 +5248,17 @@ variables and reset C searches so that they work again. The expression is interpreted as a list of single characters (hyphens allowed for ranges). All variables and arrays beginning with one of those letters are reset to their pristine state. If the expression is -omitted, one-match searches (C) are reset to match again. Resets -only variables or searches in the current package. Always returns +omitted, one-match searches (C) are reset to match again. +Only resets variables or searches in the current package. Always returns 1. Examples: - reset 'X'; # reset all X variables - reset 'a-z'; # reset lower case variables - reset; # just reset ?one-time? searches + reset 'X'; # reset all X variables + reset 'a-z'; # reset lower case variables + reset; # just reset ?one-time? searches Resetting C<"A-Z"> is not recommended because you'll wipe out your C<@ARGV> and C<@INC> arrays and your C<%ENV> hash. Resets only package -variables--lexical variables are unaffected, but they clean themselves +variables; lexical variables are unaffected, but they clean themselves up on scope exit anyway, so you'll probably want to use them instead. See L. @@ -4714,12 +5270,12 @@ X Returns from a subroutine, C, or C with the value given in EXPR. Evaluation of EXPR may be in list, scalar, or void context, depending on how the return value will be used, and the context -may vary from one execution to the next (see C). If no EXPR +may vary from one execution to the next (see L). 
If no EXPR is given, returns an empty list in list context, the undefined value in -scalar context, and (of course) nothing at all in a void context. +scalar context, and (of course) nothing at all in void context. -(Note that in the absence of an explicit C, a subroutine, eval, -or do FILE will automatically return the value of the last expression +(In the absence of an explicit C, a subroutine, eval, +or do FILE automatically returns the value of the last expression evaluated.) =item reverse LIST @@ -4740,13 +5296,17 @@ Used without arguments in scalar context, reverse() reverses C<$_>. print reverse; # No output, list context print scalar reverse; # Hello, world +Note that reversing an array to itself (as in C<@a = reverse @a>) will +preserve non-existent elements whenever possible, i.e., for non magical +arrays or tied arrays with C and C methods. + This operator is also handy for inverting a hash, although there are some caveats. If a value is duplicated in the original hash, only one of those can be represented as a key in the inverted hash. Also, this has to unwind one hash and build a whole new one, which may take some time on a large hash, such as from a DBM file. - %by_name = reverse %by_address; # Invert the hash + %by_name = reverse %by_address; # Invert the hash =item rewinddir DIRHANDLE X @@ -4754,6 +5314,8 @@ X Sets the current position to the beginning of the directory for the C routine on DIRHANDLE. +Portability issues: L. + =item rindex STR,SUBSTR,POSITION X @@ -4769,10 +5331,10 @@ X X X =item rmdir Deletes the directory specified by FILENAME if that directory is -empty. If it succeeds it returns true, otherwise it returns false and +empty. If it succeeds it returns true; otherwise it returns false and sets C<$!> (errno). If FILENAME is omitted, uses C<$_>. -To remove a directory tree recursively (C on unix) look at +To remove a directory tree recursively (C on Unix) look at the C function of the L module. 
=item s/// @@ -4782,16 +5344,21 @@ The substitution operator. See L. =item say FILEHANDLE LIST X +=item say FILEHANDLE + =item say LIST =item say -Just like C, but implicitly appends a newline. -C is simply an abbreviation for C<{ local $\ = "\n"; print -LIST }>. +Just like C, but implicitly appends a newline. C is +simply an abbreviation for C<{ local $\ = "\n"; print LIST }>. To use +FILEHANDLE without a LIST to print the contents of C<$_> to it, you must +use a real filehandle like C, not an indirect one like C<$fh>. -This keyword is only available when the "say" feature is -enabled: see L. +This keyword is available only when the C<"say"> feature +is enabled, or when prefixed with C; see +L. Alternately, include a C or later to the current +scope. =item scalar EXPR X X @@ -4807,19 +5374,19 @@ needed. If you really wanted to do so, however, you could use the construction C<@{[ (some expression) ]}>, but usually a simple C<(some expression)> suffices. -Because C is unary operator, if you accidentally use for EXPR a -parenthesized list, this behaves as a scalar comma expression, evaluating -all but the last element in void context and returning the final element -evaluated in scalar context. This is seldom what you want. +Because C is a unary operator, if you accidentally use a +parenthesized list for the EXPR, this behaves as a scalar comma expression, +evaluating all but the last element in void context and returning the final +element evaluated in scalar context. This is seldom what you want. The following single statement: - print uc(scalar(&foo,$bar)),$baz; + print uc(scalar(&foo,$bar)),$baz; is the moral equivalent of these two: - &foo; - print(uc($bar),$baz); + &foo; + print(uc($bar),$baz); See L for more details on unary operators and the comma operator. @@ -4829,11 +5396,11 @@ X X X Sets FILEHANDLE's position, just like the C call of C. FILEHANDLE may be an expression whose value gives the name of the filehandle. 
The values for WHENCE are C<0> to set the new position -I to POSITION, C<1> to set it to the current position plus -POSITION, and C<2> to set it to EOF plus POSITION (typically -negative). For WHENCE you may use the constants C, +I to POSITION; C<1> to set it to the current position plus +POSITION; and C<2> to set it to EOF plus POSITION, typically +negative. For WHENCE you may use the constants C, C, and C (start of the file, current position, end -of the file) from the Fcntl module. Returns C<1> upon success, C<0> +of the file) from the L module. Returns C<1> on success, false otherwise. Note the I: even if the filehandle has been set to @@ -4841,8 +5408,8 @@ operate on characters (for example by using the C<:encoding(utf8)> open layer), tell() will return byte offsets, not character offsets (because implementing that would render seek() and tell() rather slow). -If you want to position file for C or C, don't use -C--buffering makes its effect on the file's system position +If you want to position the file for C or C, don't use +C, because buffering makes its effect on the file's read-write position unpredictable and non-portable. Use C instead. Due to the rules and rigors of ANSI C, on some systems you have to do a @@ -4853,21 +5420,21 @@ A WHENCE of C<1> (C) is useful for not moving the file position: seek(TEST,0,1); This is also useful for applications emulating C. Once you hit -EOF on your read, and then sleep for a while, you might have to stick in a -seek() to reset things. The C doesn't change the current position, +EOF on your read and then sleep for a while, you (probably) have to stick in a +dummy seek() to reset things. The C doesn't change the position, but it I clear the end-of-file condition on the handle, so that the -next C<< >> makes Perl try again to read something. We hope. +next C<< >> makes Perl try again to read something. (We hope.) 
-If that doesn't work (some IO implementations are particularly -cantankerous), then you may need something more like this: +If that doesn't work (some I/O implementations are particularly +cantankerous), you might need something like this: for (;;) { - for ($curpos = tell(FILE); $_ = ; + for ($curpos = tell(FILE); $_ = ; $curpos = tell(FILE)) { - # search for some stuff and put it into files - } - sleep($for_a_while); - seek(FILE, $curpos, 0); + # search for some stuff and put it into files + } + sleep($for_a_while); + seek(FILE, $curpos, 0); } =item seekdir DIRHANDLE,POS @@ -4885,11 +5452,12 @@ X -This calls the select(2) system call with the bit masks specified, which +This calls the select(2) syscall with the bit masks specified, which can be constructed using C and C, along these lines: $rin = $win = $ein = ''; - vec($rin,fileno(STDIN),1) = 1; - vec($win,fileno(STDOUT),1) = 1; + vec($rin, fileno(STDIN), 1) = 1; + vec($win, fileno(STDOUT), 1) = 1; $ein = $rin | $win; -If you want to select on many filehandles you might wish to write a -subroutine: +If you want to select on many filehandles, you may wish to write a +subroutine like this: sub fhbits { - my(@fhlist) = split(' ',$_[0]); - my($bits); - for (@fhlist) { - vec($bits,fileno($_),1) = 1; - } - $bits; + my @fhlist = @_; + my $bits = ""; + for my $fh (@fhlist) { + vec($bits, fileno($fh), 1) = 1; + } + return $bits; } - $rin = fhbits('STDIN TTY SOCK'); + $rin = fhbits(*STDIN, *TTY, *MYSOCK); The usual idiom is: @@ -4956,23 +5526,27 @@ Note that whether C. -On error, C behaves just like select(2): it returns -1 and sets C<$!>. -Note: on some Unixes, the select(2) system call may report a socket file -descriptor as "ready for reading", when actually no data is available, -thus a subsequent read blocks. It can be avoided using always the -O_NONBLOCK flag on the socket. See select(2) and fcntl(2) for further -details. 
+On some Unixes, select(2) may report a socket file descriptor as "ready for +reading" even when no data is available, and thus any subsequent C +would block. This can be avoided if you always use O_NONBLOCK on the +socket. See select(2) and fcntl(2) for further details. + +The standard C module provides a user-friendlier interface +to C, except as permitted by POSIX, and even then only on POSIX systems. You have to use C instead. +Portability issues: L. + =item semctl ID,SEMNUM,CMD,ARG X -Calls the System V IPC function C. You'll probably have to say +Calls the System V IPC function semctl(2). You'll probably have to say use IPC::SysV; @@ -4985,23 +5559,27 @@ short integers, which may be created with C. See also L, C, C documentation. +Portability issues: L. + =item semget KEY,NSEMS,FLAGS X -Calls the System V IPC function semget. Returns the semaphore id, or -the undefined value if there is an error. See also +Calls the System V IPC function semget(2). Returns the semaphore id, or +the undefined value on error. See also L, C, C documentation. +Portability issues: L. + =item semop KEY,OPSTRING X -Calls the System V IPC function semop to perform semaphore operations +Calls the System V IPC function semop(2) for semaphore operations such as signalling and waiting. OPSTRING must be a packed array of semop structures. Each semop structure can be generated with C. The length of OPSTRING implies the number of semaphore operations. Returns true if -successful, or false if there is an error. As an example, the +successful, false on error. As an example, the following code waits on semaphore $semnum of semaphore id $semid: $semop = pack("s!3", $semnum, -1, 0); @@ -5011,18 +5589,19 @@ To signal the semaphore, replace C<-1> with C<1>. See also L, C, and C documentation. +Portability issues: L. + =item send SOCKET,MSG,FLAGS,TO X =item send SOCKET,MSG,FLAGS -Sends a message on a socket. Attempts to send the scalar MSG to the -SOCKET filehandle. 
Takes the same flags as the system call of the -same name. On unconnected sockets you must specify a destination to -send TO, in which case it does a C C. Returns the number of -characters sent, or the undefined value if there is an error. The C -system call sendmsg(2) is currently unimplemented. See -L for examples. +Sends a message on a socket. Attempts to send the scalar MSG to the SOCKET +filehandle. Takes the same flags as the system call of the same name. On +unconnected sockets, you must specify a destination to I, in which +case it does a sendto(2) syscall. Returns the number of characters sent, +or the undefined value on error. The sendmsg(2) syscall is currently +unimplemented. See L for examples. Note the I: depending on the status of the socket, either (8-bit) bytes or characters are sent. By default all sockets operate @@ -5036,45 +5615,58 @@ pragma: in that case pretty much any characters can be sent. X X Sets the current process group for the specified PID, C<0> for the current -process. Will produce a fatal error if used on a machine that doesn't +process. Raises an exception when used on a machine that doesn't implement POSIX setpgid(2) or BSD setpgrp(2). If the arguments are omitted, it defaults to C<0,0>. Note that the BSD 4.2 version of C does not accept any arguments, so only C is portable. See also C. +Portability issues: L. + =item setpriority WHICH,WHO,PRIORITY X X X X Sets the current priority for a process, a process group, or a user. -(See setpriority(2).) Will produce a fatal error if used on a machine +(See setpriority(2).) Raises an exception when used on a machine that doesn't implement setpriority(2). +Portability issues: L. + =item setsockopt SOCKET,LEVEL,OPTNAME,OPTVAL X -Sets the socket option requested. Returns undefined if there is an -error. Use integer constants provided by the C module for +Sets the socket option requested. Returns C on error. +Use integer constants provided by the C module for LEVEL and OPNAME. 
Values for LEVEL can also be obtained from getprotobyname. OPTVAL might either be a packed string or an integer. An integer OPTVAL is shorthand for pack("i", OPTVAL). -An example disabling the Nagle's algorithm for a socket: +An example disabling Nagle's algorithm on a socket: use Socket qw(IPPROTO_TCP TCP_NODELAY); setsockopt($socket, IPPROTO_TCP, TCP_NODELAY, 1); +Portability issues: L. + =item shift ARRAY X +=item shift EXPR + =item shift Shifts the first value of the array off and returns it, shortening the array by 1 and moving everything down. If there are no elements in the array, returns the undefined value. If ARRAY is omitted, shifts the C<@_> array within the lexical scope of subroutines and formats, and the -C<@ARGV> array outside of a subroutine and also within the lexical scopes +C<@ARGV> array outside a subroutine and also within the lexical scopes established by the C, C, C, C, -C and C constructs. +C, and C constructs. + +Starting with Perl 5.14, C can take a scalar EXPR, which must hold a +reference to an unblessed array. The argument will be dereferenced +automatically. This aspect of C is considered highly experimental. +The exact behaviour may change in a future version of Perl. See also C, C, and C. C and C do the same thing to the left end of an array that C and C do to the @@ -5089,17 +5681,21 @@ Calls the System V IPC function shmctl. You'll probably have to say first to get the correct constant definitions. If CMD is C, then ARG must be a variable that will hold the returned C -structure. Returns like ioctl: the undefined value for error, "C<0> but -true" for zero, or the actual return value otherwise. +structure. Returns like ioctl: C for error; "C<0> but +true" for zero; and the actual return value otherwise. See also L and C documentation. +Portability issues: L. + =item shmget KEY,SIZE,FLAGS X Calls the System V IPC function shmget. Returns the shared memory -segment id, or the undefined value if there is an error. 
+segment id, or C on error. See also L and C documentation. +Portability issues: L. + =item shmread ID,VAR,POS,SIZE X X @@ -5111,15 +5707,17 @@ position POS for size SIZE by attaching to it, copying in/out, and detaching from it. When reading, VAR must be a variable that will hold the data read. When writing, if STRING is too long, only SIZE bytes are used; if STRING is too short, nulls are written to fill out -SIZE bytes. Return true if successful, or false if there is an error. +SIZE bytes. Return true if successful, false on error. shmread() taints the variable. See also L, -C documentation, and the C module from CPAN. +C, and the C module from CPAN. + +Portability issues: L and L. =item shutdown SOCKET,HOW X Shuts down a socket connection in the manner indicated by HOW, which -has the same interpretation as in the system call of the same name. +has the same interpretation as in the syscall of the same name. shutdown(SOCKET, 0); # I/we have stopped reading data shutdown(SOCKET, 1); # I/we have stopped writing data @@ -5131,7 +5729,7 @@ It's also a more insistent form of close because it also disables the file descriptor in any forked copies in other processes. -Returns C<1> for success. In the case of error, returns C if +Returns C<1> for success; on error, returns C if the first argument is not a valid filehandle, or returns C<0> and sets C<$!> for any other failure. @@ -5153,8 +5751,8 @@ X X =item sleep -Causes the script to sleep for EXPR seconds, or forever if no EXPR. -Returns the number of seconds actually slept. +Causes the script to sleep for (integer) EXPR seconds, or forever if no +argument is given. Returns the integer number of seconds actually slept. May be interrupted if the process receives a signal such as C. @@ -5187,7 +5785,7 @@ X Opens a socket of the specified kind and attaches it to filehandle SOCKET. DOMAIN, TYPE, and PROTOCOL are specified the same as for -the system call of the same name. You should C first +the syscall of the same name. 
You should C first to get the proper definitions imported. See the examples in L. @@ -5200,8 +5798,8 @@ X Creates an unnamed pair of sockets in the specified domain, of the specified type. DOMAIN, TYPE, and PROTOCOL are specified the same as -for the system call of the same name. If unimplemented, yields a fatal -error. Returns true if successful. +for the syscall of the same name. If unimplemented, raises an exception. +Returns true if successful. On systems that support a close-on-exec flag on files, the flag will be set for the newly opened file descriptors, as determined by the value @@ -5219,6 +5817,8 @@ See L for an example of socketpair use. Perl 5.8 and later will emulate socketpair using IP sockets to localhost if your system implements sockets but not socketpair. +Portability issues: L. + =item sort SUBNAME LIST X X X X @@ -5232,20 +5832,19 @@ In scalar context, the behaviour of C is undefined. If SUBNAME or BLOCK is omitted, Cs in standard string comparison order. If SUBNAME is specified, it gives the name of a subroutine that returns an integer less than, equal to, or greater than C<0>, -depending on how the elements of the list are to be ordered. (The C<< -<=> >> and C operators are extremely useful in such routines.) +depending on how the elements of the list are to be ordered. (The +C<< <=> >> and C operators are extremely useful in such routines.) SUBNAME may be a scalar variable name (unsubscripted), in which case the value provides the name of (or a reference to) the actual subroutine to use. In place of a SUBNAME, you can provide a BLOCK as an anonymous, in-line sort subroutine. -If the subroutine's prototype is C<($$)>, the elements to be compared -are passed by reference in C<@_>, as for a normal subroutine. This is -slower than unprototyped subroutines, where the elements to be -compared are passed into the subroutine -as the package global variables $a and $b (see example below). 
Note that -in the latter case, it is usually counter-productive to declare $a and -$b as lexicals. +If the subroutine's prototype is C<($$)>, the elements to be compared are +passed by reference in C<@_>, as for a normal subroutine. This is slower +than unprototyped subroutines, where the elements to be compared are passed +into the subroutine as the package global variables $a and $b (see example +below). Note that in the latter case, it is usually highly counter-productive +to declare $a and $b as lexicals. The values to be compared are always passed by reference and should not be modified. @@ -5263,7 +5862,7 @@ actually modifies the element in the original list. This is usually something to be avoided when writing clear code. Perl 5.6 and earlier used a quicksort algorithm to implement sort. -That algorithm was not stable, and I go quadratic. (A I sort +That algorithm was not stable, so I go quadratic. (A I sort preserves the input order of elements that compare equal. Although quicksort's run time is O(NlogN) when averaged over all arrays of length N, the time can be O(N**2), I behavior, for some @@ -5280,87 +5879,87 @@ Examples: # sort lexically @articles = sort @files; - + # same thing, but with explicit sort routine @articles = sort {$a cmp $b} @files; - + # now case-insensitively @articles = sort {uc($a) cmp uc($b)} @files; - + # same thing in reversed order @articles = sort {$b cmp $a} @files; - + # sort numerically ascending @articles = sort {$a <=> $b} @files; - + # sort numerically descending @articles = sort {$b <=> $a} @files; - + # this sorts the %age hash by value instead of key # using an in-line function @eldest = sort { $age{$b} <=> $age{$a} } keys %age; - + # sort using explicit subroutine name sub byage { - $age{$a} <=> $age{$b}; # presuming numeric + $age{$a} <=> $age{$b}; # presuming numeric } @sortedclass = sort byage @class; - + sub backwards { $b cmp $a } @harry = qw(dog cat x Cain Abel); @george = qw(gone chased yz Punished Axed); print 
sort @harry; - # prints AbelCaincatdogx + # prints AbelCaincatdogx print sort backwards @harry; - # prints xdogcatCainAbel + # prints xdogcatCainAbel print sort @george, 'to', @harry; - # prints AbelAxedCainPunishedcatchaseddoggonetoxyz + # prints AbelAxedCainPunishedcatchaseddoggonetoxyz # inefficiently sort by descending numeric compare using # the first integer after the first = sign, or the # whole record case-insensitively otherwise - @new = sort { - ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0] - || - uc($a) cmp uc($b) + my @new = sort { + ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0] + || + uc($a) cmp uc($b) } @old; # same thing, but much more efficiently; # we'll build auxiliary indices instead # for speed - @nums = @caps = (); + my @nums = @caps = (); for (@old) { - push @nums, /=(\d+)/; - push @caps, uc($_); + push @nums, ( /=(\d+)/ ? $1 : undef ); + push @caps, uc($_); } - @new = @old[ sort { - $nums[$b] <=> $nums[$a] - || - $caps[$a] cmp $caps[$b] - } 0..$#old - ]; + my @new = @old[ sort { + $nums[$b] <=> $nums[$a] + || + $caps[$a] cmp $caps[$b] + } 0..$#old + ]; # same thing, but without any temps @new = map { $_->[0] } sort { $b->[1] <=> $a->[1] - || - $a->[2] cmp $b->[2] - } map { [$_, /=(\d+)/, uc($_)] } @old; + || + $a->[2] cmp $b->[2] + } map { [$_, /=(\d+)/, uc($_)] } @old; # using a prototype allows you to use any comparison subroutine # as a sort subroutine (including other package's subroutines) package other; - sub backwards ($$) { $_[1] cmp $_[0]; } # $a and $b are not set here - + sub backwards ($$) { $_[1] cmp $_[0]; } # $a and $b are not set here + package main; @new = sort other::backwards @old; - + # guarantee stability, regardless of algorithm use sort 'stable'; @new = sort { substr($a, 3, 5) cmp substr($b, 3, 5) } @old; - + # force use of mergesort (not portable outside Perl 5.8) use sort '_mergesort'; # note discouraging _ @new = sort { substr($a, 3, 5) cmp substr($b, 3, 5) } @old; @@ -5399,22 +5998,22 @@ sometimes saying the opposite, 
for example) the results are not well-defined. Because C<< <=> >> returns C when either operand is C -(not-a-number), and because C will trigger a fatal error unless the -result of a comparison is defined, when sorting with a comparison function -like C<< $a <=> $b >>, be careful about lists that might contain a C. -The following example takes advantage of the fact that C to -eliminate any Cs from the input. +(not-a-number), and also because C raises an exception unless the +result of a comparison is defined, be careful when sorting with a +comparison function like C<< $a <=> $b >> any lists that might contain a +C. The following example takes advantage of the fact that C to +eliminate any Cs from the input list. @result = sort { $a <=> $b } grep { $_ == $_ } @input; -=item splice ARRAY,OFFSET,LENGTH,LIST +=item splice ARRAY or EXPR,OFFSET,LENGTH,LIST X -=item splice ARRAY,OFFSET,LENGTH +=item splice ARRAY or EXPR,OFFSET,LENGTH -=item splice ARRAY,OFFSET +=item splice ARRAY or EXPR,OFFSET -=item splice ARRAY +=item splice ARRAY or EXPR Removes the elements designated by OFFSET and LENGTH from an array, and replaces them with the elements of LIST, if any. In list context, @@ -5426,30 +6025,35 @@ If LENGTH is omitted, removes everything from OFFSET onward. If LENGTH is negative, removes the elements from OFFSET onward except for -LENGTH elements at the end of the array. If both OFFSET and LENGTH are omitted, removes everything. If OFFSET is -past the end of the array, perl issues a warning, and splices at the +past the end of the array, Perl issues a warning, and splices at the end of the array.
The following equivalences hold (assuming C<< $[ == 0 and $#a >= $i >> ) - push(@a,$x,$y) splice(@a,@a,0,$x,$y) - pop(@a) splice(@a,-1) - shift(@a) splice(@a,0,1) - unshift(@a,$x,$y) splice(@a,0,0,$x,$y) - $a[$i] = $y splice(@a,$i,1,$y) + push(@a,$x,$y) splice(@a,@a,0,$x,$y) + pop(@a) splice(@a,-1) + shift(@a) splice(@a,0,1) + unshift(@a,$x,$y) splice(@a,0,0,$x,$y) + $a[$i] = $y splice(@a,$i,1,$y) Example, assuming array lengths are passed before arrays: - sub aeq { # compare two list values - my(@a) = splice(@_,0,shift); - my(@b) = splice(@_,0,shift); - return 0 unless @a == @b; # same len? - while (@a) { - return 0 if pop(@a) ne pop(@b); - } - return 1; + sub aeq { # compare two list values + my(@a) = splice(@_,0,shift); + my(@b) = splice(@_,0,shift); + return 0 unless @a == @b; # same len? + while (@a) { + return 0 if pop(@a) ne pop(@b); + } + return 1; } if (&aeq($len,@foo[1..$len],0+@bar,@bar)) { ... } +Starting with Perl 5.14, C can take scalar EXPR, which must hold a +reference to an unblessed array. The argument will be dereferenced +automatically. This aspect of C is considered highly experimental. +The exact behaviour may change in a future version of Perl. + =item split /PATTERN/,EXPR,LIMIT X @@ -5463,9 +6067,7 @@ Splits the string EXPR into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted. (If all fields are empty, they are considered to be trailing.) -In scalar context, returns the number of fields found and splits into -the C<@_> array. Use of split in scalar context is deprecated, however, -because it clobbers your subroutine arguments. +In scalar context, returns the number of fields found. If EXPR is omitted, splits the C<$_> string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything @@ -5482,19 +6084,19 @@ had been specified. 
Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified. -A pattern matching the null string (not to be confused with -a null pattern C, which is just one member of the set of patterns -matching a null string) will split the value of EXPR into separate -characters at each point it matches that way. For example: +A pattern matching the empty string (not to be confused with +an empty pattern C, which is just one member of the set of patterns +matching the empty string), splits EXPR into individual +characters. For example: print join(':', split(/ */, 'hi there')), "\n"; produces the output 'h:i:t:h:e:r:e'. -As a special case for C, using the empty pattern C specifically -matches only the null string, and is not be confused with the regular use -of C to mean "the last successful pattern match". So, for C, -the following: +As a special case for C, the empty pattern C specifically +matches the empty string; this is not to be confused with the normal use +of an empty pattern to mean the last successful match. So to split +a string into individual characters, the following: print join(':', split(//, 'hi there')), "\n"; @@ -5549,7 +6151,7 @@ use C.) As a special case, specifying a PATTERN of space (S>) will split on white space just as C with no arguments does. Thus, S> can be used to emulate B's default behavior, whereas S> -will give you as many null initial fields as there are leading spaces. +will give you as many initial null fields (empty string) as there are leading spaces. A C on C is like a S> except that any leading whitespace produces a null first field. A C with no arguments really does a S> internally. @@ -5564,7 +6166,7 @@ Example: chomp; ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split(/:/); - #... + #...
} As with regular pattern matching, any capturing parentheses that are not @@ -5578,7 +6180,7 @@ X Returns a string formatted by the usual C conventions of the C library function C. See below for more details -and see C or C on your system for an explanation of +and see L or L on your system for an explanation of the general principles. For example: @@ -5589,11 +6191,11 @@ For example: # Round number to 3 digits after decimal point $rounded = sprintf("%.3f", $number); -Perl does its own C formatting--it emulates the C -function C, but it doesn't use it (except for floating-point -numbers, and even then only the standard modifiers are allowed). As a -result, any non-standard extensions in your local C are not -available from Perl. +Perl does its own C formatting: it emulates the C +function sprintf(3), but doesn't use it except for floating-point +numbers, and even then only standard modifiers are allowed. +Non-standard extensions in your local sprintf(3) are +therefore unavailable from Perl. Unlike C, C does not do what you probably mean when you pass it an array as your first argument. The array is given scalar context, @@ -5603,36 +6205,36 @@ useful. 
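The array pitfall mentioned above can be sketched like this. The variable names are hypothetical; the point is only that the first argument of sprintf is taken in scalar context.

```perl
use strict;
use warnings;

my @args = ("%s-%s", "a", "b");

# The array is evaluated in scalar context, so the format
# becomes its element count, here the string "3".
my $surprise = sprintf(@args);

# Split the format off first to get what was probably meant.
my ($format, @values) = @args;
my $intended = sprintf($format, @values);   # "a-b"
```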
Perl's C permits the following universally-known conversions: - %% a percent sign - %c a character with the given number - %s a string - %d a signed integer, in decimal - %u an unsigned integer, in decimal - %o an unsigned integer, in octal - %x an unsigned integer, in hexadecimal - %e a floating-point number, in scientific notation - %f a floating-point number, in fixed decimal notation - %g a floating-point number, in %e or %f notation + %% a percent sign + %c a character with the given number + %s a string + %d a signed integer, in decimal + %u an unsigned integer, in decimal + %o an unsigned integer, in octal + %x an unsigned integer, in hexadecimal + %e a floating-point number, in scientific notation + %f a floating-point number, in fixed decimal notation + %g a floating-point number, in %e or %f notation In addition, Perl permits the following widely-supported conversions: - %X like %x, but using upper-case letters - %E like %e, but using an upper-case "E" - %G like %g, but with an upper-case "E" (if applicable) - %b an unsigned integer, in binary - %B like %b, but using an upper-case "B" with the # flag - %p a pointer (outputs the Perl value's address in hexadecimal) - %n special: *stores* the number of characters output so far + %X like %x, but using upper-case letters + %E like %e, but using an upper-case "E" + %G like %g, but with an upper-case "E" (if applicable) + %b an unsigned integer, in binary + %B like %b, but using an upper-case "B" with the # flag + %p a pointer (outputs the Perl value's address in hexadecimal) + %n special: *stores* the number of characters output so far into the next variable in the parameter list Finally, for backward (and we do mean "backward") compatibility, Perl permits these unnecessary but widely-supported conversions: - %i a synonym for %d - %D a synonym for %ld - %U a synonym for %lu - %O a synonym for %lo - %F a synonym for %f + %i a synonym for %d + %D a synonym for %ld + %U a synonym for %lu + %O a synonym for %lo + 
%F a synonym for %f Note that the number of exponent digits in the scientific notation produced by C<%e>, C<%E>, C<%g> and C<%G> for numbers with the modulus of the @@ -5640,7 +6242,7 @@ exponent less than 100 is system-dependent: it may be three or less (zero-padded as necessary). In other words, 1.23 times ten to the 99th may be either "1.23e99" or "1.23e099". -Between the C<%> and the format letter, you may specify a number of +Between the C<%> and the format letter, you may specify several additional attributes controlling the interpretation of the format. In order, these are: @@ -5650,7 +6252,7 @@ In order, these are: An explicit format parameter index, such as C<2$>. By default sprintf will format the next unused argument in the list, but this allows you -to take the arguments out of order, e.g.: +to take the arguments out of order: printf '%2$d %1$d', 12, 34; # prints "34 12" printf '%3$d %d %1$d', 1, 2, 3; # prints "3 1 1" @@ -5695,7 +6297,7 @@ the precision is incremented if it's necessary for the leading "0". =item vector flag -This flag tells perl to interpret the supplied string as a vector of +This flag tells Perl to interpret the supplied string as a vector of integers, one for each character in the string. Perl applies the format to each integer in turn, then joins the resulting strings with a separator (a dot C<.> by default). This can be useful for displaying ordinal values of @@ -5711,7 +6313,7 @@ use to separate the numbers: printf "bits are %0*v8b\n", " ", $bits; # random bitstring You can also explicitly specify the argument number to use for -the join string using e.g. C<*2$v>: +the join string using something like C<*2$v>; for example: printf '%*4$vX %*4$vX %*4$vX', @addr[1..3], ":"; # 3 IPv6 addresses @@ -5720,13 +6322,13 @@ the join string using e.g. C<*2$v>: Arguments are usually formatted to be only as wide as required to display the given value. 
You can override the width by putting a number here, or get the width from the next argument (with C<*>) -or from a specified argument (with e.g. C<*2$>): +or from a specified argument (e.g., with C<*2$>): - printf '<%s>', "a"; # prints "<a>" - printf '<%6s>', "a"; # prints "<     a>" - printf '<%*s>', 6, "a"; # prints "<     a>" - printf '<%*2$s>', "a", 6; # prints "<     a>" - printf '<%2s>', "long"; # prints "<long>" (does not truncate) + printf "<%s>", "a"; # prints "<a>" + printf "<%6s>", "a"; # prints "<     a>" + printf "<%*s>", 6, "a"; # prints "<     a>" + printf '<%*2$s>', "a", 6; # prints "<     a>" + printf "<%2s>", "long"; # prints "<long>" (does not truncate) If a field width obtained through C<*> is negative, it has the same effect as the C<-> flag: left-justification. @@ -5736,8 +6338,9 @@ X You can specify a precision (for numeric conversions) or a maximum width (for string conversions) by specifying a C<.> followed by a number. -For floating point formats, with the exception of 'g' and 'G', this specifies -the number of decimal places to show (the default being 6), e.g.: +For floating-point formats except C and C, this specifies +how many places right of the decimal point to show (the default being 6). +For example: # these examples are subject to system-specific variation printf '<%f>', 1; # prints "<1.000000>" @@ -5746,10 +6349,11 @@ the number of decimal places to show (the default being 6), e.g.: printf '<%e>', 10; # prints "<1.000000e+01>" printf '<%.1e>', 10; # prints "<1.0e+01>" -For 'g' and 'G', this specifies the maximum number of digits to show, -including prior to the decimal point as well as after it, e.g.: +For "g" and "G", this specifies the maximum number of digits to show, +including those prior to the decimal point and those after it; for +example: - # these examples are subject to system-specific variation + # These examples are subject to system-specific variation.
printf '<%g>', 1; # prints "<1>" printf '<%.10g>', 1; # prints "<1>" printf '<%g>', 100; # prints "<100>" @@ -5777,7 +6381,7 @@ where the 0 flag is ignored: printf '<%#10.6x>', 1; # prints "<  0x000001>" For string conversions, specifying a precision truncates the string -to fit in the specified width: +to fit the specified width: printf '<%.5s>', "truncated"; # prints "<trunc>" printf '<%10.5s>', "truncated"; # prints "<     trunc>" @@ -5787,8 +6391,8 @@ You can also get the precision from the next argument using C<.*>: printf '<%.6x>', 1; # prints "<000001>" printf '<%.*x>', 6, 1; # prints "<000001>" -If a precision obtained through C<*> is negative, it has the same -effect as no precision. +If a precision obtained through C<*> is negative, it counts +as having no precision at all. printf '<%.*s>', 7, "string"; # prints "<string>" printf '<%.*s>', 3, "string"; # prints "<str>" @@ -5800,10 +6404,10 @@ effect as no precision. printf '<%.*d>', -1, 0; # prints "<0>" You cannot currently get the precision from a specified number, -but it is intended that this will be possible in the future using -e.g. C<.*2$>: +but it is intended that this will be possible in the future, for +example using C<.*2$>: - printf '<%.*2$x>', 1, 6; # INVALID, but in future will print "<000001>" + printf '<%.*2$x>', 1, 6; # INVALID, but in future will print "<000001>" =item size @@ -5814,83 +6418,99 @@ whatever the default integer size is on your platform (usually 32 or 64 bits), but you can override this to use instead one of the standard C types, as supported by the compiler used to build Perl: - l interpret integer as C type "long" or "unsigned long" + hh interpret integer as C type "char" or "unsigned char" + on Perl 5.14 or later h interpret integer as C type "short" or "unsigned short" - q, L or ll interpret integer as C type "long long", "unsigned long long".
- or "quads" (typically 64-bit integers) + j interpret integer as C type "intmax_t" on Perl 5.14 + or later, and only with a C99 compiler (unportable) + l interpret integer as C type "long" or "unsigned long" + q, L, or ll interpret integer as C type "long long", "unsigned long long", + or "quad" (typically 64-bit integers) + t interpret integer as C type "ptrdiff_t" on Perl 5.14 or later + z interpret integer as C type "size_t" on Perl 5.14 or later + +As of 5.14, none of these raises an exception if it is not supported on +your platform. However, if warnings are enabled, a warning of the +C warning class is issued on an unsupported conversion flag. +Should you instead prefer an exception, do this: -The last will produce errors if Perl does not understand "quads" in your -installation. (This requires that either the platform natively supports quads -or Perl was specifically compiled to support quads.) You can find out -whether your Perl supports quads via L: + use warnings FATAL => "printf"; - use Config; - ($Config{use64bitint} eq 'define' || $Config{longsize} >= 8) && - print "quads\n"; +If you would like to know about a version dependency before you +start running the program, put something like this at its top: -For floating point conversions (C), numbers are usually assumed -to be the default floating point size on your platform (double or long double), -but you can force 'long double' with C, C, or C if your + use 5.014; # for hh/j/t/z printf modifiers + +You can find out whether your Perl supports quads via L: + + use Config; + if ($Config{use64bitint} eq "define" || $Config{longsize} >= 8) { + print "Nice quads!\n"; + } + +For floating-point conversions (C), numbers are usually assumed +to be the default floating-point size on your platform (double or long double), +but you can force "long double" with C, C, or C if your platform supports them.
You can find out whether your Perl supports long doubles via L: - use Config; - $Config{d_longdbl} eq 'define' && print "long doubles\n"; + use Config; + print "long doubles\n" if $Config{d_longdbl} eq "define"; -You can find out whether Perl considers 'long double' to be the default -floating point size to use on your platform via L: +You can find out whether Perl considers "long double" to be the default +floating-point size to use on your platform via L: - use Config; - ($Config{uselongdouble} eq 'define') && - print "long doubles by default\n"; + use Config; + if ($Config{uselongdouble} eq "define") { + print "long doubles by default\n"; + } -It can also be the case that long doubles and doubles are the same thing: +It can also be that long doubles and doubles are the same thing: use Config; ($Config{doublesize} == $Config{longdblsize}) && print "doubles are long doubles\n"; -The size specifier C has no effect for Perl code, but it is supported -for compatibility with XS code; it means 'use the standard size for -a Perl integer (or floating-point number)', which is already the -default for Perl code. +The size specifier C has no effect for Perl code, but is supported for +compatibility with XS code. It means "use the standard size for a Perl +integer or floating-point number", which is the default. =item order of arguments -Normally, sprintf takes the next unused argument as the value to +Normally, sprintf() takes the next unused argument as the value to format for each format specification. If the format specification uses C<*> to require additional arguments, these are consumed from -the argument list in the order in which they appear in the format -specification I the value to format. Where an argument is -specified using an explicit index, this does not affect the normal -order for the arguments (even when the explicitly specified index -would have been the next argument in any case). 
+the argument list in the order they appear in the format +specification I the value to format. Where an argument is +specified by an explicit index, this does not affect the normal +order for the arguments, even when the explicitly specified index +would have been the next argument. So: - printf '<%*.*s>', $a, $b, $c; + printf "<%*.*s>", $a, $b, $c; -would use C<$a> for the width, C<$b> for the precision and C<$c> -as the value to format, while: +uses C<$a> for the width, C<$b> for the precision, and C<$c> +as the value to format; while: - printf '<%*1$.*s>', $a, $b; + printf '<%*1$.*s>', $a, $b; -would use C<$a> for the width and the precision, and C<$b> as the +would use C<$a> for the width and precision, and C<$b> as the value to format. -Here are some more examples - beware that when using an explicit -index, the C<$> may need to be escaped: +Here are some more examples; be aware that when using an explicit +index, the C<$> may need escaping: - printf "%2\$d %d\n", 12, 34; # will print "34 12\n" - printf "%2\$d %d %d\n", 12, 34; # will print "34 12 34\n" - printf "%3\$d %d %d\n", 12, 34, 56; # will print "56 12 34\n" - printf "%2\$*3\$d %d\n", 12, 34, 3; # will print " 34 12\n" + printf "%2\$d %d\n", 12, 34; # will print "34 12\n" + printf "%2\$d %d %d\n", 12, 34; # will print "34 12 34\n" + printf "%3\$d %d %d\n", 12, 34, 56; # will print "56 12 34\n" + printf "%2\$*3\$d %d\n", 12, 34, 3; # will print " 34 12\n" =back -If C is in effect, and POSIX::setlocale() has been called, -the character used for the decimal separator in formatted floating -point numbers is affected by the LC_NUMERIC locale. See L +If C is in effect and POSIX::setlocale() has been called, +the character used for the decimal separator in formatted floating-point +numbers is affected by the LC_NUMERIC locale. See L and L. =item sqrt EXPR @@ -5898,44 +6518,46 @@ X X X =item sqrt -Return the square root of EXPR. If EXPR is omitted, returns square -root of C<$_>.
Only works on non-negative operands, unless you've -loaded the standard Math::Complex module. +Return the positive square root of EXPR. If EXPR is omitted, uses +C<$_>. Works only for non-negative operands unless you've +loaded the C module. use Math::Complex; - print sqrt(-2); # prints 1.4142135623731i + print sqrt(-4); # prints 2i =item srand EXPR X X X =item srand -Sets the random number seed for the C operator. +Sets and returns the random number seed for the C operator. The point of the function is to "seed" the C function so that C can produce a different sequence each time you run your -program. - -If srand() is not called explicitly, it is called implicitly at the -first use of the C operator. However, this was not the case in -versions of Perl before 5.004, so if your script will run under older -Perl versions, it should call C. - -Most programs won't even call srand() at all, except those that -need a cryptographically-strong starting point rather than the -generally acceptable default, which is based on time of day, -process ID, and memory allocation, or the F device, -if available. - -You can call srand($seed) with the same $seed to reproduce the -I sequence from rand(), but this is usually reserved for -generating predictable results for testing or debugging. -Otherwise, don't call srand() more than once in your program. - -Do B call srand() (i.e. without an argument) more than once in -a script. The internal state of the random number generator should +program. When called with a parameter, C uses that for the seed; +otherwise it (semi-)randomly chooses a seed. In either case, starting with +Perl 5.14, it returns the seed. + +If C is not called explicitly, it is called implicitly without a +parameter at the first use of the C operator. However, this was not true +of versions of Perl before 5.004, so if your script will run under older +Perl versions, it should call C; otherwise most programs won't call +C at all. 
+ +But there are a few situations in recent Perls where programs are likely to +want to call C. One is for generating predictable results generally for +testing or debugging. There, you use C, with the same C<$seed> +each time. Another case is where you need a cryptographically-strong +starting point rather than the generally acceptable default, which is based on +time of day, process ID, and memory allocation, or the F device +if available. And still another case is that you may want to call C +after a C to avoid child processes sharing the same seed value as the +parent (and consequently each other). + +Do B call C (i.e., without an argument) more than once per +process. The internal state of the random number generator should contain more entropy than can be provided by any seed, so calling -srand() again actually I randomness. +C again actually I randomness. Most implementations of C take an integer and will silently truncate decimal numbers. This means C will usually @@ -5954,8 +6576,8 @@ example: srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`); -If you're particularly concerned with this, see the C -module in CPAN. +If you're particularly concerned with this, search the CPAN for +random number generator modules instead of rolling out your own. Frequently called programs (like CGI scripts) that simply use @@ -5967,6 +6589,11 @@ for a seed can fall prey to the mathematical property that one-third of the time. So don't do that. +A typical use of the returned seed is for a test program which has too many +combinations to test comprehensively in the time available to it each run. It +can test a random subset each time, and should there be a failure, log the seed +used for that run so that it can later be used to reproduce the same results. + =item stat FILEHANDLE X X X @@ -5978,7 +6605,7 @@ X X X Returns a 13-element list giving the status info for a file, either the file opened via FILEHANDLE or DIRHANDLE, or named by EXPR.
If EXPR is -omitted, it stats C<$_>. Returns a null list if the stat fails. Typically +omitted, it stats C<$_> (not C<_>!). Returns the empty list if C fails. Typically used as follows: ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, @@ -6006,14 +6633,14 @@ meanings of the fields: (*) Not all fields are supported on all filesystem types. Notably, the ctime field is non-portable. In particular, you cannot expect it to be a -"creation time", see L for details. +"creation time"; see L for details. If C is passed the special filehandle consisting of an underline, no stat is done, but the current contents of the stat structure from the last C, C, or filetest are returned. Example: if (-x $file && (($d) = stat(_)) && $d < 0) { - print "$file is executable NFS file\n"; + print "$file is executable NFS file\n"; } (This works on machines only for which the device number is negative @@ -6035,8 +6662,8 @@ The L module provides a convenient, by-name access mechanism: use File::stat; $sb = stat($filename); printf "File is %s, size is %s, perm %04o, mtime %s\n", - $filename, $sb->size, $sb->mode & 07777, - scalar localtime $sb->mtime; + $filename, $sb->size, $sb->mode & 07777, + scalar localtime $sb->mtime; You can import symbolic mode constants (C) and functions (C) from the Fcntl module: @@ -6055,7 +6682,7 @@ You can import symbolic mode constants (C) and functions $is_directory = S_ISDIR($mode); You could write the last two using the C<-u> and C<-d> operators. -The commonly available C constants are +Commonly available C constants are: # Permissions: read, write, execute, for user, group, others. @@ -6078,11 +6705,11 @@ The commonly available C constants are and the C functions are - S_IMODE($mode) the part of $mode containing the permission bits - and the setuid/setgid/sticky bits + S_IMODE($mode) the part of $mode containing the permission bits + and the setuid/setgid/sticky bits - S_IFMT($mode) the part of $mode containing the file type - which can be bit-anded with e.g. 
S_IFREG + S_IFMT($mode) the part of $mode containing the file type + which can be bit-anded with (for example) S_IFREG or with the following functions # The operators -f, -d, -l, -b, -c, -p, and -S. @@ -6100,6 +6727,8 @@ See your native chmod(2) and stat(2) documentation for more details about the C constants. To get status info for a symbolic link instead of the target file behind the link, use the C function. +Portability issues: L. + =item state EXPR X @@ -6109,13 +6738,14 @@ X =item state TYPE EXPR : ATTRS -C declares a lexically scoped variable, just like C does. +C declares a lexically scoped variable, just like C. However, those variables will never be reinitialized, contrary to lexical variables that are reinitialized each time their enclosing block is entered. -C variables are only enabled when the C pragma is -in effect. See L. +C variables are enabled only when the C pragma +is in effect, unless the keyword is written as C. +See L. =item study SCALAR X @@ -6125,13 +6755,12 @@ X Takes extra time to study SCALAR (C<$_> if unspecified) in anticipation of doing many pattern matches on the string before it is next modified. This may or may not save time, depending on the nature and number of -patterns you are searching on, and on the distribution of character -frequencies in the string to be searched--you probably want to compare -run times with and without it to see which runs faster. Those loops +patterns you are searching and the distribution of character +frequencies in the string to be searched; you probably want to compare +run times with and without it to see which is faster. Those loops that scan for many short constant strings (including the constant -parts of more complex patterns) will benefit most. You may have only -one C active at a time--if you study a different scalar the first -is "unstudied". (The way C works is this: a linked list of every +parts of more complex patterns) will benefit most. 
+(The way C works is this: a linked list of every character in the string to be searched is made, so we know, for example, where all the C<'k'> characters are. From each search string, the rarest character is selected, based on some static frequency tables @@ -6142,15 +6771,15 @@ For example, here is a loop that inserts index producing entries before any line containing a certain pattern: while (<>) { - study; - print ".IX foo\n" if /\bfoo\b/; - print ".IX bar\n" if /\bbar\b/; - print ".IX blurfl\n" if /\bblurfl\b/; - # ... - print; + study; + print ".IX foo\n" if /\bfoo\b/; + print ".IX bar\n" if /\bbar\b/; + print ".IX blurfl\n" if /\bblurfl\b/; + # ... + print; } -In searching for C, only those locations in C<$_> that contain C +In searching for C, only locations in C<$_> that contain C will be looked at, because C is rarer than C. In general, this is a big win except in pathological cases. The only question is whether it saves you more time than it took to build the linked list in the @@ -6159,22 +6788,22 @@ first place. Note that if you have to look for strings that you don't know till runtime, you can build an entire loop as a string and C that to avoid recompiling all your patterns all the time. Together with -undefining C<$/> to input entire files as one record, this can be very +undefining C<$/> to input entire files as one record, this can be quite fast, often faster than specialized programs like fgrep(1). 
The following scans a list of files (C<@files>) for a list of words (C<@words>), and prints out the names of those files that contain a match: $search = 'while (<>) { study;'; foreach $word (@words) { - $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n"; + $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n"; } $search .= "}"; @ARGV = @files; undef $/; - eval $search; # this screams - $/ = "\n"; # put back to normal input delimiter + eval $search; # this screams + $/ = "\n"; # put back to normal input delimiter foreach $file (sort keys(%seen)) { - print $file, "\n"; + print $file, "\n"; } =item sub NAME BLOCK @@ -6186,13 +6815,13 @@ X =item sub NAME (PROTO) : ATTRS BLOCK -This is subroutine definition, not a real function I. -Without a BLOCK it's just a forward declaration. Without a NAME, -it's an anonymous function declaration, and does actually return -a value: the CODE ref of the closure you just created. +This is subroutine definition, not a real function I. Without a +BLOCK it's just a forward declaration. Without a NAME, it's an anonymous +function declaration, so does return a value: the CODE ref of the closure +just created. See L and L for details about subroutines and -references, and L and L for more +references; see L and L for more information about attributes. =item substr EXPR,OFFSET,LENGTH,REPLACEMENT @@ -6203,37 +6832,37 @@ X X X X X =item substr EXPR,OFFSET Extracts a substring out of EXPR and returns it. First character is at -offset C<0>, or whatever you've set C<$[> to (but don't do that). +offset C<0> (or whatever you've set C<$[> to (but B<)). If OFFSET is negative (or more precisely, less than C<$[>), starts -that far from the end of the string. If LENGTH is omitted, returns -everything to the end of the string. If LENGTH is negative, leaves that +that far back from the end of the string. If LENGTH is omitted, returns +everything through the end of the string. If LENGTH is negative, leaves that many characters off the end of the string. 
my $s = "The black cat climbed the green tree"; - my $color = substr $s, 4, 5; # black - my $middle = substr $s, 4, -11; # black cat climbed the - my $end = substr $s, 14; # climbed the green tree - my $tail = substr $s, -4; # tree - my $z = substr $s, -4, 2; # tr + my $color = substr $s, 4, 5; # black + my $middle = substr $s, 4, -11; # black cat climbed the + my $end = substr $s, 14; # climbed the green tree + my $tail = substr $s, -4; # tree + my $z = substr $s, -4, 2; # tr You can use the substr() function as an lvalue, in which case EXPR must itself be an lvalue. If you assign something shorter than LENGTH, the string will shrink, and if you assign something longer than LENGTH, the string will grow to accommodate it. To keep the string the same -length you may need to pad or chop your value using C. +length, you may need to pad or chop your value using C. If OFFSET and LENGTH specify a substring that is partly outside the string, only the part within the string is returned. If the substring is beyond either end of the string, substr() returns the undefined value and produces a warning. When used as an lvalue, specifying a -substring that is entirely outside the string is a fatal error. +substring that is entirely outside the string raises an exception. Here's an example showing the behavior for boundary cases: my $name = 'fred'; - substr($name, 4) = 'dy'; # $name is now 'freddy' - my $null = substr $name, 6, 2; # returns '' (no warning) - my $oops = substr $name, 7; # returns undef, with warning - substr($name, 7) = 'gap'; # fatal error + substr($name, 4) = 'dy'; # $name is now 'freddy' + my $null = substr $name, 6, 2; # returns "" (no warning) + my $oops = substr $name, 7; # returns undef, with warning + substr($name, 7) = 'gap'; # raises an exception An alternative to using substr() as an lvalue is to specify the replacement string as the 4th argument. 
This allows you to replace @@ -6241,19 +6870,19 @@ parts of the EXPR and return what was there before in one operation, just as you can with splice(). my $s = "The black cat climbed the green tree"; - my $z = substr $s, 14, 7, "jumped from"; # climbed + my $z = substr $s, 14, 7, "jumped from"; # climbed # $s is now "The black cat jumped from the green tree" -Note that the lvalue returned by the 3-arg version of substr() acts as +Note that the lvalue returned by the three-argument version of substr() acts as a 'magic bullet'; each time it is assigned to, it remembers which part of the original string is being modified; for example: $x = '1234'; for (substr($x,1,2)) { - $_ = 'a'; print $x,"\n"; # prints 1a4 - $_ = 'xyz'; print $x,"\n"; # prints 1xyz4 + $_ = 'a'; print $x,"\n"; # prints 1a4 + $_ = 'xyz'; print $x,"\n"; # prints 1xyz4 $x = '56789'; - $_ = 'pq'; print $x,"\n"; # prints 5pq9 + $_ = 'pq'; print $x,"\n"; # prints 5pq9 } Prior to Perl version 5.9.1, the result of using an lvalue multiple times was @@ -6264,17 +6893,19 @@ X X X X Creates a new filename symbolically linked to the old filename. Returns C<1> for success, C<0> otherwise. On systems that don't support -symbolic links, produces a fatal error at run time. To check for that, +symbolic links, raises an exception. To check for that, use eval: $symlink_exists = eval { symlink("",""); 1 }; +Portability issues: L. + =item syscall NUMBER, LIST X X Calls the system call specified as the first element of the list, passing the remaining elements as arguments to the system call. If -unimplemented, produces a fatal error. The arguments are interpreted +unimplemented, raises an exception. The arguments are interpreted as follows: if a given argument is numeric, the argument is passed as an int. If not, the pointer to the string value is passed. 
You are responsible to make sure a string is pre-extended long enough to @@ -6286,39 +6917,41 @@ integer arguments are not literals and have never been interpreted in a numeric context, you may need to add C<0> to them to force them to look like numbers. This emulates the C function (or vice versa): - require 'syscall.ph'; # may need to run h2ph + require 'syscall.ph'; # may need to run h2ph $s = "hi there\n"; syscall(&SYS_write, fileno(STDOUT), $s, length $s); -Note that Perl supports passing of up to only 14 arguments to your system call, -which in practice should usually suffice. +Note that Perl supports passing of up to only 14 arguments to your syscall, +which in practice should (usually) suffice. Syscall returns whatever value returned by the system call it calls. If the system call fails, C returns C<-1> and sets C<$!> (errno). -Note that some system calls can legitimately return C<-1>. The proper -way to handle such calls is to assign C<$!=0;> before the call and -check the value of C<$!> if syscall returns C<-1>. +Note that some system calls I legitimately return C<-1>. The proper +way to handle such calls is to assign C<$!=0> before the call, then +check the value of C<$!> if C returns C<-1>. There's a problem with C: it returns the file -number of the read end of the pipe it creates. There is no way +number of the read end of the pipe it creates, but there is no way to retrieve the file number of the other end. You can avoid this problem by using C instead. +Portability issues: L. + =item sysopen FILEHANDLE,FILENAME,MODE X =item sysopen FILEHANDLE,FILENAME,MODE,PERMS -Opens the file whose filename is given by FILENAME, and associates it -with FILEHANDLE. If FILEHANDLE is an expression, its value is used as -the name of the real filehandle wanted. This function calls the -underlying operating system's C function with the parameters -FILENAME, MODE, PERMS. +Opens the file whose filename is given by FILENAME, and associates it with +FILEHANDLE. 
If FILEHANDLE is an expression, its value is used as the real +filehandle wanted; an undefined scalar will be suitably autovivified. This +function calls the underlying operating system's I(2) function with the +parameters FILENAME, MODE, and PERMS. The possible values and flag bits of the MODE parameter are -system-dependent; they are available via the standard module C. -See the documentation of your operating system's C to see which -values and flag bits are available. You may combine several flags +system-dependent; they are available via the standard module C. See +the documentation of your operating system's I(2) syscall to see +which values and flag bits are available. You may combine several flags using the C<|>-operator. Some of the most common values are C for opening the file in @@ -6327,7 +6960,7 @@ and C for opening the file in read-write mode. X X X For historical reasons, some values work on almost every system -supported by perl: zero means read-only, one means write-only, and two +supported by Perl: 0 means read-only, 1 means write-only, and 2 means read/write. We know that these values do I work under OS/390 & VM/ESA Unix and on the Macintosh; you probably don't want to use them in new code. @@ -6360,20 +6993,22 @@ Better to omit it. See the perlfunc(1) entry on C for more on this. Note that C depends on the fdopen() C library function. -On many UNIX systems, fdopen() is known to fail when file descriptors +On many Unix systems, fdopen() is known to fail when file descriptors exceed a certain value, typically 255. If you need more file descriptors than that, consider rebuilding Perl to use the C library, or perhaps using the POSIX::open() function. See L for a kinder, gentler explanation of opening files. +Portability issues: L. + =item sysread FILEHANDLE,SCALAR,LENGTH,OFFSET X =item sysread FILEHANDLE,SCALAR,LENGTH Attempts to read LENGTH bytes of data into variable SCALAR from the -specified FILEHANDLE, using the system call read(2). 
It bypasses +specified FILEHANDLE, using read(2). It bypasses buffered IO, so mixing this with other kinds of reads, C, C, C, C, or C can cause confusion because the perlio or stdio layers usually buffer data. Returns the number of @@ -6390,7 +7025,7 @@ results in the string being padded to the required size with C<"\0"> bytes before the result of the read is appended. There is no syseof() function, which is ok, since eof() doesn't work -very well on device files (like ttys) anyway. Use sysread() and check +well on device files (like ttys) anyway. Use sysread() and check for a return value of 0 to decide whether you're done. Note that if the filehandle has been marked as C<:utf8> Unicode @@ -6402,20 +7037,19 @@ See L, L, and the C pragma, L. =item sysseek FILEHANDLE,POSITION,WHENCE X X -Sets FILEHANDLE's system position in bytes using the system call -lseek(2). FILEHANDLE may be an expression whose value gives the name -of the filehandle. The values for WHENCE are C<0> to set the new -position to POSITION, C<1> to set the it to the current position plus -POSITION, and C<2> to set it to EOF plus POSITION (typically -negative). +Sets FILEHANDLE's system position in bytes using lseek(2). FILEHANDLE may +be an expression whose value gives the name of the filehandle. The values +for WHENCE are C<0> to set the new position to POSITION; C<1> to set it +to the current position plus POSITION; and C<2> to set it to EOF plus +POSITION, typically negative. Note the I: even if the filehandle has been set to operate on characters (for example by using the C<:encoding(utf8)> I/O layer), tell() will return byte offsets, not character offsets (because -implementing that would render sysseek() very slow). +implementing that would render sysseek() unacceptably slow).
-sysseek() bypasses normal buffered IO, so mixing this with reads (other -than C, for example C<< <> >> or read()) C, C, +sysseek() bypasses normal buffered IO, so mixing it with reads other +than C (for example C<< <> >> or read()) C, C, C, C, or C may cause confusion. For WHENCE, you may also use the constants C, C, @@ -6423,8 +7057,8 @@ and C (start of the file, current position, end of the file) from the Fcntl module. Use of the constants is also more portable than relying on 0, 1, and 2. For example to define a "systell" function: - use Fcntl 'SEEK_CUR'; - sub systell { sysseek($_[0], 0, SEEK_CUR) } + use Fcntl 'SEEK_CUR'; + sub systell { sysseek($_[0], 0, SEEK_CUR) } Returns the new position, or the undefined value on failure. A position of zero is returned as the string C<"0 but true">; thus C returns @@ -6437,8 +7071,8 @@ X X =item system PROGRAM LIST Does exactly the same thing as C, except that a fork is -done first, and the parent process waits for the child process to -complete. Note that argument processing varies depending on the +done first and the parent process waits for the child process to +exit. Note that argument processing varies depending on the number of arguments. If there is more than one argument in LIST, or if LIST is an array with more than one value, starts the program given by the first element of the list with arguments given by the @@ -6459,11 +7093,14 @@ of C on any open handles. The return value is the exit status of the program as returned by the C call. To get the actual exit value, shift right by eight (see below). See also L. This is I what you want to use to capture -the output from a command, for that you should use merely backticks or +the output from a command; for that you should use merely backticks or C, as described in L. Return value of -1 indicates a failure to start the program or an error of the wait(2) system call (inspect $! for the reason). 
+If you'd like to make C (and many other bits of Perl) die on error, +have a look at the L pragma. + Like C, C allows you to lie to a program about its name if you use the C syntax. Again, see L. @@ -6474,29 +7111,34 @@ value. @args = ("command", "arg1", "arg2"); system(@args) == 0 - or die "system @args failed: $?" + or die "system @args failed: $?" -You can check all the failure possibilities by inspecting -C<$?> like this: +If you'd like to manually inspect C's failure, you can check all +possible failure modes by inspecting C<$?> like this: if ($? == -1) { - print "failed to execute: $!\n"; + print "failed to execute: $!\n"; } elsif ($? & 127) { - printf "child died with signal %d, %s coredump\n", - ($? & 127), ($? & 128) ? 'with' : 'without'; + printf "child died with signal %d, %s coredump\n", + ($? & 127), ($? & 128) ? 'with' : 'without'; } else { - printf "child exited with value %d\n", $? >> 8; + printf "child exited with value %d\n", $? >> 8; } -Alternatively you might inspect the value of C<${^CHILD_ERROR_NATIVE}> -with the W*() calls of the POSIX extension. +Alternatively, you may inspect the value of C<${^CHILD_ERROR_NATIVE}> +with the C calls from the POSIX module. -When the arguments get executed via the system shell, results -and return codes will be subject to its quirks and capabilities. +When C's arguments are executed indirectly by the shell, +results and return codes are subject to its quirks. See L and L for details. +Since C does a C and C it may affect a C +handler. See L for details. + +Portability issues: L. + =item syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET X @@ -6505,25 +7147,27 @@ X =item syswrite FILEHANDLE,SCALAR Attempts to write LENGTH bytes of data from variable SCALAR to the -specified FILEHANDLE, using the system call write(2). If LENGTH is +specified FILEHANDLE, using write(2). If LENGTH is not specified, writes whole SCALAR. 
It bypasses buffered IO, so mixing this with reads (other than C, C, C, C, C, or C may cause confusion because the perlio and -stdio layers usually buffers data. Returns the number of bytes +stdio layers usually buffer data. Returns the number of bytes actually written, or C if there was an error (in this case the errno variable C<$!> is also set). If the LENGTH is greater than the -available data in the SCALAR after the OFFSET, only as much data as is +data available in the SCALAR after the OFFSET, only as much data as is available will be written. An OFFSET may be specified to write the data from some part of the string other than the beginning. A negative OFFSET specifies writing that many characters counting backwards from the end of the string. -In the case the SCALAR is empty you can use OFFSET but only zero offset. +If SCALAR is of length zero, you can only use an OFFSET of 0. -Note that if the filehandle has been marked as C<:utf8>, Unicode -characters are written instead of bytes (the LENGTH, OFFSET, and the -return value of syswrite() are in UTF-8 encoded Unicode characters). +B: If the filehandle is marked C<:utf8>, Unicode characters +encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and +return value of syswrite() are in (UTF8-encoded Unicode) characters. The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer. +Alternately, if the handle is not marked with an encoding but you +attempt to write characters with code points over 255, raises an exception. See L, L, and the C pragma, L. =item tell FILEHANDLE @@ -6547,8 +7191,8 @@ tell() on pipes, fifos, and sockets usually returns -1. There is no C function. Use C for that. -Do not use tell() (or other buffered I/O operations) on a file handle -that has been manipulated by sysread(), syswrite() or sysseek(). +Do not use tell() (or other buffered I/O operations) on a filehandle +that has been manipulated by sysread(), syswrite(), or sysseek(). 
Those functions ignore the buffering, while tell() does not. =item telldir DIRHANDLE @@ -6580,7 +7224,7 @@ C function to iterate over such. Example: use NDBM_File; tie(%HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0); while (($key,$val) = each %HIST) { - print $key, ' = ', unpack('L',$val), "\n"; + print $key, ' = ', unpack('L',$val), "\n"; } untie(%HIST); @@ -6615,7 +7259,7 @@ A class implementing an ordinary array should have the following methods: DESTROY this UNTIE this -A class implementing a file handle should have the following methods: +A class implementing a filehandle should have the following methods: TIEHANDLE classname, LIST READ this, scalar, length, offset @@ -6645,8 +7289,8 @@ A class implementing a scalar should have the following methods: Not all methods indicated above need be implemented. See L, L, L, L, and L. -Unlike C, the C function will not use or require a module -for you--you need to do that explicitly yourself. See L +Unlike C, the C function will not C or C a module +for you; you need to do that explicitly yourself. See L or the F module for interesting C implementations. For further details see L, L<"tied VARIABLE">. @@ -6668,11 +7312,10 @@ C. On most systems the epoch is 00:00:00 UTC, January 1, 1970; a prominent exception being Mac OS Classic which uses 00:00:00, January 1, 1904 in the current local time zone for its epoch. -For measuring time in better granularity than one second, -you may use either the L module (from CPAN, and starting from -Perl 5.8 part of the standard distribution), or if you have -gettimeofday(2), you may be able to use the C interface of Perl. -See L for details. +For measuring time in better granularity than one second, use the +L module from Perl 5.8 onwards (or from CPAN before then), or, +if you have gettimeofday(2), you may be able to use the C +interface of Perl. See L for details. For date and time processing look at the many related modules on CPAN. 
For a comprehensive date and time representation look at the @@ -6681,14 +7324,16 @@ L module. =item times X -Returns a four-element list giving the user and system times, in -seconds, for this process and the children of this process. +Returns a four-element list giving the user and system times in +seconds for this process and any exited children of this process. ($user,$system,$cuser,$csystem) = times; In scalar context, C returns C<$user>. -Note that times for children are included only after they terminate. +Children's times are only included for terminated children. + +Portability issues: L. =item tr/// @@ -6701,15 +7346,16 @@ X =item truncate EXPR,LENGTH Truncates the file opened on FILEHANDLE, or named by EXPR, to the -specified length. Produces a fatal error if truncate isn't implemented -on your system. Returns true if successful, the undefined value -otherwise. +specified length. Raises an exception if truncate isn't implemented +on your system. Returns true if successful, C on error. The behavior is undefined if LENGTH is greater than the length of the file. The position in the file of FILEHANDLE is left unchanged. You may want to -call L before writing to the file. +call L before writing to the file. + +Portability issues: L. =item uc EXPR X X X @@ -6717,14 +7363,15 @@ X X X =item uc Returns an uppercased version of EXPR. This is the internal function -implementing the C<\U> escape in double-quoted strings. Respects -current LC_CTYPE locale if C in force. See L -and L for more details about locale and Unicode support. +implementing the C<\U> escape in double-quoted strings. It does not attempt to do titlecase mapping on initial letters. See -C for that. +L for that. If EXPR is omitted, uses C<$_>. +This function behaves the same way under various pragma, such as in a locale, +as L does. + =item ucfirst EXPR X X @@ -6732,12 +7379,13 @@ X X Returns the value of EXPR with the first character in uppercase (titlecase in Unicode). 
This is the internal function implementing -the C<\u> escape in double-quoted strings. Respects current LC_CTYPE -locale if C in force. See L and L -for more details about locale and Unicode support. +the C<\u> escape in double-quoted strings. If EXPR is omitted, uses C<$_>. +This function behaves the same way under various pragma, such as in a locale, +as L does. + =item umask EXPR X @@ -6752,11 +7400,11 @@ and isn't one of the digits). The C value is such a number representing disabled permissions bits. The permission (or "mode") values you pass C or C are modified by your umask, so even if you tell C to create a file with permissions C<0777>, -if your umask is C<0022> then the file will actually be created with +if your umask is C<0022>, then the file will actually be created with permissions C<0755>. If your C were C<0027> (group can't write; others can't read, write, or execute), then passing -C C<0666> would create a file with mode C<0640> (C<0666 &~ -027> is C<0640>). +C C<0666> would create a file with mode C<0640> (because +C<0666 &~ 027> is C<0640>). Here's some advice: supply a creation mode of C<0666> for regular files (in C) and one of C<0777> for directories (in @@ -6769,13 +7417,15 @@ kept private: mail files, web browser cookies, I<.rhosts> files, and so on. If umask(2) is not implemented on your system and you are trying to -restrict access for I (i.e., (EXPR & 0700) > 0), produces a -fatal error at run time. If umask(2) is not implemented and you are +restrict access for I (i.e., C<< (EXPR & 0700) > 0 >>), +raises an exception. If umask(2) is not implemented and you are not trying to restrict access for yourself, returns C. Remember that a umask is a number, usually given in octal; it is I a string of octal digits. See also L, if all you have is a string. +Portability issues: L. + =item undef EXPR X X @@ -6783,12 +7433,12 @@ X X Undefines the value of EXPR, which must be an lvalue. 
Use only on a scalar value, an array (using C<@>), a hash (using C<%>), a subroutine -(using C<&>), or a typeglob (using C<*>). (Saying C +(using C<&>), or a typeglob (using C<*>). Saying C will probably not do what you expect on most predefined variables or -DBM list values, so don't do that; see L.) Always returns the +DBM list values, so don't do that; see L. Always returns the undefined value. You can omit the EXPR, in which case nothing is undefined, but you still get an undefined value that you could, for -instance, return from a subroutine, assign to a variable or pass as a +instance, return from a subroutine, assign to a variable, or pass as a parameter. Examples: undef $foo; @@ -6808,20 +7458,29 @@ X X X X X =item unlink -Deletes a list of files. Returns the number of files successfully -deleted. +Deletes a list of files. On success, it returns the number of files +it successfully deleted. On failure, it returns false and sets C<$!> +(errno): - $cnt = unlink 'a', 'b', 'c'; + my $unlinked = unlink 'a', 'b', 'c'; unlink @goners; - unlink <*.bak>; + unlink glob "*.bak"; + +On error, C will not tell you which files it could not remove. +If you want to know which files you could not remove, try them one +at a time: + + foreach my $file ( @goners ) { + unlink $file or warn "Could not unlink $file: $!"; + } -Note: C will not attempt to delete directories unless you are superuser -and the B<-U> flag is supplied to Perl. Even if these conditions are -met, be warned that unlinking a directory can inflict damage on your -filesystem. Finally, using C on directories is not supported on -many operating systems. Use C instead. +Note: C will not attempt to delete directories unless you are +superuser and the B<-U> flag is supplied to Perl. Even if these +conditions are met, be warned that unlinking a directory can inflict +damage on your filesystem. Finally, using C on directories is +not supported on many operating systems. Use C instead. -If LIST is omitted, uses C<$_>. 
+If LIST is omitted, C uses C<$_>. =item unpack TEMPLATE,EXPR X @@ -6833,6 +7492,7 @@ and expands it out into a list of values. (In scalar context, it returns merely the first value produced.) If EXPR is omitted, unpacks the C<$_> string. +See L for an introduction to this function. The string is broken into chunks described by the TEMPLATE. Each chunk is converted separately to a value. Typically, either the string is a result @@ -6843,8 +7503,8 @@ The TEMPLATE has the same format as in the C function. Here's a subroutine that does substring: sub substr { - my($what,$where,$howmuch) = @_; - unpack("x$where a$howmuch", $what); + my($what,$where,$howmuch) = @_; + unpack("x$where a$howmuch", $what); } and then there's @@ -6856,14 +7516,14 @@ a % to indicate that you want a -bit checksum of the items instead of the items themselves. Default is a 16-bit checksum. Checksum is calculated by summing numeric values of expanded values (for string fields the sum of -C is taken, for bit fields the sum of zeroes and ones). +C is taken; for bit fields the sum of zeroes and ones). For example, the following computes the same number as the System V sum program: $checksum = do { - local $/; # slurp! - unpack("%32W*",<>) % 65535; + local $/; # slurp! + unpack("%32W*",<>) % 65535; }; The following efficiently counts the number of set bits in a bit vector: @@ -6877,25 +7537,28 @@ not known to be valid is likely to have disastrous consequences. If there are more pack codes or if the repeat count of a field or a group is larger than what the remainder of the input string allows, the result -is not well defined: in some cases, the repeat count is decreased, or -C will produce null strings or zeroes, or terminate with an -error. If the input string is longer than one described by the TEMPLATE, -the rest is ignored. +is not well defined: the repeat count may be decreased, or +C may produce empty strings or zeros, or it may raise an exception. 
+If the input string is longer than one described by the TEMPLATE, +the remainder of that input string is ignored. See L for more examples and notes. =item untie VARIABLE X -Breaks the binding between a variable and a package. (See C.) +Breaks the binding between a variable and a package. +(See L.) Has no effect if the variable is not tied. =item unshift ARRAY,LIST X +=item unshift EXPR,LIST + Does the opposite of a C. Or the opposite of a C, depending on how you look at it. Prepends list to the front of the -array, and returns the new number of elements in the array. +array and returns the new number of elements in the array. unshift(@ARGV, '-e') unless $ARGV[0] =~ /^-/; @@ -6903,6 +7566,11 @@ Note the LIST is prepended whole, not one element at a time, so the prepended elements stay in the same order. Use C to do the reverse. +Starting with Perl 5.14, C can take a scalar EXPR, which must hold +a reference to an unblessed array. The argument will be dereferenced +automatically. This aspect of C is considered highly +experimental. The exact behaviour may change in a future version of Perl. + =item use Module VERSION LIST X X X @@ -6921,39 +7589,40 @@ package. It is exactly equivalent to BEGIN { require Module; Module->import( LIST ); } except that Module I be a bareword. +The importation can be made conditional; see L. -In the peculiar C form, VERSION may be either a numeric -argument such as 5.006, which will be compared to C<$]>, or a literal of -the form v5.6.1, which will be compared to C<$^V> (aka $PERL_VERSION). A -fatal error is produced if VERSION is greater than the version of the +In the peculiar C form, VERSION may be either a positive +decimal fraction such as 5.006, which will be compared to C<$]>, or a v-string +of the form v5.6.1, which will be compared to C<$^V> (aka $PERL_VERSION). An +exception is raised if VERSION is greater than the version of the current Perl interpreter; Perl will not attempt to parse the rest of the file. 
Compare with L, which can do a similar check at run time. Symmetrically, C allows you to specify that you want a version -of perl older than the specified one. +of Perl older than the specified one. Specifying VERSION as a literal of the form v5.6.1 should generally be avoided, because it leads to misleading error messages under earlier versions of Perl (that is, prior to 5.6.0) that do not support this syntax. The equivalent numeric version should be used instead. - use v5.6.1; # compile time version check - use 5.6.1; # ditto - use 5.006_001; # ditto; preferred for backwards compatibility + use v5.6.1; # compile time version check + use 5.6.1; # ditto + use 5.006_001; # ditto; preferred for backwards compatibility This is often useful if you need to check the current Perl version before Cing library modules that won't work with older versions of Perl. (We try not to do this more than we have to.) -Also, if the specified perl version is greater than or equal to 5.9.5, +Also, if the specified Perl version is greater than or equal to 5.9.5, C will also load the C pragma and enable all features available in the requested version. See L. -Similarly, if the specified perl version is greater than or equal to +Similarly, if the specified Perl version is greater than or equal to 5.11.0, strictures are enabled lexically as with C (except that the F file is not actually loaded). The C forces the C and C to happen at compile time. The C makes sure the module is loaded into memory if it hasn't been -yet. The C is not a builtin--it's just an ordinary static method +yet. The C is not a builtin; it's just an ordinary static method call into the C package to tell the module to import the list of features back into the current package. 
The module can implement its C method any way it likes, though most modules just choose to @@ -6998,24 +7667,38 @@ block scope (like C or C, unlike ordinary modules, which import symbols into the current package (which are effective through the end of the file). -There's a corresponding C command that unimports meanings imported +Because C takes effect at compile time, it doesn't respect the +ordinary flow control of the code being compiled. In particular, putting +a C inside the false branch of a conditional doesn't prevent it +from being processed. If a module or pragma only needs to be loaded +conditionally, this can be done using the L pragma: + + use if $] < 5.008, "utf8"; + use if WANT_WARNINGS, warnings => qw(all); + +There's a corresponding C declaration that unimports meanings imported by C, i.e., it calls C instead of C. -It behaves exactly as C does with respect to VERSION, an -omitted LIST, empty LIST, or no unimport method being found. +It behaves just as C does with VERSION, an omitted or empty LIST, +or no unimport method being found. no integer; no strict 'refs'; no warnings; +Care should be taken when using the C form of C. It is +I meant to be used to assert that the running Perl is of an earlier +version than its argument and I to undo the feature-enabling side effects +of C. + See L for a list of standard modules and pragmas. See L -for the C<-M> and C<-m> command-line options to Perl that give C +for the C<-M> and C<-m> command-line options to Perl that give C functionality from the command-line. =item utime LIST X Changes the access and modification times on each file of a list of -files. The first two elements of the list must be the NUMERICAL access +files. The first two elements of the list must be the NUMERIC access and modification times, in that order. Returns the number of files successfully changed. The inode change time of each file is set to the current time.
For example, this code has the same effect as the @@ -7026,14 +7709,17 @@ the user running the program: $atime = $mtime = time; utime $atime, $mtime, @ARGV; -Since perl 5.7.2, if the first two elements of the list are C, then -the utime(2) function in the C library will be called with a null second +Since Perl 5.7.2, if the first two elements of the list are C, +the utime(2) syscall from your C library is called with a null second argument. On most systems, this will set the file's access and -modification times to the current time (i.e. equivalent to the example -above) and will even work on other users' files where you have write +modification times to the current time (i.e., equivalent to the example +above) and will work even on files you don't own provided you have write permission: - utime undef, undef, @ARGV; + for $file (@ARGV) { + utime(undef, undef, $file) + || warn "couldn't touch $file: $!"; + } Under NFS this will use the time of the NFS server, not the time of the local machine. If there is a time synchronization problem, the @@ -7041,55 +7727,65 @@ NFS server and local machine will have different times. The Unix touch(1) command will in fact normally use this form instead of the one shown in the first example. -Note that only passing one of the first two elements as C will -be equivalent of passing it as 0 and will not have the same effect as -described when they are both C. This case will also trigger an +Passing only one of the first two elements as C is +equivalent to passing a 0 and will not have the effect +described when both are C. This also triggers an uninitialized warning. -On systems that support futimes, you might pass file handles among the -files. On systems that don't support futimes, passing file handles -produces a fatal error at run time. The file handles must be passed -as globs or references to be recognized. Barewords are considered -file names. +On systems that support futimes(2), you may pass filehandles among the +files. 
On systems that don't support futimes(2), passing filehandles raises +an exception. Filehandles must be passed as globs or glob references to be +recognized; barewords are considered filenames. + +Portability issues: L. =item values HASH X =item values ARRAY +=item values EXPR + Returns a list consisting of all the values of the named hash, or the values -of an array. (In a scalar context, returns the number of values.) +of an array. (In scalar context, returns the number of values.) The values are returned in an apparently random order. The actual -random order is subject to change in future versions of perl, but it +random order is subject to change in future versions of Perl, but it is guaranteed to be the same order as either the C or C function would produce on the same (unmodified) hash. Since Perl 5.8.1 the ordering is different even between different runs of Perl for security reasons (see L). As a side effect, calling values() resets the HASH or ARRAY's internal -iterator, +iterator; see L. (In particular, calling values() in void context resets the iterator with no other overhead. Apart from resetting the iterator, -C in list context is no different to plain C<@array>. +C in list context is the same as plain C<@array>. We recommend that you use void context C for this, but reasoned that it taking C out would require more documentation than leaving it in.) - Note that the values are not copied, which means modifying them will modify the contents of the hash: - for (values %hash) { s/foo/bar/g } # modifies %hash values + for (values %hash) { s/foo/bar/g } # modifies %hash values for (@hash{keys %hash}) { s/foo/bar/g } # same +Starting with Perl 5.14, C can take a scalar EXPR, which must hold +a reference to an unblessed hash or array. The argument will be +dereferenced automatically. This aspect of C is considered highly +experimental. The exact behaviour may change in a future version of Perl. + + for (values $hashref) { ... 
} + for (values $obj->get_arrayref) { ... } + See also C, C, and C. =item vec EXPR,OFFSET,BITS X X X Treats the string in EXPR as a bit vector made up of elements of -width BITS, and returns the value of the element specified by OFFSET +width BITS and returns the value of the element specified by OFFSET as an unsigned integer. BITS therefore specifies the number of bits that are reserved for each element in the bit vector. This must be a power of two from 1 to 32 (or 64, if your platform supports @@ -7117,7 +7813,7 @@ to give the expression the correct precedence as in If the selected element is outside the string, the value 0 is returned. If an element off the end of the string is written to, Perl will first extend the string with sufficiently many zero bytes. It is an error -to try to write off the beginning of the string (i.e. negative OFFSET). +to try to write off the beginning of the string (i.e., negative OFFSET). If the string happens to be encoded as UTF-8 internally (and thus has the UTF8 flag set), this is ignored by C, and it operates on the @@ -7134,22 +7830,22 @@ The comments show the string after each step. Note that this code works in the same way on big-endian or little-endian machines. my $foo = ''; - vec($foo, 0, 32) = 0x5065726C; # 'Perl' + vec($foo, 0, 32) = 0x5065726C; # 'Perl' # $foo eq "Perl" eq "\x50\x65\x72\x6C", 32 bits - print vec($foo, 0, 8); # prints 80 == 0x50 == ord('P') - - vec($foo, 2, 16) = 0x5065; # 'PerlPe' - vec($foo, 3, 16) = 0x726C; # 'PerlPerl' - vec($foo, 8, 8) = 0x50; # 'PerlPerlP' - vec($foo, 9, 8) = 0x65; # 'PerlPerlPe' - vec($foo, 20, 4) = 2; # 'PerlPerlPe' . "\x02" - vec($foo, 21, 4) = 7; # 'PerlPerlPer' - # 'r' is "\x72" - vec($foo, 45, 2) = 3; # 'PerlPerlPer' . "\x0c" - vec($foo, 93, 1) = 1; # 'PerlPerlPer' . 
"\x2c" - vec($foo, 94, 1) = 1; # 'PerlPerlPerl' - # 'l' is "\x6c" + print vec($foo, 0, 8); # prints 80 == 0x50 == ord('P') + + vec($foo, 2, 16) = 0x5065; # 'PerlPe' + vec($foo, 3, 16) = 0x726C; # 'PerlPerl' + vec($foo, 8, 8) = 0x50; # 'PerlPerlP' + vec($foo, 9, 8) = 0x65; # 'PerlPerlPe' + vec($foo, 20, 4) = 2; # 'PerlPerlPe' . "\x02" + vec($foo, 21, 4) = 7; # 'PerlPerlPer' + # 'r' is "\x72" + vec($foo, 45, 2) = 3; # 'PerlPerlPer' . "\x0c" + vec($foo, 93, 1) = 1; # 'PerlPerlPer' . "\x2c" + vec($foo, 94, 1) = 1; # 'PerlPerlPerl' + # 'l' is "\x6c" To transform a bit vector into a string or list of 0's and 1's, use these: @@ -7188,8 +7884,8 @@ Here is an example to illustrate how the bits actually fall in place: . __END__ -Regardless of the machine architecture on which it is run, the above -example should print the following table: +Regardless of the machine architecture on which it runs, the +example above should print the following table: 0 1 2 3 unpack("V",$_) 01234567890123456789012345678901 @@ -7326,13 +8022,18 @@ example should print the following table: =item wait X -Behaves like the wait(2) system call on your system: it waits for a child +Behaves like wait(2) on your system: it waits for a child process to terminate and returns the pid of the deceased process, or C<-1> if there are no child processes. The status is returned in C<$?> and C<${^CHILD_ERROR_NATIVE}>. Note that a return value of C<-1> could mean that child processes are being automatically reaped, as described in L. +If you use wait in your handler for $SIG{CHLD} it may accidentally for the +child created by qx() or system(). See L for details. + +Portability issues: L. + =item waitpid PID,FLAGS X @@ -7344,12 +8045,12 @@ The status is returned in C<$?> and C<${^CHILD_ERROR_NATIVE}>. If you say use POSIX ":sys_wait_h"; #... do { - $kid = waitpid(-1, WNOHANG); + $kid = waitpid(-1, WNOHANG); } while $kid > 0; then you can do a non-blocking wait for all pending zombie processes. 
Non-blocking wait is available on machines supporting either the -waitpid(2) or wait4(2) system calls. However, waiting for a particular +waitpid(2) or wait4(2) syscalls. However, waiting for a particular pid with FLAGS of C<0> is implemented everywhere. (Perl emulates the system call by remembering the status values of processes that have exited but have not been harvested by the Perl script yet.) @@ -7358,6 +8059,8 @@ Note that on some systems, a return value of C<-1> could mean that child processes are being automatically reaped. See L for details, and for other examples. +Portability issues: L. + =item wantarray X X @@ -7366,7 +8069,7 @@ C is looking for a list value. Returns false if the context is looking for a scalar. Returns the undefined value if the context is looking for no value (void context). - return unless defined wantarray; # don't bother doing more + return unless defined wantarray; # don't bother doing more my @a = complex_calculation(); return wantarray ? @a : "@a"; @@ -7383,7 +8086,7 @@ Prints the value of LIST to STDERR. If the last element of LIST does not end in a newline, it appends the same file/line number text as C does. -If LIST is empty and C<$@> already contains a value (typically from a +If the output is empty and C<$@> already contains a value (typically from a previous eval) that value is used after appending C<"\t...caught"> to C<$@>. This is useful for staying almost, but not entirely similar to C. @@ -7393,7 +8096,7 @@ If C<$@> is empty then the string C<"Warning: Something's wrong"> is used. No message is printed if there is a C<$SIG{__WARN__}> handler installed. It is the handler's responsibility to deal with the message as it sees fit (like, for instance, converting it into a C). Most -handlers must therefore make arrangements to actually display the +handlers must therefore arrange to actually display the warnings that they are not prepared to deal with, by calling C again in the handler. 
Note that this is quite safe and will not produce an endless loop, since C<__WARN__> hooks are not called from @@ -7417,10 +8120,57 @@ warnings (even the so-called mandatory ones). An example: # run-time warnings enabled after here warn "\$foo is alive and $foo!"; # does show up -See L for details on setting C<%SIG> entries, and for more +See L for details on setting C<%SIG> entries and for more examples. See the Carp module for other kinds of warnings using its carp() and cluck() functions. +=item when EXPR BLOCK +X + +=item when BLOCK + +C is analogous to the C keyword in other languages. Used with a +C loop or the experimental C block, C can be used in +Perl to implement C/C like statements. Available as a +statement after Perl 5.10 and as a statement modifier after 5.14. +Here are three examples: + + use v5.10; + foreach (@fruits) { + when (/apples?/) { + say "I like apples." + } + when (/oranges?/) { + say "I don't like oranges." + } + default { + say "I don't like anything" + } + } + + # require 5.14 for when as statement modifier + use v5.14; + foreach (@fruits) { + say "I like apples." when /apples?/; + say "I don't like oranges." when /oranges?/; + default { say "I don't like anything" } + } + + use v5.10; + given ($fruit) { + when (/apples?/) { + say "I like apples." + } + when (/oranges?/) { + say "I don't like oranges." + } + default { + say "I don't like anything" + } + } + +See L for detailed information. + =item write FILEHANDLE X @@ -7434,15 +8184,15 @@ a file is the one having the same name as the filehandle, but the format for the current output channel (see the C

the pack function will gobble up -that many values from the LIST. A C<*> for the repeat count means to -use however many items are left, except for C<@>, C, C, where it -is equivalent to C<0>, for <.> where it means relative to string start -and C, where it is equivalent to 1 (or 45, which is the same). -A numeric repeat count may optionally be enclosed in brackets, as in -C. - -One can replace the numeric repeat count by a template enclosed in brackets; -then the packed length of this template in bytes is used as a count. -For example, C skips a long (it skips the number of bytes in a long); -the template C<$t X[$t] $t> unpack()s twice what $t unpacks. -If the template in brackets contains alignment commands (such as C), -its packed length is calculated as if the start of the template has the maximal -possible alignment. - -When used with C, C<*> results in the addition of a trailing null -byte (so the packed result will be one longer than the byte C -of the item). +Each letter may optionally be followed by a number indicating the repeat +count. A numeric repeat count may optionally be enclosed in brackets, as +in C. The repeat count gobbles that many values from +the LIST when used with all format types other than C, C, C, C, +C, C, C, C<@>, C<.>, C, C, and C