X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/ca8d723e9504508322389fed1274da4bbaed2dfb..12f3ad4ebe4097bd8c213e744ff27acdf4cbdc2d:/pod/perlfunc.pod

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 82a80de..0d9a65c 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -14,34 +14,34 @@ take more than one argument. Thus, a comma terminates the argument of a
 unary operator, but merely separates the arguments of a list
 operator. A unary operator generally provides a scalar context to its
 argument, while a list operator may provide either scalar or list
-contexts for its arguments. If it does both, the scalar arguments will
-be first, and the list argument will follow. (Note that there can ever
-be only one such list argument.) For instance, splice() has three scalar
+contexts for its arguments. If it does both, scalar arguments
+come first and list argument follow, and there can only ever
+be one such list argument. For instance, splice() has three scalar
 arguments followed by a list, whereas gethostbyname() has four scalar
 arguments.

 In the syntax descriptions that follow, list operators that expect a
-list (and provide list context for the elements of the list) are shown
+list (and provide list context for elements of the list) are shown
 with LIST as an argument. Such a list may consist of any combination
 of scalar arguments or list values; the list values will be included
 in the list as if each individual element were interpolated at that
 point in the list, forming a longer single-dimensional list value.
-Commas should separate elements of the LIST.
+Commas should separate literal elements of the LIST.

 Any function in the list below may be used either with or without
 parentheses around its arguments. (The syntax descriptions omit the
-parentheses.) If you use the parentheses, the simple (but occasionally
-surprising) rule is this: It I<looks> like a function, therefore it I<is> a
+parentheses.)
If you use parentheses, the simple but occasionally
+surprising rule is this: It I<looks> like a function, therefore it I<is> a
 function, and precedence doesn't matter. Otherwise it's a list
-operator or unary operator, and precedence does matter. And whitespace
-between the function and left parenthesis doesn't count--so you need to
-be careful sometimes:
+operator or unary operator, and precedence does matter. Whitespace
+between the function and left parenthesis doesn't count, so sometimes
+you need to be careful:

-    print 1+2+4; # Prints 7.
-    print(1+2) + 4; # Prints 3.
-    print (1+2)+4; # Also prints 3!
-    print +(1+2)+4; # Prints 7.
-    print ((1+2)+4); # Prints 7.
+    print 1+2+4; # Prints 7.
+    print(1+2) + 4; # Prints 3.
+    print (1+2)+4; # Also prints 3!
+    print +(1+2)+4; # Prints 7.
+    print ((1+2)+4); # Prints 7.

 If you run Perl with the B<-w> switch it can warn you about this. For
 example, the third line above produces:
@@ -57,12 +57,12 @@ C.

 For functions that can be used in either a scalar or list context,
 nonabortive failure is generally indicated in a scalar context by
 returning the undefined value, and in a list context by returning the
-null list.
+empty list.

 Remember the following important rule: There is B<no rule> that relates
 the behavior of an expression in list context to its behavior in scalar
 context, or vice versa. It might do two totally different things.
-Each operator and function decides which sort of value it would be most
+Each operator and function decides which sort of value would be most
 appropriate to return in scalar context. Some operators return the
 length of the list that would have been returned in list context. Some
 operators return the first value in the list. Some operators return the
@@ -78,14 +78,22 @@ the context at compile time. It would generate the scalar comma operator
 there, not the list construction version of the comma. That means it
 was never a list to start with.
-In general, functions in Perl that serve as wrappers for system calls -of the same name (like chown(2), fork(2), closedir(2), etc.) all return +In general, functions in Perl that serve as wrappers for system calls ("syscalls") +of the same name (like chown(2), fork(2), closedir(2), etc.) return true when they succeed and C otherwise, as is usually mentioned in the descriptions below. This is different from the C interfaces, -which return C<-1> on failure. Exceptions to this rule are C, +which return C<-1> on failure. Exceptions to this rule include C, C, and C. System calls also set the special C<$!> variable on failure. Other functions do not, except accidentally. +Extension modules can also hook into the Perl parser to define new +kinds of keyword-headed expression. These may look like functions, but +may also look completely different. The syntax following the keyword +is defined entirely by the extension. If you are an implementor, see +L for the mechanism. If you are using such +a module, see the module's documentation for details of the syntax that +it defines. + =head2 Perl Functions by Category X @@ -117,7 +125,7 @@ C, C, C =item Functions for real @ARRAYs X -C, C, C, C, C +C, C, C, C, C, C, C, C =item Functions for list data X @@ -138,7 +146,7 @@ C, C, C, C, C, C and low-level POSIX tty-handling operations. If FILEHANDLE is an expression, the value is taken as an indirect filehandle, generally its name. @@ -1877,27 +1951,23 @@ You can use this to find out whether two handles refer to the same underlying descriptor: if (fileno(THIS) == fileno(THAT)) { - print "THIS and THAT are dups\n"; + print "THIS and THAT are dups\n"; } -(Filehandles connected to memory objects via new features of C may -return undefined even though they are open.) - - =item flock FILEHANDLE,OPERATION X X X Calls flock(2), or an emulation of it, on FILEHANDLE. Returns true for success, false on failure. 
Produces a fatal error if used on a machine that doesn't implement flock(2), fcntl(2) locking, or lockf(3). -C is Perl's portable file locking interface, although it locks -only entire files, not records. +C is Perl's portable file-locking interface, although it locks +entire files only, not records. Two potentially non-obvious but traditional C semantics are that it waits indefinitely until the lock is granted, and that its locks -B. Such discretionary locks are more flexible, but offer -fewer guarantees. This means that programs that do not also use C -may modify files locked with C. See L, +are B. Such discretionary locks are more flexible, but +offer fewer guarantees. This means that programs that do not also use +C may modify files locked with C. See L, your port's specific documentation, or your system-specific local manpages for details. It's best to assume traditional behavior if you're writing portable programs. (But if you're not, you should as always feel perfectly @@ -1911,8 +1981,8 @@ you can use the symbolic names if you import them from the Fcntl module, either individually, or as a group using the ':flock' tag. LOCK_SH requests a shared lock, LOCK_EX requests an exclusive lock, and LOCK_UN releases a previously requested lock. If LOCK_NB is bitwise-or'ed with -LOCK_SH or LOCK_EX then C will return immediately rather than blocking -waiting for the lock (check the return status to see if you got it). +LOCK_SH or LOCK_EX then C returns immediately rather than blocking +waiting for the lock; check the return status to see if you got it. To avoid the possibility of miscoordination, Perl now flushes FILEHANDLE before locking or unlocking it. @@ -1932,35 +2002,35 @@ network; you would need to use the more system-specific C for that. If you like you can force Perl to ignore your system's flock(2) function, and so provide its own fcntl(2)-based emulation, by passing the switch C<-Ud_flock> to the F program when you configure -perl. +Perl. 
Here's a mailbox appender for BSD systems. use Fcntl qw(:flock SEEK_END); # import LOCK_* and SEEK_END constants sub lock { - my ($fh) = @_; - flock($fh, LOCK_EX) or die "Cannot lock mailbox - $!\n"; + my ($fh) = @_; + flock($fh, LOCK_EX) or die "Cannot lock mailbox - $!\n"; - # and, in case someone appended while we were waiting... - seek($fh, 0, SEEK_END) or die "Cannot seek - $!\n"; + # and, in case someone appended while we were waiting... + seek($fh, 0, SEEK_END) or die "Cannot seek - $!\n"; } sub unlock { - my ($fh) = @_; - flock($fh, LOCK_UN) or die "Cannot unlock mailbox - $!\n"; + my ($fh) = @_; + flock($fh, LOCK_UN) or die "Cannot unlock mailbox - $!\n"; } open(my $mbox, ">>", "/usr/spool/mail/$ENV{'USER'}") - or die "Can't open mailbox: $!"; + or die "Can't open mailbox: $!"; lock($mbox); print $mbox $msg,"\n\n"; unlock($mbox); -On systems that support a real flock(), locks are inherited across fork() -calls, whereas those that must resort to the more capricious fcntl() -function lose the locks, making it harder to write servers. +On systems that support a real flock(2), locks are inherited across fork() +calls, whereas those that must resort to the more capricious fcntl(2) +function lose their locks, making it seriously harder to write servers. See also L for other flock() examples. @@ -1976,11 +2046,11 @@ fork(), great care has gone into making it extremely efficient (for example, using copy-on-write technology on data pages), making it the dominant paradigm for multitasking over the last few decades. -Beginning with v5.6.0, Perl will attempt to flush all files opened for +Beginning with v5.6.0, Perl attempts to flush all files opened for output before forking the child process, but this may not be supported on some platforms (see L). To be safe, you may need to set C<$|> ($AUTOFLUSH in English) or call the C method of -C on any open handles in order to avoid duplicate output. +C on any open handles to avoid duplicate output. 
If you C without ever waiting on your children, you will accumulate zombies. On some systems, you can avoid this by setting @@ -2000,8 +2070,8 @@ Declare a picture format for use by the C function. For example: format Something = - Test: @<<<<<<<< @||||| @>>>>> - $str, $%, '$' . int($num) + Test: @<<<<<<<< @||||| @>>>>> + $str, $%, '$' . int($num) . $str = "widget"; @@ -2023,40 +2093,44 @@ C<$^A> are written to some filehandle. You could also read C<$^A> and then set C<$^A> back to C<"">. Note that a format typically does one C per line of form, but the C function itself doesn't care how many newlines are embedded in the PICTURE. This means -that the C<~> and C<~~> tokens will treat the entire PICTURE as a single line. +that the C<~> and C<~~> tokens treat the entire PICTURE as a single line. You may therefore need to use multiple formlines to implement a single -record format, just like the format compiler. +record format, just like the C compiler. Be careful if you put double quotes around the picture, because an C<@> character may be taken to mean the beginning of an array name. C always returns true. See L for other examples. +If you are trying to use this instead of C to capture the output, +you may find it easier to open a filehandle to a scalar +(C<< open $fh, ">", \$output >>) and write to that instead. + =item getc FILEHANDLE X X X X =item getc Returns the next character from the input file attached to FILEHANDLE, -or the undefined value at end of file, or if there was an error (in +or the undefined value at end of file or if there was an error (in the latter case C<$!> is set). If FILEHANDLE is omitted, reads from STDIN. This is not particularly efficient. However, it cannot be used by itself to fetch single characters without waiting for the user to hit enter. 
For that, try something more like:

     if ($BSD_STYLE) {
-        system "stty cbreak </dev/tty >/dev/tty 2>&1";
+        system "stty cbreak </dev/tty >/dev/tty 2>&1";
     }
     else {
-        system "stty", '-icanon', 'eol', "\001";
+        system "stty", '-icanon', 'eol', "\001";
     }

     $key = getc(STDIN);

     if ($BSD_STYLE) {
-        system "stty -cbreak </dev/tty >/dev/tty 2>&1";
+        system "stty -cbreak </dev/tty >/dev/tty 2>&1";
     }
     else {
-        system "stty", 'icanon', 'eol', '^@'; # ASCII null
+        system 'stty', 'icanon', 'eol', '^@'; # ASCII NUL
     }
     print "\n";

@@ -2065,15 +2139,15 @@ is left as an exercise to the reader.

 The C<POSIX::getattr> function can do this more portably on
 systems purporting POSIX compliance. See also the C<Term::ReadKey>
-module from your nearest CPAN site; details on CPAN can be found on
+module from your nearest CPAN site; details on CPAN can be found under
 L<perlmodlib/CPAN>.

 =item getlogin
 X X

 This implements the C library function of the same name, which on most
-systems returns the current login from F</etc/utmp>, if any. If null,
-use C<getpwuid>.
+systems returns the current login from F</etc/utmp>, if any. If it
+returns the empty string, use C<getpwuid>.

     $login = getlogin || getpwuid($<) || "Kilroy";

@@ -2083,7 +2157,8 @@ secure as C<getpwuid>.

 =item getpeername SOCKET
 X X

-Returns the packed sockaddr address of other end of the SOCKET connection.
+Returns the packed sockaddr address of the other end of the SOCKET
+connection.

     use Socket;
     $hersockaddr = getpeername(SOCK);
@@ -2097,8 +2172,8 @@ X X

 Returns the current process group for the specified PID. Use
 a PID of C<0> to get the current process group for the
 current process. Will raise an exception if used on a machine that
-doesn't implement getpgrp(2). If PID is omitted, returns process
-group of current process. Note that the POSIX version of C<getpgrp>
+doesn't implement getpgrp(2). If PID is omitted, returns the process
+group of the current process. Note that the POSIX version of C<getpgrp>
 does not accept a PID argument, so only C<PID==0> is truly portable.

 =item getppid
@@ -2108,7 +2183,7 @@ Returns the process id of the parent process.
Note for Linux users: on Linux, the C functions C and C return different values from different threads. In order to -be portable, this behavior is not reflected by the perl-level function +be portable, this behavior is not reflected by the Perl-level function C, that returns a consistent value across threads. If you want to call the underlying C, you may use the CPAN module C. @@ -2117,7 +2192,7 @@ C. X X X Returns the current priority for a process, a process group, or a user. -(See L.) Will raise a fatal exception if used on a +(See C.) Will raise a fatal exception if used on a machine that doesn't implement getpriority(2). =item getpwnam NAME @@ -2186,8 +2261,8 @@ X X X =item endservent -These routines perform the same functions as their counterparts in the -system library. In list context, the return values from the +These routines are the same as their counterparts in the +system C library. In list context, the return values from the various get routines are as follows: ($name,$passwd,$uid,$gid, @@ -2198,7 +2273,7 @@ various get routines are as follows: ($name,$aliases,$proto) = getproto* ($name,$aliases,$port,$proto) = getserv* -(If the entry doesn't exist you get a null list.) +(If the entry doesn't exist you get an empty list.) The exact meaning of the $gcos field varies but it usually contains the real name of the user (as opposed to the login name) and other @@ -2206,7 +2281,7 @@ information pertaining to the user. Beware, however, that in many system users are able to change this information and therefore it cannot be trusted and therefore the $gcos is tainted (see L). The $passwd and $shell, user's encrypted password and -login shell, are also tainted, because of the same reason. +login shell, are also tainted, for the same reason. In scalar context, you get the name, unless the function was a lookup by name, in which case you get the other thing, whatever it is. 
@@ -2221,7 +2296,7 @@ lookup by name, in which case you get the other thing, whatever it is. #etc. In I the fields $quota, $comment, and $expire are special -cases in the sense that in many systems they are unsupported. If the +in that they are unsupported on many systems. If the $quota is unsupported, it is an empty scalar. If it is supported, it usually encodes the disk quota. If the $comment field is unsupported, it is an empty scalar. If it is supported it usually encodes some @@ -2235,21 +2310,21 @@ F file. You can also find out from within Perl what your $quota and $comment fields mean and whether you have the $expire field by using the C module and the values C, C, C, C, and C. Shadow password -files are only supported if your vendor has implemented them in the +files are supported only if your vendor has implemented them in the intuitive fashion that calling the regular C library routines gets the shadow versions if you're running under privilege or if there exists the shadow(3) functions as found in System V (this includes Solaris -and Linux.) Those systems that implement a proprietary shadow password +and Linux). Those systems that implement a proprietary shadow password facility are unlikely to be supported. -The $members value returned by I is a space separated list of +The $members value returned by I is a space-separated list of the login names of the members of the group. For the I functions, if the C variable is supported in C, it will be returned to you via C<$?> if the function call fails. The -C<@addrs> value returned by a successful call is a list of the raw -addresses returned by the corresponding system library call. In the -Internet domain, each address is four bytes long and you can unpack it +C<@addrs> value returned by a successful call is a list of raw +addresses returned by the corresponding library call. 
In the +Internet domain, each address is four bytes long; you can unpack it by saying something like: ($a,$b,$c,$d) = unpack('W4',$addr[0]); @@ -2287,7 +2362,7 @@ for each field. For example: use User::pwent; $is_his = (stat($filename)->uid == pwent($whoever)->uid); -Even though it looks like they're the same method calls (uid), +Even though it looks as though they're the same method calls (uid), they aren't, because a C object is different from a C object. @@ -2315,24 +2390,24 @@ C module) will exist. To query options at another level the protocol number of the appropriate protocol controlling the option should be supplied. For example, to indicate that an option is to be interpreted by the TCP protocol, LEVEL should be set to the protocol -number of TCP, which you can get using getprotobyname. +number of TCP, which you can get using C. -The call returns a packed string representing the requested socket option, -or C if there is an error (the error reason will be in $!). What -exactly is in the packed string depends in the LEVEL and OPTNAME, consult -your system documentation for details. A very common case however is that -the option is an integer, in which case the result will be a packed -integer which you can decode using unpack with the C (or C) format. +The function returns a packed string representing the requested socket +option, or C on error, with the reason for the error placed in +C<$!>. Just what is in the packed string depends on LEVEL and OPTNAME; +consult getsockopt(2) for details. A common case is that the option is an +integer, in which case the result is a packed integer, which you can decode +using C with the C (or C) format. 
-An example testing if Nagle's algorithm is turned on on a socket: +An example to test whether Nagle's algorithm is turned on on a socket: use Socket qw(:all); defined(my $tcp = getprotobyname("tcp")) - or die "Could not determine the protocol number for tcp"; + or die "Could not determine the protocol number for tcp"; # my $tcp = IPPROTO_TCP; # Alternative my $packed = getsockopt($socket, $tcp, TCP_NODELAY) - or die "Could not query TCP_NODELAY socket option: $!"; + or die "getsockopt TCP_NODELAY: $!"; my $nodelay = unpack("I", $packed); print "Nagle's algorithm is turned ", $nodelay ? "off\n" : "on\n"; @@ -2350,10 +2425,17 @@ implementing the C<< <*.c> >> operator, but you can use it directly. If EXPR is omitted, C<$_> is used. The C<< <*.c> >> operator is discussed in more detail in L. -Note that C will split its arguments on whitespace, treating -each segment as separate pattern. As such, C would -match all files with a F<.c> or F<.h> extension. The expression -C would match all files in the current working directory. +Note that C splits its arguments on whitespace and treats +each segment as separate pattern. As such, C +matches all files with a F<.c> or F<.h> extension. The expression +C matches all files in the current working directory. + +If non-empty braces are the only wildcard characters used in the +C, no filenames are matched, but potentially many strings +are returned. For example, this produces nine strings, one for +each pairing of fruits and colors: + + @many = glob "{apple,tomato,cherry}={green,yellow,red}"; Beginning with v5.6.0, this operator is implemented using the standard C extension. See L for details, including @@ -2367,8 +2449,8 @@ X X X Works just like L but the returned values are localized for the standard Greenwich time zone. -Note: when called in list context, $isdst, the last value -returned by gmtime is always C<0>. There is no +Note: When called in list context, $isdst, the last value +returned by gmtime, is always C<0>. 
There is no Daylight Saving Time in GMT. See L for portability concerns. @@ -2380,18 +2462,15 @@ X X X =item goto &NAME -The C form finds the statement labeled with LABEL and resumes -execution there. It may not be used to go into any construct that -requires initialization, such as a subroutine or a C loop. It -also can't be used to go into a construct that is optimized away, -or to get out of a block or subroutine given to C. -It can be used to go almost anywhere else within the dynamic scope, -including out of subroutines, but it's usually better to use some other -construct such as C or C. The author of Perl has never felt the -need to use this form of C (in Perl, that is--C is another matter). -(The difference being that C does not offer named loops combined with -loop control. Perl does, and this replaces most structured uses of C -in other languages.) +The C form finds the statement labeled with LABEL and +resumes execution there. It can't be used to get out of a block or +subroutine given to C. It can be used to go almost anywhere +else within the dynamic scope, including out of subroutines, but it's +usually better to use some other construct such as C or C. +The author of Perl has never felt the need to use this form of C +(in Perl, that is; C is another matter). (The difference is that C +does not offer named loops combined with loop control. Perl does, and +this replaces most structured uses of C in other languages.) The C form expects a label name, whose scope will be resolved dynamically. This allows for computed Cs per FORTRAN, but isn't @@ -2399,6 +2478,16 @@ necessarily recommended if you're optimizing for maintainability: goto ("FOO", "BAR", "GLARCH")[$i]; +As shown in this example, C is exempt from the "looks like a +function" rule. A pair of parentheses following it does not (necessarily) +delimit its argument. C is equivalent to C. + +Use of C or C to jump into a construct is +deprecated and will issue a warning. 
Even then, it may not be used to
+go into any construct that requires initialization, such as a
+subroutine or a C<foreach> loop. It also can't be used to go into a
+construct that is optimized away.
+
 The C<goto-&NAME> form is quite different from the other forms of
 C<goto>. In fact, it isn't a goto in the normal sense at all, and
 doesn't have the stigma associated with other gotos. Instead, it
@@ -2445,7 +2534,7 @@ This is usually something to be avoided when writing clear code.

 If C<$_> is lexical in the scope where the C<grep> appears (because it has
 been declared with C<my $_>) then, in addition to being locally aliased to
-the list elements, C<$_> keeps being lexical inside the block; i.e. it
+the list elements, C<$_> keeps being lexical inside the block; i.e., it
 can't be seen from the outside, avoiding any potential side-effects.

 See also L</map> for a list composed of the results of the BLOCK or EXPR.
@@ -2497,7 +2586,7 @@ X X X X X

 Returns the integer portion of EXPR. If EXPR is omitted, uses C<$_>.
 You should not use this function for rounding: one because it truncates
-towards C<0>, and two because machine representations of floating point
+towards C<0>, and two because machine representations of floating-point
 numbers can sometimes produce counterintuitive results. For example,
 C<int(-6.725/0.025)> produces -268 rather than the correct -269; that's
 because it's really more like -268.99999999999994315658 instead. Usually,
@@ -2509,14 +2598,14 @@ X

 Implements the ioctl(2) function. You'll probably first have to say

-    require "sys/ioctl.ph"; # probably in $Config{archlib}/sys/ioctl.ph
+    require "sys/ioctl.ph"; # probably in $Config{archlib}/sys/ioctl.ph

 to get the correct function definitions. If F<sys/ioctl.ph> doesn't
 exist or doesn't have the correct definitions you'll have to roll your
 own, based on your C header files such as F<< <sys/ioctl.h> >>.
 (There is a Perl script called B<h2ph> that comes with the Perl kit that
 may help you in this, but it's nontrivial.)
SCALAR will be read and/or -written depending on the FUNCTION--a pointer to the string value of SCALAR +written depending on the FUNCTION; a C pointer to the string value of SCALAR will be passed as the third argument of the actual C call. (If SCALAR has no string value but does have a numeric value, that value will be passed rather than a pointer to the string value. To guarantee this to be @@ -2526,10 +2615,10 @@ C. The return value of C (and C) is as follows: - if OS returns: then Perl returns: - -1 undefined value - 0 string "0 but true" - anything else that number + if OS returns: then Perl returns: + -1 undefined value + 0 string "0 but true" + anything else that number Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating @@ -2552,19 +2641,19 @@ separated by the value of EXPR, and returns that new string. Example: Beware that unlike C, C doesn't take a pattern as its first argument. Compare L. -=item keys HASH +=item keys HASH (or HASHREF) X X -=item keys ARRAY +=item keys ARRAY (or ARRAYREF) Returns a list consisting of all the keys of the named hash, or the indices of an array. (In scalar context, returns the number of keys or indices.) The keys of a hash are returned in an apparently random order. The actual -random order is subject to change in future versions of perl, but it +random order is subject to change in future versions of Perl, but it is guaranteed to be the same order as either the C or C function produces (given that the hash has not been modified). Since -Perl 5.8.1 the ordering is different even between different runs of +Perl 5.8.1 the ordering can be different even between different runs of Perl for security reasons (see L). 
@@ -2577,13 +2666,13 @@ Here is yet another way to print your environment: @keys = keys %ENV; @values = values %ENV; while (@keys) { - print pop(@keys), '=', pop(@values), "\n"; + print pop(@keys), '=', pop(@values), "\n"; } or how about sorted by key: foreach $key (sort(keys %ENV)) { - print $key, '=', $ENV{$key}, "\n"; + print $key, '=', $ENV{$key}, "\n"; } The returned values are copies of the original keys in the hash, so @@ -2593,10 +2682,10 @@ To sort a hash by value, you'll need to use a C function. Here's a descending numeric sort of a hash by its values: foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) { - printf "%4d %s\n", $hash{$key}, $key; + printf "%4d %s\n", $hash{$key}, $key; } -As an lvalue C allows you to increase the number of hash buckets +Used as an lvalue, C allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big. (This is similar to pre-extending an array by assigning a larger number to $#array.) If you say @@ -2612,6 +2701,17 @@ C in this way (but you needn't worry about doing this by accident, as trying has no effect). C in an lvalue context is a syntax error. +When given a reference to a hash or array, the argument will be +dereferenced automatically. + + for (keys $hashref) { ... } + for (keys $obj->get_arrayref) { ... } + +If the reference is a blessed object that overrides either C<%{}> or +C<@{}>, the override will be used instead of dereferencing the underlying +variable type. If both overrides are provided, C<%{}> will be the default. +If this is not desired, you must dereference the argument yourself. + See also C, C and C. =item kill SIGNAL, LIST @@ -2624,18 +2724,20 @@ same as the number actually killed). 
     $cnt = kill 1, $child1, $child2;
     kill 9, @goners;

-If SIGNAL is zero, no signal is sent to the process, but the kill(2)
-system call will check whether it's possible to send a signal to it (that
+If SIGNAL is zero, no signal is sent to the process, but C<kill>
+checks whether it's I<possible> to send a signal to it (that
 means, to be brief, that the process is owned by the same user, or we are
-the super-user). This is a useful way to check that a child process is
+the super-user). This is useful to check that a child process is
 still alive (even if only as a zombie) and hasn't changed its UID. See
 L<perlport> for notes on the portability of this construct.

-Unlike in the shell, if SIGNAL is negative, it kills
-process groups instead of processes. (On System V, a negative I<PROCESS>
-number will also kill process groups, but that's not portable.) That
-means you usually want to use positive not negative signals. You may also
-use a signal name in quotes.
+Unlike in the shell, if SIGNAL is negative, it kills process groups instead
+of processes. That means you usually want to use positive not negative signals.
+You may also use a signal name in quotes.
+
+The behavior of kill when a I<PROCESS> number is zero or negative depends on
+the operating system. For example, on POSIX-conforming systems, zero will
+signal the current process group and -1 will signal all processes.

 See L<perlipc/"Signals"> for more details.

@@ -2650,11 +2752,11 @@ omitted, the command refers to the innermost enclosing loop. The
 C<continue> block, if any, is not executed:

     LINE: while (<STDIN>) {
-        last LINE if /^$/; # exit when done with header
-        #...
+        last LINE if /^$/; # exit when done with header
+        #...
     }

-C<last> cannot be used to exit a block which returns a value such as
+C<last> cannot be used to exit a block that returns a value such as
 C<eval {}>, C<sub {}> or C<do {}>, and should not be used to exit
 a grep() or map() operation.

@@ -2671,12 +2773,62 @@ X X

 =item lc

 Returns a lowercased version of EXPR. This is the internal function
-implementing the C<\L> escape in double-quoted strings.
Respects -current LC_CTYPE locale if C in force. See L -and L for more details about locale and Unicode support. +implementing the C<\L> escape in double-quoted strings. If EXPR is omitted, uses C<$_>. +What gets returned depends on several factors: + +=over + +=item If C is in effect: + +=over + +=item On EBCDIC platforms + +The results are what the C language system call C returns. + +=item On ASCII platforms + +The results follow ASCII semantics. Only characters C change, to C +respectively. + +=back + +=item Otherwise, If EXPR has the UTF8 flag set + +If the current package has a subroutine named C, it will be used to +change the case +(See L.) +Otherwise Unicode semantics are used for the case change. + +=item Otherwise, if C is in effect + +Respects current LC_CTYPE locale. See L. + +=item Otherwise, if C is in effect: + +Unicode semantics are used for the case change. Any subroutine named +C will be ignored. + +=item Otherwise: + +=over + +=item On EBCDIC platforms + +The results are what the C language system call C returns. + +=item On ASCII platforms + +ASCII semantics are used for the case change. The lowercase of any character +outside the ASCII range is the character itself. + +=back + +=back + =item lcfirst EXPR X X @@ -2684,30 +2836,30 @@ X X Returns the value of EXPR with the first character lowercased. This is the internal function implementing the C<\l> escape in -double-quoted strings. Respects current LC_CTYPE locale if C in force. See L and L for more -details about locale and Unicode support. +double-quoted strings. If EXPR is omitted, uses C<$_>. +This function behaves the same way under various pragmata, such as in a locale, +as L does. + =item length EXPR X X =item length Returns the length in I of the value of EXPR. If EXPR is -omitted, returns length of C<$_>. If EXPR is undefined, returns C. -Note that this cannot be used on an entire array or hash to find out how -many elements these have. For that, use C and C respectively. 
- -Note the I: if the EXPR is in Unicode, you will get the -number of characters, not the number of bytes. To get the length -of the internal string in bytes, use C, see -L. Note that the internal encoding is variable, and the number -of bytes usually meaningless. To get the number of bytes that the -string would have when encoded as UTF-8, use -C. +omitted, returns the length of C<$_>. If EXPR is undefined, returns +C. + +This function cannot be used on an entire array or hash to find out how +many elements these have. For that, use C and C, respectively. + +Like all Perl character operations, length() normally deals in logical +characters, not physical bytes. For how many bytes a string encoded as +UTF-8 would take up, use C (you'll have +to C first). See L and L. =item link OLDFILE,NEWFILE X @@ -2718,7 +2870,7 @@ success, false otherwise. =item listen SOCKET,QUEUESIZE X -Does the same thing that the listen system call does. Returns true if +Does the same thing that the listen(2) system call does. Returns true if it succeeded, false otherwise. See the example in L. @@ -2734,6 +2886,10 @@ block, file, or eval. If more than one value is listed, the list must be placed in parentheses. See L for details, including issues with tied arrays and hashes. +The C construct can also be used to localize the deletion +of array/hash elements to the current block. +See L. + =item localtime EXPR X X @@ -2761,7 +2917,7 @@ This makes it easy to get a month name from a list: C<$year> is the number of years since 1900, not just the last two digits of the year. That is, C<$year> is C<123> in year 2023. The proper way -to get a complete 4-digit year is simply: +to get a 4-digit year is simply: $year += 1900; @@ -2786,13 +2942,13 @@ In scalar context, C returns the ctime(3) value: $now_string = localtime; # e.g., "Thu Oct 13 04:54:34 1994" -This scalar value is B locale dependent but is a Perl builtin. For GMT +This scalar value is B locale-dependent but is a Perl builtin. 
For GMT instead of local time use the L builtin. See also the -C module (to convert the second, minutes, hours, ... back to +C module (to convert the seconds, minutes, hours, ... back to the integer value returned by time()), and the L module's strftime(3) and mktime(3) functions. -To get somewhat similar but locale dependent date strings, set up your +To get somewhat similar but locale-dependent date strings, set up your locale environment variables appropriately (please see L) and try for example: @@ -2806,7 +2962,7 @@ and the month of the year, may not necessarily be three characters wide. See L for portability concerns. -The L and L modules provides a convenient, +The L and L modules provide a convenient, by-name access mechanism to the gmtime() and localtime() functions, respectively. @@ -2816,13 +2972,13 @@ L module on CPAN. =item lock THING X -This function places an advisory lock on a shared variable, or referenced +This function places an advisory lock on a shared variable or referenced object contained in I until the lock goes out of scope. lock() is a "weak keyword" : this means that if you've defined a function by this name (before any calls to it), that function will be called -instead. (However, if you've said C, lock() is always a -keyword.) See L. +instead. If you are not under C this does nothing. +See L. =item log EXPR X X X X X @@ -2830,13 +2986,14 @@ X X X X X =item log Returns the natural logarithm (base I) of EXPR. If EXPR is omitted, -returns log of C<$_>. To get the log of another base, use basic algebra: +returns the log of C<$_>. To get the +log of another base, use basic algebra: The base-N log of a number is equal to the natural log of that number divided by the natural log of N. For example: sub log10 { - my $n = shift; - return log($n)/log(10); + my $n = shift; + return log($n)/log(10); } See also L for the inverse operation. @@ -2870,9 +3027,27 @@ total number of elements so generated. 
Evaluates BLOCK or EXPR in list context, so each element of LIST may produce zero, one, or more elements in the returned value. - @chars = map(chr, @nums); + @chars = map(chr, @numbers); + +translates a list of numbers to the corresponding characters. + + my @squares = map { $_ * $_ } @numbers; + +translates a list of numbers to their squared values. + + my @squares = map { $_ > 5 ? ($_ * $_) : () } @numbers; + +shows that the number of returned elements can differ from the number of +input elements. To omit an element, return an empty list (). +This could also be achieved by writing + + my @squares = map { $_ * $_ } grep { $_ > 5 } @numbers; -translates a list of numbers to the corresponding characters. And +which makes the intention clearer. + +Map always returns a list, which can be +assigned to a hash such that the elements +become key/value pairs. See L for more details. %hash = map { get_a_key_for($_) => $_ } @array; @@ -2880,7 +3055,7 @@ is just a funny way to write %hash = (); foreach (@array) { - $hash{get_a_key_for($_)} = $_; + $hash{get_a_key_for($_)} = $_; } Note that C<$_> is an alias to the list value, so it can be used to @@ -2896,27 +3071,27 @@ the list elements, C<$_> keeps being lexical inside the block; that is, it can't be seen from the outside, avoiding any potential side-effects. C<{> starts both hash references and blocks, so C could be either -the start of map BLOCK LIST or map EXPR, LIST. Because perl doesn't look -ahead for the closing C<}> it has to take a guess at which its dealing with -based what it finds just after the C<{>. Usually it gets it right, but if it +the start of map BLOCK LIST or map EXPR, LIST. Because Perl doesn't look +ahead for the closing C<}> it has to take a guess at which it's dealing with +based on what it finds just after the C<{>. Usually it gets it right, but if it doesn't it won't realize something is wrong until it gets to the C<}> and encounters the missing (or unexpected) comma.
The syntax error will be -reported close to the C<}> but you'll need to change something near the C<{> -such as using a unary C<+> to give perl some help: +reported close to the C<}>, but you'll need to change something near the C<{> +such as using a unary C<+> to give Perl some help: - %hash = map { "\L$_", 1 } @array # perl guesses EXPR. wrong - %hash = map { +"\L$_", 1 } @array # perl guesses BLOCK. right - %hash = map { ("\L$_", 1) } @array # this also works - %hash = map { lc($_), 1 } @array # as does this. - %hash = map +( lc($_), 1 ), @array # this is EXPR and works! + %hash = map { "\L$_" => 1 } @array # perl guesses EXPR. wrong + %hash = map { +"\L$_" => 1 } @array # perl guesses BLOCK. right + %hash = map { ("\L$_" => 1) } @array # this also works + %hash = map { lc($_) => 1 } @array # as does this. + %hash = map +( lc($_) => 1 ), @array # this is EXPR and works! - %hash = map ( lc($_), 1 ), @array # evaluates to (1, @array) + %hash = map ( lc($_), 1 ), @array # evaluates to (1, @array) or to force an anon hash constructor use C<+{>: - @hashes = map +{ lc($_), 1 }, @array # EXPR, so needs , at end + @hashes = map +{ lc($_) => 1 }, @array # EXPR, so needs comma at end -and you get list of anonymous hashes each with only 1 entry. +to get a list of anonymous hashes, each with only one entry. =item mkdir FILENAME,MASK X X X @@ -2931,7 +3106,7 @@ returns true, otherwise it returns false and sets C<$!> (errno). If omitted, MASK defaults to 0777. If omitted, FILENAME defaults to C<$_>. -In general, it is better to create directories with permissive MASK, +In general, it is better to create directories with a permissive MASK, and let the user modify that with their C, than it is to supply a restrictive MASK and give the user no way to be more permissive. The exceptions to this rule are when the file or directory should be @@ -2943,7 +3118,7 @@ number of trailing slashes.
Some operating systems and filesystems do not get this right, so Perl automatically removes all trailing slashes to keep everyone happy. -In order to recursively create a directory structure look at +To recursively create a directory structure, look at the C function of the L module. =item msgctl ID,CMD,ARG @@ -2957,14 +3132,16 @@ first to get the correct constant definitions. If CMD is C, then ARG must be a variable that will hold the returned C structure. Returns like C: the undefined value for error, C<"0 but true"> for zero, or the actual return value otherwise. See also -L, C, and C documentation. +L and the documentation for C and +C. =item msgget KEY,FLAGS X Calls the System V IPC function msgget(2). Returns the message queue id, or the undefined value if there is an error. See also -L and C and C documentation. +L and the documentation for C and +C. =item msgrcv ID,VAR,SIZE,TYPE,FLAGS X @@ -2975,8 +3152,8 @@ SIZE. Note that when a message is received, the message type as a native long integer will be the first thing in VAR, followed by the actual message. This packing may be opened with C. Taints the variable. Returns true if successful, or false if there is -an error. See also L, C, and -C documentation. +an error. See also L and the documentation for +C and C. =item msgsnd ID,MSG,FLAGS X @@ -2986,7 +3163,7 @@ message queue ID. MSG must begin with the native long integer message type, and be followed by the length of the actual message, and finally the message itself. This kind of packing can be achieved with C. Returns true if successful, -or false if there is an error. See also C +or false if there is an error. See also the C and C documentation. =item my EXPR @@ -3003,7 +3180,7 @@ enclosing block, file, or C. If more than one value is listed, the list must be placed in parentheses. The exact semantics and interface of TYPE and ATTRS are still -evolving.
TYPE is currently bound to the use of C pragma, +evolving. TYPE is currently bound to the use of the C pragma, and attributes are handled using the C pragma, or starting from Perl 5.8.0 also via the C module. See L for details, and L, @@ -3018,12 +3195,12 @@ The C command is like the C statement in C; it starts the next iteration of the loop: LINE: while () { - next LINE if /^#/; # discard comments - #... + next LINE if /^#/; # discard comments + #... } Note that if there were a C block on the above, it would get -executed even on discarded lines. If the LABEL is omitted, the command +executed even on discarded lines. If LABEL is omitted, the command refers to the innermost enclosing loop. C cannot be used to exit a block which returns a value such as @@ -3036,14 +3213,15 @@ that executes once. Thus C will exit such a block early. See also L for an illustration of how C, C, and C work. -=item no Module VERSION LIST -X +=item no MODULE VERSION LIST +X +X -=item no Module VERSION +=item no MODULE VERSION -=item no Module LIST +=item no MODULE LIST -=item no Module +=item no MODULE =item no VERSION @@ -3058,21 +3236,25 @@ Interprets EXPR as an octal string and returns the corresponding value. (If EXPR happens to start off with C<0x>, interprets it as a hex string. If EXPR starts off with C<0b>, it is interpreted as a binary string. Leading whitespace is ignored in all three cases.) -The following will handle decimal, binary, octal, and hex in the standard -Perl or C notation: +The following will handle decimal, binary, octal, and hex in standard +Perl notation: $val = oct($val) if $val =~ /^0/; If EXPR is omitted, uses C<$_>. To go the other way (produce a number in octal), use sprintf() or printf(): - $perms = (stat("filename"))[2] & 07777; - $oct_perms = sprintf "%lo", $perms; + $dec_perms = (stat("filename"))[2] & 07777; + $oct_perm_str = sprintf "%o", $dec_perms; The oct() function is commonly used when a string such as C<644> needs -to be converted into a file mode, for example.
(Although perl will -automatically convert strings into numbers as needed, this automatic -conversion assumes base 10.) +to be converted into a file mode, for example. Although Perl +automatically converts strings into numbers as needed, this automatic +conversion assumes base 10. + +Leading white space is ignored without warning, as too are any trailing +non-digits, such as a decimal point (C only handles non-negative +integers, not negative integers or floating point). =item open FILEHANDLE,EXPR X X X X @@ -3111,70 +3293,63 @@ declared with C--will not work for this purpose; so if you're using C, specify EXPR in your call to open.) If three or more arguments are specified then the mode of opening and -the file name are separate. If MODE is C<< '<' >> or nothing, the file +the filename are separate. If MODE is C<< '<' >> or nothing, the file is opened for input. If MODE is C<< '>' >>, the file is truncated and opened for output, being created if necessary. If MODE is C<<< '>>' >>>, the file is opened for appending, again being created if necessary. You can put a C<'+'> in front of the C<< '>' >> or C<< '<' >> to indicate that you want both read and write access to the file; thus -C<< '+<' >> is almost always preferred for read/write updates--the C<< -'+>' >> mode would clobber the file first. You can't usually use +C<< '+<' >> is almost always preferred for read/write updates--the +C<< '+>' >> mode would clobber the file first. You can't usually use either read-write mode for updating textfiles, since they have -variable length records. See the B<-i> switch in L for a +variable-length records. See the B<-i> switch in L for a better approach. The file is created with permissions of C<0666> -modified by the process' C value. +modified by the process's C value. These various prefixes correspond to the fopen(3) modes of C<'r'>, C<'r+'>, C<'w'>, C<'w+'>, C<'a'>, and C<'a+'>. 
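As a rough illustration of the three-argument modes described above, the following sketch walks one scratch file through C<< '>' >>, C<<< '>>' >>>, C<< '+<' >>, and C<< '<' >>. It is not part of the patch; the temporary file and every variable name are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; File::Temp removes it at program exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "can't close $path: $!";

# '>' truncates (or creates) the file and opens it for output.
open(my $out, '>', $path) or die "can't write $path: $!";
print {$out} "first\n";
close($out) or die "can't close $path: $!";

# '>>' opens for appending, leaving the existing contents in place.
open(my $app, '>>', $path) or die "can't append to $path: $!";
print {$app} "second\n";
close($app) or die "can't close $path: $!";

# '+<' opens for read/write without clobbering the file first.
open(my $rw, '+<', $path) or die "can't update $path: $!";
my @lines = <$rw>;                          # read what is there
seek($rw, 0, 0) or die "can't rewind $path: $!";
print {$rw} "FIRST\n";                      # overwrite the first record in place
close($rw) or die "can't close $path: $!";

# '<' opens for input only.
open(my $in, '<', $path) or die "can't read $path: $!";
my @updated = <$in>;
close($in) or die "can't close $path: $!";

print @updated;
```

The in-place overwrite only works cleanly here because the replacement record has the same length as the original, which is the variable-length-record caveat mentioned earlier in this entry.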
-In the 2-arguments (and 1-argument) form of the call the mode and -filename should be concatenated (in this order), possibly separated by -spaces. It is possible to omit the mode in these forms if the mode is +In the two-argument (and one-argument) form of the call, the mode and +filename should be concatenated (in that order), possibly separated by +spaces. You may omit the mode in these forms when that mode is C<< '<' >>. -If the filename begins with C<'|'>, the filename is interpreted as a -command to which output is to be piped, and if the filename ends with a -C<'|'>, the filename is interpreted as a command which pipes output to -us. See L -for more examples of this. (You are not allowed to C to a command -that pipes both in I out, but see L, L, -and L -for alternatives.) - For three or more arguments if MODE is C<'|-'>, the filename is interpreted as a command to which output is to be piped, and if MODE -is C<'-|'>, the filename is interpreted as a command which pipes -output to us. In the 2-arguments (and 1-argument) form one should +is C<'-|'>, the filename is interpreted as a command that pipes +output to us. In the two-argument (and one-argument) form, one should replace dash (C<'-'>) with the command. See L for more examples of this. (You are not allowed to C to a command that pipes both in I out, but see L, L, and -L for alternatives.) +L for +alternatives.) -In the three-or-more argument form of pipe opens, if LIST is specified +In the form of pipe opens taking three or more arguments, if LIST is specified (extra arguments after the command name) then LIST becomes arguments to the command invoked if the platform supports it. The meaning of C with more than three arguments for non-pipe modes is not yet -specified. Experimental "layers" may give extra LIST arguments +defined, but experimental "layers" may give extra LIST arguments meaning. -In the 2-arguments (and 1-argument) form opening C<'-'> opens STDIN -and opening C<< '>-' >> opens STDOUT. 
+In the two-argument (and one-argument) form, opening C<< '<-' >> +or C<'-'> opens STDIN and opening C<< '>-' >> opens STDOUT. -You may use the three-argument form of open to specify IO "layers" -(sometimes also referred to as "disciplines") to be applied to the handle +You may use the three-argument form of open to specify I/O layers +(sometimes referred to as "disciplines") to apply to the handle that affect how the input and output are processed (see L and -L for more details). For example +L for more details). For example: - open(my $fh, "<:encoding(UTF-8)", "file") + open(my $fh, "<:encoding(UTF-8)", "filename") + || die "can't open UTF-8 encoded filename: $!"; -will open the UTF-8 encoded file containing Unicode characters, +opens the UTF-8 encoded file containing Unicode characters; see L. Note that if layers are specified in the -three-arg form then default layers stored in ${^OPEN} (see L; +three-argument form, then default layers stored in ${^OPEN} (see L; usually set by the B pragma or the switch B<-CioD>) are ignored. -Open returns nonzero upon success, the undefined value otherwise. If +Open returns nonzero on success, the undefined value otherwise. If the C involved a pipe, the return value happens to be the pid of the subprocess. @@ -3182,17 +3357,16 @@ If you're running Perl on a system that distinguishes between text files and binary files, then you should check out L for tips for dealing with this. The key distinction between systems that need C and those that don't is their text file formats. Systems -like Unix, Mac OS, and Plan 9, which delimit lines with a single -character, and which encode that character in C as C<"\n">, do not +like Unix, Mac OS, and Plan 9, that end lines with a single +character and encode that character in C as C<"\n"> do not need C. The rest need it. 
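To make the effect of an I/O layer concrete, here is a minimal sketch (not part of the patch) that writes one non-ASCII character through C<:encoding(UTF-8)> and reads it back both as raw bytes and as decoded characters. The scratch file and variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; File::Temp removes it at program exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "can't close $path: $!";

# Write a single character (U+00E9) through the encoding layer.
open(my $out, '>:encoding(UTF-8)', $path)
    or die "can't write UTF-8 to $path: $!";
print {$out} "\x{e9}";
close($out) or die "can't close $path: $!";

# Read the file back with no translation at all.
open(my $raw, '<:raw', $path) or die "can't read $path: $!";
my $bytes = do { local $/; <$raw> };
close($raw) or die "can't close $path: $!";

# Read it again, decoding the bytes into characters.
open(my $in, '<:encoding(UTF-8)', $path)
    or die "can't read UTF-8 from $path: $!";
my $chars = do { local $/; <$in> };
close($in) or die "can't close $path: $!";

printf "%d bytes on disk, %d character after decoding\n",
    length($bytes), length($chars);
```

The same string has a different length() depending on the layer used to read it, which is why the mode of the handle should match how the data was written.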
-When opening a file, it's usually a bad idea to continue normal execution -if the request failed, so C is frequently used in connection with +When opening a file, it's seldom a good idea to continue +if the request failed, so C is frequently used with C. Even if C won't do what you want (say, in a CGI script, -where you want to make a nicely formatted error message (but there are -modules that can help with that problem)) you should always check -the return value from opening a file. The infrequent exception is when -working with an unopened filehandle is actually what you want to do. +where you want to format a suitable error message (but there are +modules that can help with that problem)) always check +the return value from opening a file. As a special case the 3-arg form with a read/write mode and the third argument being C: @@ -3204,69 +3378,68 @@ works for symmetry, but you really should consider writing something to the temporary file first. You will need to seek() to do the reading. -Since v5.8.0, perl has built using PerlIO by default. Unless you've -changed this (i.e. Configure -Uuseperlio), you can open file handles to -"in memory" files held in Perl scalars via: +Since v5.8.0, Perl has built using PerlIO by default. Unless you've +changed this (i.e., Configure -Uuseperlio), you can open filehandles +directly to Perl scalars via: open($fh, '>', \$variable) || .. -Though if you try to re-open C or C as an "in memory" -file, you have to close it first: +To (re)open C or C as an in-memory file, close it first: close STDOUT; open STDOUT, '>', \$variable or die "Can't open STDOUT: $!"; -Examples: +General examples: $ARTICLE = 100; open ARTICLE or die "Can't find article $ARTICLE: $!\n"; while (
) {... - open(LOG, '>>/usr/spool/news/twitlog'); # (log is reserved) + open(LOG, '>>/usr/spool/news/twitlog'); # (log is reserved) # if the open fails, output is discarded - open(my $dbase, '+<', 'dbase.mine') # open for update - or die "Can't open 'dbase.mine' for update: $!"; + open(my $dbase, '+<', 'dbase.mine') # open for update + or die "Can't open 'dbase.mine' for update: $!"; - open(my $dbase, '+Tmp$$") # $$ is our process id - or die "Can't start sort: $!"; + open(EXTRACT, "|sort >Tmp$$") # $$ is our process id + or die "Can't start sort: $!"; - # in memory files + # in-memory files open(MEMORY,'>', \$var) - or die "Can't open memory file: $!"; - print MEMORY "foo!\n"; # output will end up in $var + or die "Can't open memory file: $!"; + print MEMORY "foo!\n"; # output will appear in $var # process argument list of files along with any includes foreach $file (@ARGV) { - process($file, 'fh00'); + process($file, 'fh00'); } sub process { - my($filename, $input) = @_; - $input++; # this is a string increment - unless (open($input, $filename)) { - print STDERR "Can't open $filename: $!\n"; - return; - } - - local $_; - while (<$input>) { # note use of indirection - if (/^#include "(.*)"/) { - process($1, $input); - next; - } - #... # whatever - } + my($filename, $input) = @_; + $input++; # this is a string increment + unless (open($input, $filename)) { + print STDERR "Can't open $filename: $!\n"; + return; + } + + local $_; + while (<$input>) { # note use of indirection + if (/^#include "(.*)"/) { + process($1, $input); + next; + } + #... # whatever + } } See L for detailed info on PerlIO. @@ -3274,7 +3447,7 @@ See L for detailed info on PerlIO. You may also, in the Bourne shell tradition, specify an EXPR beginning with C<< '>&' >>, in which case the rest of the string is interpreted as the name of a filehandle (or file descriptor, if numeric) to be -duped (as L) and opened. You may use C<&> after C<< > >>, +duped (as C) and opened. 
You may use C<&> after C<< > >>, C<<< >> >>>, C<< < >>, C<< +> >>, C<<< +>> >>>, and C<< +< >>. The mode you specify should match the mode of the original filehandle. (Duping a filehandle does not take into account any existing contents @@ -3291,11 +3464,11 @@ C using various methods: open STDOUT, '>', "foo.out" or die "Can't redirect STDOUT: $!"; open STDERR, ">&STDOUT" or die "Can't dup STDOUT: $!"; - select STDERR; $| = 1; # make unbuffered - select STDOUT; $| = 1; # make unbuffered + select STDERR; $| = 1; # make unbuffered + select STDOUT; $| = 1; # make unbuffered - print STDOUT "stdout 1\n"; # this works for - print STDERR "stderr 1\n"; # subprocesses too + print STDOUT "stdout 1\n"; # this works for + print STDERR "stderr 1\n"; # subprocesses too open STDOUT, ">&", $oldout or die "Can't dup \$oldout: $!"; open STDERR, ">&OLDERR" or die "Can't dup OLDERR: $!"; @@ -3305,7 +3478,7 @@ C using various methods: If you specify C<< '<&=X' >>, where C is a file descriptor number or a filehandle, then Perl will do an equivalent of C's C of -that file descriptor (and not call L); this is more +that file descriptor (and not call C); this is more parsimonious of file descriptors. For example: # open for input, reusing the fileno of $fd @@ -3334,27 +3507,28 @@ the same file descriptor. Note that if you are using Perls older than 5.8.0, Perl will be using the standard C libraries' fdopen() to implement the "=" functionality. -On many UNIX systems fdopen() fails when file descriptors exceed a +On many Unix systems fdopen() fails when file descriptors exceed a certain value, typically 255. For Perls 5.8.0 and later, PerlIO is most often the default. You can see whether Perl has been compiled with PerlIO or not by -running C and looking for C line. If C -is C, you have PerlIO, otherwise you don't. +running C and looking for the C line. If C +is C, you have PerlIO; otherwise you don't. 
If you open a pipe on the command C<'-'>, i.e., either C<'|-'> or C<'-|'> -with 2-arguments (or 1-argument) form of open(), then +with the 2-argument (or 1-argument) form of open(), then there is an implicit fork done, and the return value of open is the pid of the child within the parent process, and C<0> within the child process. (Use C to determine whether the open was successful.) -The filehandle behaves normally for the parent, but i/o to that +The filehandle behaves normally for the parent, but I/O to that filehandle is piped from/to the STDOUT/STDIN of the child process. -In the child process the filehandle isn't opened--i/o happens from/to -the new STDOUT or STDIN. Typically this is used like the normal +In the child process, the filehandle isn't opened--I/O happens from/to +the new STDOUT/STDIN. Typically this is used like the normal piped open when you want to exercise more control over just how the -pipe command gets executed, such as when you are running setuid, and -don't want to have to scan shell commands for metacharacters. -The following triples are more or less equivalent: +pipe command gets executed, such as when running setuid and +you don't want to have to scan shell commands for metacharacters. + +The following blocks are more or less equivalent: open(FOO, "|tr '[a-z]' '[A-Z]'"); open(FOO, '|-', "tr '[a-z]' '[A-Z]'"); @@ -3366,10 +3540,10 @@ The following triples are more or less equivalent: open(FOO, '-|') || exec 'cat', '-n', $file; open(FOO, '-|', "cat", '-n', $file); -The last example in each block shows the pipe as "list form", which is +The last two examples in each block show the pipe as "list form", which is not yet supported on all platforms. A good rule of thumb is that if your platform has true C (in other words, if your platform is -UNIX) you can use the list form. +Unix) you can use the list form. See L for more examples of this.
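The list form discussed in this hunk can be sketched as follows. This is an illustration only, not part of the patch: it assumes a platform where list-form pipe opens are supported, and it runs the current perl binary (C<$^X>) as the child so that no external tool is needed. The variable names are invented for the example.

```perl
use strict;
use warnings;

# Child command as a list: program, then each argument on its own.
# The last argument contains shell metacharacters on purpose.
my @cmd = ($^X, '-e', 'print uc($ARGV[0]), "\n"', 'a b; c');

# List form of a piped open: no shell is involved, so the
# metacharacters in 'a b; c' reach the child untouched.
open(my $from_kid, '-|', @cmd) or die "can't start child: $!";
my $reply = <$from_kid>;

# Closing a piped handle waits for the child and sets $?.
close($from_kid) or die "child exited with status $?";

print $reply;
```

Because the arguments never pass through a shell, there is nothing to quote or scan for metacharacters, which is the control the surrounding text describes.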
@@ -3387,7 +3561,7 @@ Closing any piped filehandle causes the parent process to wait for the child to finish, and returns the status value in C<$?> and C<${^CHILD_ERROR_NATIVE}>. -The filename passed to 2-argument (or 1-argument) form of open() will +The filename passed to the 2-argument (or 1-argument) form of open() will have leading and trailing whitespace deleted, and the normal redirection characters honored. This property, known as "magic open", can often be used to good effect. A user could specify a filename of @@ -3412,13 +3586,13 @@ of open(): open IN, $ARGV[0]; will allow the user to specify an argument of the form C<"rsh cat file |">, -but will not work on a filename which happens to have a trailing space, while +but will not work on a filename that happens to have a trailing space, while open IN, '<', $ARGV[0]; will have exactly the opposite restrictions. -If you want a "real" C C (see L on your system), then you +If you want a "real" C C (see C on your system), then you should use the C function, which involves no such magic (but may use subtly different filemodes than Perl open(), which is mapped to C fopen()). This is @@ -3426,7 +3600,7 @@ another way to protect your filenames from interpretation. For example: use IO::Handle; sysopen(HANDLE, $path, O_RDWR|O_CREAT|O_EXCL) - or die "sysopen $path: $!"; + or die "sysopen $path: $!"; $oldfh = select(HANDLE); $| = 1; select($oldfh); print HANDLE "stuff $$\n"; seek(HANDLE, 0, 0); @@ -3440,14 +3614,14 @@ them, and automatically close whenever and however you leave that scope: use IO::File; #... sub read_myfile_munged { - my $ALL = shift; - my $handle = IO::File->new; - open($handle, "myfile") or die "myfile: $!"; - $first = <$handle> - or return (); # Automatically closed here. - mung $first or die "mung failed"; # Or here. - return $first, <$handle> if $ALL; # Or here. - $first; # Or here. 
+ my $ALL = shift; + my $handle = IO::File->new; + open($handle, "myfile") or die "myfile: $!"; + $first = <$handle> + or return (); # Automatically closed here. + mung $first or die "mung failed"; # Or here. + return $first, <$handle> if $ALL; # Or here. + $first; # Or here. } See L for some details about mixing reading and writing. @@ -3463,7 +3637,7 @@ scalar variable (or array or hash element), the variable is assigned a reference to a new anonymous dirhandle. DIRHANDLEs have their own namespace separate from FILEHANDLEs. -See example at C. +See the example at C. =item ord EXPR X X @@ -3471,8 +3645,8 @@ X X =item ord Returns the numeric (the native 8-bit encoding, like ASCII or EBCDIC, -or Unicode) value of the first character of EXPR. If EXPR is omitted, -uses C<$_>. +or Unicode) value of the first character of EXPR. If EXPR is an empty +string, returns 0. If EXPR is omitted, uses C<$_>. For the reverse, see L. See L for more about Unicode. @@ -3490,7 +3664,7 @@ C associates a simple name with a package variable in the current package for use within the current scope. When C is in effect, C lets you use declared global variables without qualifying them with package names, within the lexical scope of the C declaration. -In this way C differs from C, which is package scoped. +In this way C differs from C, which is package-scoped. Unlike C, which both allocates storage for a variable and associates a simple name with that storage for use within the current scope, C @@ -3512,11 +3686,11 @@ of the declaration, not at the point of use. This means the following behavior holds: package Foo; - our $bar; # declares $Foo::bar for rest of lexical scope + our $bar; # declares $Foo::bar for rest of lexical scope $bar = 20; package Bar; - print $bar; # prints 20, as it refers to $Foo::bar + print $bar; # prints 20, as it refers to $Foo::bar Multiple C declarations with the same name in the same lexical scope are allowed if they are in different packages. 
If they happen @@ -3528,15 +3702,15 @@ merely redundant. use warnings; package Foo; - our $bar; # declares $Foo::bar for rest of lexical scope + our $bar; # declares $Foo::bar for rest of lexical scope $bar = 20; package Bar; - our $bar = 30; # declares $Bar::bar for rest of lexical scope - print $bar; # prints 30 + our $bar = 30; # declares $Bar::bar for rest of lexical scope + print $bar; # prints 30 - our $bar; # emits warning but has no other effect - print $bar; # still prints 30 + our $bar; # emits warning but has no other effect + print $bar; # still prints 30 An C declaration may also have a list of attributes associated with it. @@ -3555,81 +3729,82 @@ Takes a LIST of values and converts it into a string using the rules given by the TEMPLATE. The resulting string is the concatenation of the converted values. Typically, each converted value looks like its machine-level representation. For example, on 32-bit machines -an integer may be represented by a sequence of 4 bytes that will be -converted to a sequence of 4 characters. +an integer may be represented by a sequence of 4 bytes, which will in +Perl be presented as a string that's 4 characters long. + +See L for an introduction to this function. The TEMPLATE is a sequence of characters that give the order and type of values, as follows: - a A string with arbitrary binary data, will be null padded. - A A text (ASCII) string, will be space padded. - Z A null terminated (ASCIZ) string, will be null padded. + a A string with arbitrary binary data, will be null padded. + A A text (ASCII) string, will be space padded. + Z A null-terminated (ASCIZ) string, will be null padded. - b A bit string (ascending bit order inside each byte, like vec()). - B A bit string (descending bit order inside each byte). - h A hex string (low nybble first). - H A hex string (high nybble first). + b A bit string (ascending bit order inside each byte, like vec()). + B A bit string (descending bit order inside each byte). 
+ h A hex string (low nybble first). + H A hex string (high nybble first). - c A signed char (8-bit) value. - C An unsigned char (octet) value. - W An unsigned char value (can be greater than 255). + c A signed char (8-bit) value. + C An unsigned char (octet) value. + W An unsigned char value (can be greater than 255). - s A signed short (16-bit) value. - S An unsigned short value. + s A signed short (16-bit) value. + S An unsigned short value. - l A signed long (32-bit) value. - L An unsigned long value. + l A signed long (32-bit) value. + L An unsigned long value. - q A signed quad (64-bit) value. - Q An unsigned quad value. - (Quads are available only if your system supports 64-bit - integer values _and_ if Perl has been compiled to support those. - Causes a fatal error otherwise.) + q A signed quad (64-bit) value. + Q An unsigned quad value. + (Quads are available only if your system supports 64-bit + integer values _and_ if Perl has been compiled to support those. + Raises an exception otherwise.) - i A signed integer value. - I A unsigned integer value. - (This 'integer' is _at_least_ 32 bits wide. Its exact + i A signed integer value. + I An unsigned integer value. + (This 'integer' is _at_least_ 32 bits wide. Its exact size depends on what a local C compiler calls 'int'.) - n An unsigned short (16-bit) in "network" (big-endian) order. - N An unsigned long (32-bit) in "network" (big-endian) order. - v An unsigned short (16-bit) in "VAX" (little-endian) order. - V An unsigned long (32-bit) in "VAX" (little-endian) order. + n An unsigned short (16-bit) in "network" (big-endian) order. + N An unsigned long (32-bit) in "network" (big-endian) order. + v An unsigned short (16-bit) in "VAX" (little-endian) order. + V An unsigned long (32-bit) in "VAX" (little-endian) order. j A Perl internal signed integer value (IV). J A Perl internal unsigned integer value (UV). - f A single-precision float in the native format. - d A double-precision float in the native format.
+ f A single-precision float in native format. + d A double-precision float in native format. - F A Perl internal floating point value (NV) in the native format - D A long double-precision float in the native format. - (Long doubles are available only if your system supports long - double values _and_ if Perl has been compiled to support those. - Causes a fatal error otherwise.) + F A Perl internal floating-point value (NV) in native format + D A float of long-double precision in native format. + (Long doubles are available only if your system supports long + double values _and_ if Perl has been compiled to support those. + Raises an exception otherwise.) - p A pointer to a null-terminated string. - P A pointer to a structure (fixed-length string). + p A pointer to a null-terminated string. + P A pointer to a structure (fixed-length string). - u A uuencoded string. - U A Unicode character number. Encodes to a character in character mode + u A uuencoded string. + U A Unicode character number. Encodes to a character in character mode and UTF-8 (or UTF-EBCDIC in EBCDIC platforms) in byte mode. - w A BER compressed integer (not an ASN.1 BER, see perlpacktut for - details). Its bytes represent an unsigned integer in base 128, - most significant digit first, with as few digits as possible. Bit - eight (the high bit) is set on each byte except the last. + w A BER compressed integer (not an ASN.1 BER, see perlpacktut for + details). Its bytes represent an unsigned integer in base 128, + most significant digit first, with as few digits as possible. Bit + eight (the high bit) is set on each byte except the last. - x A null byte. - X Back up a byte. - @ Null fill or truncate to absolute position, counted from the - start of the innermost ()-group. - . Null fill or truncate to absolute position specified by value. - ( Start of a ()-group. + x A null byte (a.k.a ASCII NUL, "\000", chr(0)) + X Back up a byte. 
+ @ Null-fill or truncate to absolute position, counted from the + start of the innermost ()-group. + . Null-fill or truncate to absolute position specified by the value. + ( Start of a ()-group. -One or more of the modifiers below may optionally follow some letters in the -TEMPLATE (the second column lists the letters for which the modifier is -valid): +One or more modifiers below may optionally follow certain letters in the +TEMPLATE (the second column lists letters for which the modifier is valid): ! sSlLiI Forces native (short, long, int) sizes instead of fixed (16-/32-bit) sizes. @@ -3648,48 +3823,78 @@ valid): < sSiIlLqQ Force little-endian byte-order on the type. jJfFdDpP (The "little end" touches the construct.) -The C> and C> modifiers can also be used on C<()>-groups, -in which case they force a certain byte-order on all components of -that group, including subgroups. +The C<< > >> and C<< < >> modifiers can also be used on C<()> groups +to force a particular byte-order on all components in that group, +including all its subgroups. The following rules apply: -=over 8 +=over =item * -Each letter may optionally be followed by a number giving a repeat -count. With all types except C, C, C, C, C, C, -C, C<@>, C<.>, C, C and C

, where it means +something else, dscribed below. Supplying a C<*> for the repeat count +instead of a number means to use however many items are left, except for: + +=over + +=item * + +C<@>, C, and C, where it is equivalent to C<0>. + +=item * + +<.>, where it means relative to the start of the string. + +=item * + +C, where it is equivalent to 1 (or 45, which here is equivalent). + +=back + +One can replace a numeric repeat count with a template letter enclosed in +brackets to use the packed byte length of the bracketed template for the +repeat count. + +For example, the template C skips as many bytes as in a packed long, +and the template C<"$t X[$t] $t"> unpacks twice whatever $t (when +variable-expanded) unpacks. If the template in brackets contains alignment +commands (such as C), its packed length is calculated as if the +start of the template had the maximal possible alignment. + +When used with C, a C<*> as the repeat count is guaranteed to add a +trailing null byte, so the resulting string is always one byte longer than +the byte length of the item itself. When used with C<@>, the repeat count represents an offset from the start -of the innermost () group. +of the innermost C<()> group. + +When used with C<.>, the repeat count determines the starting position to +calculate the value offset as follows: + +=over + +=item * + +If the repeat count is C<0>, it's relative to the current position. -When used with C<.>, the repeat count is used to determine the starting -position from where the value offset is calculated. If the repeat count -is 0, it's relative to the current position. If the repeat count is C<*>, -the offset is relative to the start of the packed string. And if its an -integer C the offset is relative to the start of the n-th innermost -() group (or the start of the string if C is bigger then the group -level). +=item * + +If the repeat count is C<*>, the offset is relative to the start of the +packed string. 
+ +=item * + +And if it's an integer I, the offset is relative to the start of the +Ith innermost C<()> group, or to the start of the string if I is +bigger then the group level. + +=back The repeat count for C is interpreted as the maximal number of bytes to encode per line of output, with 0, 1 and 2 replaced by 45. The repeat @@ -3698,139 +3903,155 @@ count should not be more than 65. =item * The C, C, and C types gobble just one value, but pack it as a -string of length count, padding with nulls or spaces as necessary. When +string of length count, padding with nulls or spaces as needed. When unpacking, C strips trailing whitespace and nulls, C strips everything -after the first null, and C returns data verbatim. +after the first null, and C returns data without any sort of trimming. -If the value-to-pack is too long, it is truncated. If too long and an -explicit count is provided, C packs only C<$count-1> bytes, followed -by a null byte. Thus C always packs a trailing null (except when the -count is 0). +If the value to pack is too long, the result is truncated. If it's too +long and an explicit count is provided, C packs only C<$count-1> bytes, +followed by a null byte. Thus C always packs a trailing null, except +for when the count is 0. =item * -Likewise, the C and C fields pack a string that many bits long. -Each character of the input field of pack() generates 1 bit of the result. +Likewise, the C and C formats pack a string that's that many bits long. +Each such format generates 1 bit of the result. + Each result bit is based on the least-significant bit of the corresponding input character, i.e., on C. In particular, characters C<"0"> -and C<"1"> generate bits 0 and 1, as do characters C<"\0"> and C<"\1">. +and C<"1"> generate bits 0 and 1, as do characters C<"\000"> and C<"\001">. -Starting from the beginning of the input string of pack(), each 8-tuple -of characters is converted to 1 character of output. 
With format C +Starting from the beginning of the input string, each 8-tuple +of characters is converted to 1 character of output. With format C, the first character of the 8-tuple determines the least-significant bit of a -character, and with format C it determines the most-significant bit of +character; with format C, it determines the most-significant bit of a character. -If the length of the input string is not exactly divisible by 8, the +If the length of the input string is not evenly divisible by 8, the remainder is packed as if the input string were padded by null characters -at the end. Similarly, during unpack()ing the "extra" bits are ignored. +at the end. Similarly during unpacking, "extra" bits are ignored. + +If the input string is longer than needed, remaining characters are ignored. -If the input string of pack() is longer than needed, extra characters are -ignored. A C<*> for the repeat count of pack() means to use all the -characters of the input field. On unpack()ing the bits are converted to a -string of C<"0">s and C<"1">s. +A C<*> for the repeat count uses all characters of the input field. +On unpacking, bits are converted to a string of C<"0">s and C<"1">s. =item * -The C and C fields pack a string that many nybbles (4-bit groups, -representable as hexadecimal digits, 0-9a-f) long. +The C and C formats pack a string that many nybbles (4-bit groups, +representable as hexadecimal digits, C<"0".."9"> C<"a".."f">) long. -Each character of the input field of pack() generates 4 bits of the result. -For non-alphabetical characters the result is based on the 4 least-significant +For each such format, pack() generates 4 bits of the result. +With non-alphabetical characters, the result is based on the 4 least-significant bits of the input character, i.e., on C. In particular, characters C<"0"> and C<"1"> generate nybbles 0 and 1, as do bytes -C<"\0"> and C<"\1">. For characters C<"a".."f"> and C<"A".."F"> the result +C<"\000"> and C<"\001">. 
For characters C<"a".."f"> and C<"A".."F">, the result is compatible with the usual hexadecimal digits, so that C<"a"> and -C<"A"> both generate the nybble C<0xa==10>. The result for characters -C<"g".."z"> and C<"G".."Z"> is not well-defined. +C<"A"> both generate the nybble C<0xa==10>. Do not use any characters +but these with this format. -Starting from the beginning of the input string of pack(), each pair -of characters is converted to 1 character of output. With format C<h> the +Starting from the beginning of the template to pack(), each pair +of characters is converted to 1 character of output. With format C<h>, the first character of the pair determines the least-significant nybble of the -output character, and with format C<H> it determines the most-significant +output character; with format C<H>, it determines the most-significant nybble. -If the length of the input string is not even, it behaves as if padded -by a null character at the end. Similarly, during unpack()ing the "extra" -nybbles are ignored. +If the length of the input string is not even, it behaves as if padded by +a null character at the end. Similarly, "extra" nybbles are ignored during +unpacking. -If the input string of pack() is longer than needed, extra characters are -ignored. -A C<*> for the repeat count of pack() means to use all the characters of -the input field. On unpack()ing the nybbles are converted to a string -of hexadecimal digits. +If the input string is longer than needed, extra characters are ignored. + +A C<*> for the repeat count uses all characters of the input field. For +unpack(), nybbles are converted to a string of hexadecimal digits. =item * -The C<p> type packs a pointer to a null-terminated string. You are -responsible for ensuring the string is not a temporary value (which can -potentially get deallocated before you get around to using the packed result). -The C<P> type packs a pointer to a structure of the size indicated by the -length. A NULL pointer is created if the corresponding value for C<p> or -C<P> is C<undef>, similarly for unpack(). +The C<p> format packs a pointer to a null-terminated string. You are +responsible for ensuring that the string is not a temporary value, as that +could potentially get deallocated before you got around to using the packed +result. The C<P> format packs a pointer to a structure of the size indicated +by the length. A null pointer is created if the corresponding value for +C<p> or C<P>
is C; similarly with unpack(), where a null pointer +unpacks into C. -If your system has a strange pointer size (i.e. a pointer is neither as -big as an int nor as big as a long), it may not be possible to pack or +If your system has a strange pointer size--meaning a pointer is neither as +big as an int nor as big as a long--it may not be possible to pack or unpack pointers in big- or little-endian byte order. Attempting to do -so will result in a fatal error. +so raises an exception. =item * The C template character allows packing and unpacking of a sequence of -items where the packed structure contains a packed item count followed by -the packed items themselves. - -For C you write ICI and the -I describes how the length value is packed. The ones likely -to be of most use are integer-packing ones like C (for Java strings), -C (for ASN.1 or SNMP) and C (for Sun XDR). - -For C, the I may have a repeat count, in which case -the minimum of that and the number of available items is used as argument -for the I. If it has no repeat count or uses a '*', the number +items where the packed structure contains a packed item count followed by +the packed items themselves. This is useful when the structure you're +unpacking has encoded the sizes or repeat counts for some of its fields +within the structure itself as separate fields. + +For C, you write ICI, and the +I describes how the length value is packed. Formats likely +to be of most use are integer-packing ones like C for Java strings, +C for ASN.1 or SNMP, and C for Sun XDR. + +For C, I may have a repeat count, in which case +the minimum of that and the number of available items is used as the argument +for I. If it has no repeat count or uses a '*', the number of available items is used. -For C an internal stack of integer arguments unpacked so far is +For C, an internal stack of integer arguments unpacked so far is used. You write CI and the repeat count is obtained by popping off the last element from the stack. 
The I must not have a repeat count. -If the I refers to a string type (C<"A">, C<"a"> or C<"Z">), -the I is a string length, not a number of strings. If there is -an explicit repeat count for pack, the packed string will be adjusted to that -given length. +If I refers to a string type (C<"A">, C<"a">, or C<"Z">), +the I is the string length, not the number of strings. With +an explicit repeat count for pack, the packed string is adjusted to that +length. For example: + + unpack("W/a", "\004Gurusamy") gives ("Guru") + unpack("a3/A A*", "007 Bond J ") gives (" Bond", "J") + unpack("a3 x2 /A A*", "007: Bond, J.") gives ("Bond, J", ".") - unpack 'W/a', "\04Gurusamy"; gives ('Guru') - unpack 'a3/A A*', '007 Bond J '; gives (' Bond', 'J') - unpack 'a3 x2 /A A*', '007: Bond, J.'; gives ('Bond, J', '.') - pack 'n/a* w/a','hello,','world'; gives "\000\006hello,\005world" - pack 'a/W2', ord('a') .. ord('z'); gives '2ab' + pack("n/a* w/a","hello,","world") gives "\000\006hello,\005world" + pack("a/W2", ord("a") .. ord("z")) gives "2ab" The I is not returned explicitly from C. -Adding a count to the I letter is unlikely to do anything -useful, unless that letter is C, C or C. Packing with a -I of C or C may introduce C<"\000"> characters, -which Perl does not regard as legal in numeric strings. +Supplying a count to the I format letter is only useful with +C, C, or C. Packing with a I of C or C may +introduce C<"\000"> characters, which Perl does not regard as legal in +numeric strings. =item * The integer types C, C, C, and C may be -followed by a C modifier to signify native shorts or -longs--as you can see from above for example a bare C does mean -exactly 32 bits, the native C (as seen by the local C compiler) -may be larger. This is an issue mainly in 64-bit platforms. You can -see whether using C makes any difference by +followed by a C modifier to specify native shorts or +longs. 
As shown in the example above, a bare C means +exactly 32 bits, although the native C as seen by the local C compiler +may be larger. This is mainly an issue on 64-bit platforms. You can +see whether using C makes any difference this way: + + printf "format s is %d, s! is %d\n", + length pack("s"), length pack("s!"); - print length(pack("s")), " ", length(pack("s!")), "\n"; - print length(pack("l")), " ", length(pack("l!")), "\n"; + printf "format l is %d, l! is %d\n", + length pack("l"), length pack("l!"); -C and C also work but only because of completeness; + +C and C are also allowed, but only for completeness' sake: they are identical to C and C. The actual sizes (in bytes) of native shorts, ints, longs, and long -longs on the platform where Perl was built are also available via -L: +longs on the platform where Perl was built are also available from +the command line: + + $ perl -V:{short,int,long{,long}}size + shortsize='2'; + intsize='4'; + longsize='4'; + longlongsize='8'; + +or programmatically via the C module: use Config; print $Config{shortsize}, "\n"; @@ -3838,165 +4059,188 @@ L: print $Config{longsize}, "\n"; print $Config{longlongsize}, "\n"; -(The C<$Config{longlongsize}> will be undefined if your system does -not support long longs.) +C<$Config{longlongsize}> is undefined on systems without +long long support. =item * -The integer formats C, C, C, C, C, C, C, and C -are inherently non-portable between processors and operating systems -because they obey the native byteorder and endianness. For example a -4-byte integer 0x12345678 (305419896 decimal) would be ordered natively -(arranged in and handled by the CPU registers) into bytes as +The integer formats C, C, C, C, C, C, C, and C are +inherently non-portable between processors and operating systems because +they obey native byteorder and endianness. 
For example, a 4-byte integer +0x12345678 (305419896 decimal) would be ordered natively (arranged in and +handled by the CPU registers) into bytes as - 0x12 0x34 0x56 0x78 # big-endian - 0x78 0x56 0x34 0x12 # little-endian + 0x12 0x34 0x56 0x78 # big-endian + 0x78 0x56 0x34 0x12 # little-endian -Basically, the Intel and VAX CPUs are little-endian, while everybody -else, for example Motorola m68k/88k, PPC, Sparc, HP PA, Power, and -Cray are big-endian. Alpha and MIPS can be either: Digital/Compaq -used/uses them in little-endian mode; SGI/Cray uses them in big-endian -mode. +Basically, Intel and VAX CPUs are little-endian, while everybody else, +including Motorola m68k/88k, PPC, Sparc, HP PA, Power, and Cray, are +big-endian. Alpha and MIPS can be either: Digital/Compaq used/uses them in +little-endian mode, but SGI/Cray uses them in big-endian mode. -The names `big-endian' and `little-endian' are comic references to -the classic "Gulliver's Travels" (via the paper "On Holy Wars and a -Plea for Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980) and -the egg-eating habits of the Lilliputians. +The names I and I are comic references to the +egg-eating habits of the little-endian Lilliputians and the big-endian +Blefuscudians from the classic Jonathan Swift satire, I. +This entered computer lingo via the paper "On Holy Wars and a Plea for +Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980. 
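The two orderings of 0x12345678 shown above can be reproduced on any host by using the fixed-order formats rather than the native ones. The following is only a minimal sketch: hex_bytes() is a helper name made up for this illustration and is not part of Perl.

```perl
use strict;
use warnings;

# hex_bytes() is a hypothetical helper, not part of Perl: it renders
# each byte of a packed string as two hex digits.
sub hex_bytes {
    my ($packed) = @_;
    return join " ", map { sprintf "%02x", $_ } unpack "C*", $packed;
}

my $n = 0x12345678;
print "N: ", hex_bytes(pack "N", $n), "\n";    # big-endian:    12 34 56 78
print "V: ", hex_bytes(pack "V", $n), "\n";    # little-endian: 78 56 34 12
```

Because N and V fix the byte order, the output does not depend on the CPU the script runs on, which is what makes those formats suitable for data exchange.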
Some systems may have even weirder byte orders such as - 0x56 0x78 0x12 0x34 - 0x34 0x12 0x78 0x56 + 0x56 0x78 0x12 0x34 + 0x34 0x12 0x78 0x56 -You can see your system's preference with +You can determine your system endianness with this incantation: - print join(" ", map { sprintf "%#02x", $_ } - unpack("W*",pack("L",0x12345678))), "\n"; + printf("%#02x ", $_) for unpack("W*", pack L=>0x12345678); The byteorder on the platform where Perl was built is also available via L<Config>: - use Config; - print $Config{byteorder}, "\n"; + use Config; + print "$Config{byteorder}\n"; + +or from the command line: -Byteorders C<'1234'> and C<'12345678'> are little-endian, C<'4321'> -and C<'87654321'> are big-endian. + $ perl -V:byteorder -If you want portable packed integers you can either use the formats -C<n>, C<N>, C<v>, and C<V>, or you can use the C<E<gt>> and C<E<lt>> -modifiers. These modifiers are only available as of perl 5.9.2. -See also L<perlport>. +Byteorders C<"1234"> and C<"12345678"> are little-endian; C<"4321"> +and C<"87654321"> are big-endian. + +For portably packed integers, either use the formats C<n>, C<N>, C<v>, +and C<V> or else use the C<< > >> and C<< < >> modifiers described +immediately below. See also L<perlport>. =item * -All integer and floating point formats as well as C<p> and C<P> and -C<()>-groups may be followed by the C<E<gt>> or C<E<lt>> modifiers -to force big- or little- endian byte-order, respectively. -This is especially useful, since C<n>, C<N>, C<v> and C<V> don't cover -signed integers, 64-bit integers and floating point values. However, -there are some things to keep in mind. +Starting with Perl 5.9.2, integer and floating-point formats, along with +the C<p> and C<P>
formats and C<()> groups, may all be followed by the +C<< > >> or C<< < >> endianness modifiers to respectively enforce big- +or little-endian byte-order. These modifiers are especially useful +given how C, C, C and C don't cover signed integers, +64-bit integers, or floating-point values. + +Here are some concerns to keep in mind when using an endianness modifier: + +=over + +=item * -Exchanging signed integers between different platforms only works -if all platforms store them in the same format. Most platforms store -signed integers in two's complement, so usually this is not an issue. +Exchanging signed integers between different platforms works only +when all platforms store them in the same format. Most platforms store +signed integers in two's-complement notation, so usually this is not an issue. -The C> or C> modifiers can only be used on floating point +=item * + +The C<< > >> or C<< < >> modifiers can only be used on floating-point formats on big- or little-endian machines. Otherwise, attempting to -do so will result in a fatal error. - -Forcing big- or little-endian byte-order on floating point values for -data exchange can only work if all platforms are using the same -binary representation (e.g. IEEE floating point format). Even if all -platforms are using IEEE, there may be subtle differences. Being able -to use C> or C> on floating point values can be very useful, -but also very dangerous if you don't know exactly what you're doing. -It is definitely not a general way to portably store floating point -values. - -When using C> or C> on an C<()>-group, this will affect -all types inside the group that accept the byte-order modifiers, -including all subgroups. It will silently be ignored for all other +use them raises an exception. + +=item * + +Forcing big- or little-endian byte-order on floating-point values for +data exchange can work only if all platforms use the same +binary representation such as IEEE floating-point. 
Even if all +platforms are using IEEE, there may still be subtle differences. Being able +to use C<< > >> or C<< < >> on floating-point values can be useful, +but also dangerous if you don't know exactly what you're doing. +It is not a general way to portably store floating-point values. + +=item * + +When using C<< > >> or C<< < >> on a C<()> group, this affects +all types inside the group that accept byte-order modifiers, +including all subgroups. It is silently ignored for all other types. You are not allowed to override the byte-order within a group that already has a byte-order modifier suffix. +=back + =item * -Real numbers (floats and doubles) are in the native machine format only; -due to the multiplicity of floating formats around, and the lack of a -standard "network" representation, no facility for interchange has been -made. This means that packed floating point data written on one machine -may not be readable on another - even if both use IEEE floating point -arithmetic (as the endian-ness of the memory representation is not part +Real numbers (floats and doubles) are in native machine format only. +Due to the multiplicity of floating-point formats and the lack of a +standard "network" representation for them, no facility for interchange has been +made. This means that packed floating-point data written on one machine +may not be readable on another, even if both use IEEE floating-point +arithmetic (because the endianness of the memory representation is not part of the IEEE spec). See also L. -If you know exactly what you're doing, you can use the C> or C> -modifiers to force big- or little-endian byte-order on floating point values. +If you know I what you're doing, you can use the C<< > >> or C<< < >> +modifiers to force big- or little-endian byte-order on floating-point values. 
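As a hedged illustration of the byte-order modifiers on floating-point formats, the sketch below packs the same double both ways. It assumes a Perl new enough to accept the modifiers (5.9.2 or later, per the text above) and a big- or little-endian host with ordinary fixed-width doubles; on other machines the pack calls raise an exception, as described above.

```perl
use strict;
use warnings;

my $big    = pack "d>", 1.5;    # forced big-endian
my $little = pack "d<", 1.5;    # forced little-endian

# The two strings hold the same bytes in opposite order.
print "byte-reversed copies\n" if $big eq scalar reverse $little;

# Unpacking with the matching modifier recovers the value.
my ($value) = unpack "d>", $big;
print "$value\n";
```

This shows only that the modifiers control byte order. It does not make the data portable between machines whose floating-point representations differ.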
-Note that Perl uses doubles (or long doubles, if configured) internally for -all numeric calculation, and converting from double into float and thence back -to double again will lose precision (i.e., C) -will not in general equal $foo). +Because Perl uses doubles (or long doubles, if configured) internally for +all numeric calculation, converting from double into float and thence +to double again loses precision, so C) +will not in general equal $foo. =item * -Pack and unpack can operate in two modes, character mode (C mode) where -the packed string is processed per character and UTF-8 mode (C mode) +Pack and unpack can operate in two modes: character mode (C mode) where +the packed string is processed per character, and UTF-8 mode (C mode) where the packed string is processed in its UTF-8-encoded Unicode form on -a byte by byte basis. Character mode is the default unless the format string -starts with an C. You can switch mode at any moment with an explicit -C or C in the format. A mode is in effect until the next mode switch -or until the end of the ()-group in which it was entered. +a byte-by-byte basis. Character mode is the default unless the format string +starts with C. You can always switch mode mid-format with an explicit +C or C in the format. This mode remains in effect until the next +mode change, or until the end of the C<()> group it (directly) applies to. =item * -You must yourself do any alignment or padding by inserting for example -enough C<'x'>es while packing. There is no way to pack() and unpack() -could know where the characters are going to or coming from. Therefore -C (and C) handle their output and input as flat -sequences of characters. +You must yourself do any alignment or padding by inserting, for example, +enough C<"x">es while packing. There is no way for pack() and unpack() +to know where characters are going to or coming from, so they +handle their output and input as flat sequences of characters. 
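The manual padding described above can be sketched as follows. The record layout here is invented purely for illustration: a one-byte tag, three null bytes of padding so that the next field starts at offset 4, and a 32-bit big-endian length.

```perl
use strict;
use warnings;

# Hypothetical layout: 1-byte tag, 3 padding bytes, 32-bit big-endian length.
my $template = "C x3 N";

my $record = pack $template, 7, 258;
printf "%d bytes: %s\n", length($record),
    join " ", map { sprintf "%02x", $_ } unpack "C*", $record;

# The same template skips the padding again when unpacking.
my ($tag, $len) = unpack $template, $record;
print "tag=$tag len=$len\n";
```

Since pack() sees only a flat sequence of characters, the padding has to be spelled out in the template; nothing is inserted automatically.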
=item * -A ()-group is a sub-TEMPLATE enclosed in parentheses. A group may -take a repeat count, both as postfix, and for unpack() also via the C -template character. Within each repetition of a group, positioning with -C<@> starts again at 0. Therefore, the result of +A C<()> group is a sub-TEMPLATE enclosed in parentheses. A group may +take a repeat count either as postfix, or for unpack(), also via the C +template character. Within each repetition of a group, positioning with +C<@> starts over at 0. Therefore, the result of - pack( '@1A((@2A)@3A)', 'a', 'b', 'c' ) + pack("@1A((@2A)@3A)", qw[X Y Z]) -is the string "\0a\0\0bc". +is the string C<"\0X\0\0YZ">. =item * -C and C accept C modifier. In this case they act as -alignment commands: they jump forward/back to the closest position -aligned at a multiple of C characters. For example, to pack() or -unpack() C's C one may need to -use the template C; this assumes that doubles must be -aligned on the double's size. +C and C accept the C modifier to act as alignment commands: they +jump forward or back to the closest position aligned at a multiple of C +characters. For example, to pack() or unpack() a C structure like + + struct { + char c; /* one signed, 8-bit character */ + double d; + char cc[2]; + } -For alignment commands C of 0 is equivalent to C of 1; -both result in no-ops. +one may need to use the template C. This assumes that +doubles must be aligned to the size of double. + +For alignment commands, a C of 0 is equivalent to a C of 1; +both are no-ops. =item * -C, C, C and C accept the C modifier. In this case they -will represent signed 16-/32-bit integers in big-/little-endian order. -This is only portable if all platforms sharing the packed data use the -same binary representation for signed integers (e.g. all platforms are -using two's complement representation). +C, C, C and C accept the C modifier to +represent signed 16-/32-bit integers in big-/little-endian order. 
+This is portable only when all platforms sharing packed data use the +same binary representation for signed integers; for example, when all +platforms use two's-complement representation. =item * -A comment in a TEMPLATE starts with C<#> and goes to the end of line. -White space may be used to separate pack codes from each other, but -modifiers and a repeat count must follow immediately. +Comments can be embedded in a TEMPLATE using C<#> through the end of line. +White space can separate pack codes from each other, but modifiers and +repeat counts must follow immediately. Breaking complex templates into +individual line-by-line components, suitably annotated, can do as much to +improve legibility and maintainability of pack/unpack formats as C can +for complicated pattern matches. =item * -If TEMPLATE requires more arguments to pack() than actually given, pack() +If TEMPLATE requires more arguments than pack() is given, pack() assumes additional C<""> arguments. If TEMPLATE requires fewer arguments -to pack() than actually given, extra arguments are ignored. +than given, extra arguments are ignored. =back @@ -4019,14 +4263,14 @@ Examples: $foo = pack("ccxxcc",65,66,67,68); # foo eq "AB\0\0CD" - # note: the above examples featuring "W" and "c" are true + # NOTE: The examples above featuring "W" and "c" are true # only on ASCII and ASCII-derived systems such as ISO Latin 1 - # and UTF-8. In EBCDIC the first example would be - # $foo = pack("WWWW",193,194,195,196); + # and UTF-8. On EBCDIC systems, the first example would be + # $foo = pack("WWWW",193,194,195,196); $foo = pack("s2",1,2); - # "\1\0\2\0" on little-endian - # "\0\1\0\2" on big-endian + # "\001\000\002\000" on little-endian + # "\000\001\000\002" on big-endian $foo = pack("a4","abcd","x","y","z"); # "abcd" @@ -4048,7 +4292,7 @@ Examples: # "@utmp1" eq "@utmp2" sub bintodec { - unpack("N", pack("B32", substr("0" x 32 . shift, -32))); + unpack("N", pack("B32", substr("0" x 32 . 
shift, -32))); } $foo = pack('sx2l', 12, 34); @@ -4070,26 +4314,43 @@ Examples: The same template may generally also be used in unpack(). +=item package NAMESPACE VERSION +X X X X + =item package NAMESPACE -X X X - -=item package - -Declares the compilation unit as being in the given namespace. The scope -of the package declaration is from the declaration itself through the end -of the enclosing block, file, or eval (the same as the C<my> operator). -All further unqualified dynamic identifiers will be in this namespace. -A package statement affects only dynamic variables--including those -you've used C<local> on--but I<not> lexical variables, which are created -with C<my>. Typically it would be the first declaration in a file to -be included by the C<require> or C<use> operator. You can switch into a -package in more than one place; it merely influences which symbol table -is used by the compiler for the rest of that block. You can refer to -variables and filehandles in other packages by prefixing the identifier -with the package name and a double colon: C<$Package::Variable>. -If the package name is null, the C<main> package as assumed. That is, -C<$::sail> is equivalent to C<$main::sail> (as well as to C<$main'sail>, -still seen in older code). + +=item package NAMESPACE VERSION BLOCK +X X X X + +=item package NAMESPACE BLOCK + +Declares the BLOCK, or the rest of the compilation unit, as being in +the given namespace. The scope of the package declaration is either the +supplied code BLOCK or, in the absence of a BLOCK, from the declaration +itself through the end of the enclosing block, file, or eval (the same +as the C<my> operator). All unqualified dynamic identifiers in this +scope will be in the given namespace, except where overridden by another +C<package> declaration. + +A package statement affects dynamic variables only, including those +you've used C<local> on, but I<not> lexical variables, which are created +with C<my> (or C<our> (or C<state>)). Typically it would be the first +declaration in a file included by C<require> or C<use>. You can switch into a +package in more than one place, since this only determines which default +symbol table the compiler uses for the rest of that block. You can refer to +identifiers in other packages than the current one by prefixing the identifier +with the package name and a double colon, as in C<$SomePack::var> +or C<ThatPack::INPUT_HANDLE>. If package name is omitted, the C<main>
+package is assumed. That is, C<$::sail> is equivalent to +C<$main::sail> (as well as to C<$main'sail>, still seen in ancient +code, mostly from Perl 4). + +If VERSION is provided, C sets the C<$VERSION> variable in the given +namespace to a L object with the VERSION provided. VERSION must be a +"strict" style version number as defined by the L module: a positive +decimal number (integer or decimal-fraction) without exponentiation or else a +dotted-decimal v-string with a leading 'v' character and at least three +components. You should set C<$VERSION> only once per package. See L for more information about packages, modules, and classes. See L for other scoping issues. @@ -4103,14 +4364,15 @@ unless you are very careful. In addition, note that Perl's pipes use IO buffering, so you may need to set C<$|> to flush your WRITEHANDLE after each command, depending on the application. -See L, L, and L +See L, L, and +L for examples of such things. -On systems that support a close-on-exec flag on files, the flag will be set -for the newly opened file descriptors as determined by the value of $^F. -See L. +On systems that support a close-on-exec flag on files, that flag is set +on all newly opened file descriptors whose Cs are I than +the current value of $^F (by default 2 for C). See L. -=item pop ARRAY +=item pop ARRAY (or ARRAYREF) X X =item pop @@ -4118,25 +4380,36 @@ X X Pops and returns the last value of the array, shortening the array by one element. -If there are no elements in the array, returns the undefined value -(although this may happen at other times as well). If ARRAY is -omitted, pops the C<@ARGV> array in the main program, and the C<@_> -array in subroutines, just like C. +Returns the undefined value if the array is empty, although this may also +happen at other times. If ARRAY is omitted, pops the C<@ARGV> array in the +main program, but the C<@_> array in subroutines, just like C. 
+ +If given a reference to an array, the argument will be dereferenced +automatically. =item pos SCALAR X X =item pos -Returns the offset of where the last C search left off for the variable -in question (C<$_> is used when the variable is not specified). Note that -0 is a valid match offset. C indicates that the search position -is reset (usually due to match failure, but can also be because no match has -yet been performed on the scalar). C directly accesses the location used -by the regexp engine to store the offset, so assigning to C will change -that offset, and so will also influence the C<\G> zero-width assertion in -regular expressions. Because a failed C match doesn't reset the offset, -the return from C won't change either in this case. See L and +Returns the offset of where the last C search left off for the +variable in question (C<$_> is used when the variable is not +specified). Note that 0 is a valid match offset. C indicates +that the search position is reset (usually due to match failure, but +can also be because no match has yet been run on the scalar). + +C directly accesses the location used by the regexp engine to +store the offset, so assigning to C will change that offset, and +so will also influence the C<\G> zero-width assertion in regular +expressions. Both of these effects take place for the next match, so +you can't affect the position with C during the current match, +such as in C<(?{pos() = 5})> or C. + +Setting C also resets the I flag, described +under L. + +Because a failed C match doesn't reset the offset, the return +from C won't change either in this case. See L and L. =item print FILEHANDLE LIST @@ -4147,15 +4420,15 @@ X =item print Prints a string or a list of strings. Returns true if successful. 
-FILEHANDLE may be a scalar variable name, in which case the variable -contains the name of or a reference to the filehandle, thus introducing +FILEHANDLE may be a scalar variable containing +the name of or a reference to the filehandle, thus introducing one level of indirection. (NOTE: If FILEHANDLE is a variable and the next token is a term, it may be misinterpreted as an operator unless you interpose a C<+> or put parentheses around the arguments.) -If FILEHANDLE is omitted, prints by default to standard output (or -to the last selected output channel--see L). If LIST is -also omitted, prints C<$_> to the currently selected output channel. -To set the default output channel to something other than STDOUT +If FILEHANDLE is omitted, prints to standard output by default, or +to the last selected output channel; see L. If LIST is +also omitted, prints C<$_> to the currently selected output handle. +To set the default output handle to something other than STDOUT use the select operation. The current value of C<$,> (if any) is printed between each LIST item. The current value of C<$\> (if any) is printed after the entire LIST has been printed. Because @@ -4164,8 +4437,8 @@ context, and any subroutine that you call will have one or more of its expressions evaluated in list context. Also be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to -the print--interpose a C<+> or put parentheses around all the -arguments. +the print; put parentheses around all the arguments +(or interpose a C<+>, but that doesn't look as good). Note that if you're storing FILEHANDLEs in an array, or if you're using any other expression more complex than a scalar variable to retrieve it, @@ -4174,6 +4447,9 @@ you will have to use a block returning the filehandle value instead: print { $files[$i] } "stuff\n"; print { $OK ? 
STDOUT : STDERR } "stuff\n"; +Printing to a closed pipe or socket will generate a SIGPIPE signal. See +L for more on signal handling. + =item printf FILEHANDLE FORMAT, LIST X @@ -4184,7 +4460,7 @@ Equivalent to C, except that C<$\> of the list will be interpreted as the C format. See C for an explanation of the format argument. If C is in effect, and POSIX::setlocale() has been called, the character used for the decimal -separator in formatted floating point numbers is affected by the LC_NUMERIC +separator in formatted floating-point numbers is affected by the LC_NUMERIC locale. See L and L. Don't fall into the trap of using a C when a simple @@ -4199,13 +4475,13 @@ function has no prototype). FUNCTION is a reference to, or the name of, the function whose prototype you want to retrieve. If FUNCTION is a string starting with C, the rest is taken as a -name for Perl builtin. If the builtin is not I (such as +name for a Perl builtin. If the builtin is not I (such as C) or if its arguments cannot be adequately expressed by a prototype (such as C), prototype() returns C, because the builtin does not really behave like a Perl function. Otherwise, the string describing the equivalent prototype is returned. -=item push ARRAY,LIST +=item push ARRAY (or ARRAYREF),LIST X X Treats ARRAY as a stack, and pushes the values of LIST @@ -4213,12 +4489,15 @@ onto the end of ARRAY. The length of ARRAY increases by the length of LIST. Has the same effect as for $value (LIST) { - $ARRAY[++$#ARRAY] = $value; + $ARRAY[++$#ARRAY] = $value; } but is more efficient. Returns the number of elements in the array following the completed C. +If given a reference to an array, the argument will be dereferenced +automatically. + =item q/STRING/ =item qq/STRING/ @@ -4247,6 +4526,32 @@ the C<\Q> escape in double-quoted strings. If EXPR is omitted, uses C<$_>. +quotemeta (and C<\Q> ... 
C<\E>) are useful when interpolating strings into +regular expressions, because by default an interpolated variable will be +considered a mini-regular expression. For example: + + my $sentence = 'The quick brown fox jumped over the lazy dog'; + my $substring = 'quick.*?fox'; + $sentence =~ s{$substring}{big bad wolf}; + +Will cause C<$sentence> to become C<'The big bad wolf jumped over...'>. + +On the other hand: + + my $sentence = 'The quick brown fox jumped over the lazy dog'; + my $substring = 'quick.*?fox'; + $sentence =~ s{\Q$substring\E}{big bad wolf}; + +Or: + + my $sentence = 'The quick brown fox jumped over the lazy dog'; + my $substring = 'quick.*?fox'; + my $quoted_substring = quotemeta($substring); + $sentence =~ s{$quoted_substring}{big bad wolf}; + +Will both leave the sentence as is. Normally, when accepting string input from +the user, quotemeta() or C<\Q> must be used. + =item rand EXPR X X @@ -4255,8 +4560,8 @@ X X Returns a random fractional number greater than or equal to C<0> and less than the value of EXPR. (EXPR should be positive.) If EXPR is omitted, the value C<1> is used. Currently EXPR with the value C<0> is -also special-cased as C<1> - this has not been documented before perl 5.8.0 -and is subject to change in future versions of perl. Automatically calls +also special-cased as C<1> (this was undocumented before Perl 5.8.0 +and is subject to change in future versions of Perl). Automatically calls C unless C has already been called. See also C. Apply C to the value returned by C if you want random @@ -4289,8 +4594,8 @@ the string. A positive OFFSET greater than the length of SCALAR results in the string being padded to the required size with C<"\0"> bytes before the result of the read is appended. -The call is actually implemented in terms of either Perl's or system's -fread() call. To get a true read(2) system call, see C. +The call is implemented in terms of either Perl's or your system's native +fread(3) library function. 
To get a true read(2) system call, see C. Note the I: depending on the status of the filehandle, either (8-bit) bytes or characters are read. By default all @@ -4305,8 +4610,8 @@ X Returns the next directory entry for a directory opened by C. If used in list context, returns all the rest of the entries in the -directory. If there are no more entries, returns an undefined value in -scalar context or a null list in list context. +directory. If there are no more entries, returns the undefined value in +scalar context and the empty list in list context. If you're planning to filetest the return values out of a C, you'd better prepend the directory in question. Otherwise, because we didn't @@ -4316,6 +4621,15 @@ C there, it would have been testing the wrong file. @dots = grep { /^\./ && -f "$some_dir/$_" } readdir($dh); closedir $dh; +As of Perl 5.11.2 you can use a bare C in a C loop, +which will set C<$_> on every iteration. + + opendir(my $dh, $some_dir) || die; + while(readdir $dh) { + print "$some_dir/$_\n"; + } + closedir $dh; + =item readline EXPR =item readline @@ -4323,14 +4637,14 @@ X X X Reads from the filehandle whose typeglob is contained in EXPR (or from *ARGV if EXPR is not provided). In scalar context, each call reads and -returns the next line, until end-of-file is reached, whereupon the -subsequent call returns undef. In list context, reads until end-of-file +returns the next line until end-of-file is reached, whereupon the +subsequent call returns C. In list context, reads until end-of-file is reached and returns a list of lines. Note that the notion of "line" -used here is however you may have defined it with C<$/> or +used here is whatever you may have defined with C<$/> or C<$INPUT_RECORD_SEPARATOR>). See L. -When C<$/> is set to C, when readline() is in scalar -context (i.e. 
file slurp mode), and when an empty file is read, it +When C<$/> is set to C, when C is in scalar +context (i.e., file slurp mode), and when an empty file is read, it returns C<''> the first time, followed by C subsequently. This is the internal function implementing the C<< >> @@ -4338,21 +4652,31 @@ operator, but you can use it directly. The C<< >> operator is discussed in more detail in L. $line = ; - $line = readline(*STDIN); # same thing + $line = readline(*STDIN); # same thing -If readline encounters an operating system error, C<$!> will be set with the -corresponding error message. It can be helpful to check C<$!> when you are -reading from filehandles you don't trust, such as a tty or a socket. The -following example uses the operator form of C, and takes the necessary -steps to ensure that C was successful. +If C encounters an operating system error, C<$!> will be set +with the corresponding error message. It can be helpful to check +C<$!> when you are reading from filehandles you don't trust, such as a +tty or a socket. The following example uses the operator form of +C and dies if the result is not defined. - for (;;) { - undef $!; - unless (defined( $line = <> )) { - die $! if $!; - last; # reached EOF + while ( ! eof($fh) ) { + defined( $_ = <$fh> ) or die "readline failed: $!"; + ... + } + +Note that you can't handle C errors that way with the +C filehandle. In that case, you have to open each element of +C<@ARGV> yourself since C handles C differently. + + foreach my $arg (@ARGV) { + open(my $fh, $arg) or warn "Can't open $arg: $!"; + + while ( ! eof($fh) ) { + defined( $_ = <$fh> ) + or die "readline failed for $arg: $!"; + ... } - # ... } =item readlink EXPR @@ -4361,7 +4685,7 @@ X =item readlink Returns the value of a symbolic link, if symbolic links are -implemented. If not, gives a fatal error. If there is some system +implemented. If not, raises an exception. If there is a system error, returns the undefined value and sets C<$!> (errno). 
If EXPR is omitted, uses C<$_>. @@ -4414,21 +4738,21 @@ normally use this command: # a simpleminded Pascal comment stripper # (warning: assumes no { or } in strings) LINE: while () { - while (s|({.*}.*){.*}|$1 |) {} - s|{.*}| |; - if (s|{.*| |) { - $front = $_; - while () { - if (/}/) { # end of comment? - s|^|$front\{|; - redo LINE; - } - } - } - print; + while (s|({.*}.*){.*}|$1 |) {} + s|{.*}| |; + if (s|{.*| |) { + $front = $_; + while () { + if (/}/) { # end of comment? + s|^|$front\{|; + redo LINE; + } + } + } + print; } -C cannot be used to retry a block which returns a value such as +C cannot be used to retry a block that returns a value such as C, C or C, and should not be used to exit a grep() or map() operation. @@ -4466,10 +4790,10 @@ If the referenced object has been blessed into a package, then that package name is returned instead. You can think of C as a C operator. if (ref($r) eq "HASH") { - print "r is a reference to a hash.\n"; + print "r is a reference to a hash.\n"; } unless (ref($r)) { - print "r is not a reference at all.\n"; + print "r is not a reference at all.\n"; } The return value C indicates a reference to an lvalue that is not @@ -4510,7 +4834,7 @@ specified by EXPR or by C<$_> if EXPR is not supplied. VERSION may be either a numeric argument such as 5.006, which will be compared to C<$]>, or a literal of the form v5.6.1, which will be compared -to C<$^V> (aka $PERL_VERSION). A fatal error is produced at run time if +to C<$^V> (aka $PERL_VERSION). An exception is raised if VERSION is greater than the version of the current Perl interpreter. Compare with L, which can do a similar check at compile time. @@ -4519,9 +4843,9 @@ avoided, because it leads to misleading error messages under earlier versions of Perl that do not support this syntax. The equivalent numeric version should be used instead. 
- require v5.6.1; # run time version check - require 5.6.1; # ditto - require 5.006_001; # ditto; preferred for backwards compatibility + require v5.6.1; # run time version check + require 5.6.1; # ditto + require 5.006_001; # ditto; preferred for backwards compatibility Otherwise, C demands that a library file be included if it hasn't already been included. The file is included via the do-FILE @@ -4574,7 +4898,7 @@ modules does not risk altering your namespace. In other words, if you try this: - require Foo::Bar; # a splendid bareword + require Foo::Bar; # a splendid bareword The require function will actually look for the "F" file in the directories specified in the C<@INC> array. @@ -4582,32 +4906,32 @@ directories specified in the C<@INC> array. But if you try this: $class = 'Foo::Bar'; - require $class; # $class is not a bareword + require $class; # $class is not a bareword #or - require "Foo::Bar"; # not a bareword because of the "" + require "Foo::Bar"; # not a bareword because of the "" The require function will look for the "F" file in the @INC array and will complain about not finding "F" there. In this case you can do: eval "require $class"; -Now that you understand how C looks for files in the case of a +Now that you understand how C looks for files with a bareword argument, there is a little extra functionality going on behind the scenes. Before C looks for a "F<.pm>" extension, it will first look for a similar filename with a "F<.pmc>" extension. If this file is found, it will be loaded in place of any file ending in a "F<.pm>" extension. -You can also insert hooks into the import facility, by putting directly -Perl code into the @INC array. There are three forms of hooks: subroutine +You can also insert hooks into the import facility, by putting Perl code +directly into the @INC array. There are three forms of hooks: subroutine references, array references and blessed objects. Subroutine references are the simplest case. 
When the inclusion system walks through @INC and encounters a subroutine, this subroutine gets -called with two parameters, the first being a reference to itself, and the -second the name of the file to be included (e.g. "F"). The -subroutine should return nothing, or a list of up to three values in the -following order: +called with two parameters, the first a reference to itself, and the +second the name of the file to be included (e.g., "F"). The +subroutine should return either nothing or else a list of up to three +values in the following order: =over @@ -4620,8 +4944,8 @@ A filehandle, from which the file will be read. A reference to a subroutine. If there is no filehandle (previous item), then this subroutine is expected to generate one line of source code per call, writing the line into C<$_> and returning 1, then returning 0 at -"end of file". If there is a filehandle, then the subroutine will be -called to act a simple source filter, with the line as read in C<$_>. +end of file. If there is a filehandle, then the subroutine will be +called to act as a simple source filter, with the line as read in C<$_>. Again, return 1 for each valid line, and 0 after all lines have been returned. @@ -4633,32 +4957,32 @@ reference to the subroutine itself is passed in as C<$_[0]>. =back If an empty list, C, or nothing that matches the first 3 values above -is returned then C will look at the remaining elements of @INC. -Note that this file handle must be a real file handle (strictly a typeglob, -or reference to a typeglob, blessed or unblessed) - tied file handles will be +is returned, then C looks at the remaining elements of @INC. +Note that this filehandle must be a real filehandle (strictly a typeglob +or reference to a typeglob, blessed or unblessed); tied filehandles will be ignored and return value processing will stop there. If the hook is an array reference, its first element must be a subroutine reference. 
This subroutine is called as above, but the first parameter is -the array reference. This enables to pass indirectly some arguments to +the array reference. This lets you indirectly pass arguments to the subroutine. In other words, you can write: push @INC, \&my_sub; sub my_sub { - my ($coderef, $filename) = @_; # $coderef is \&my_sub - ... + my ($coderef, $filename) = @_; # $coderef is \&my_sub + ... } or: push @INC, [ \&my_sub, $x, $y, ... ]; sub my_sub { - my ($arrayref, $filename) = @_; - # Retrieve $x, $y, ... - my @parameters = @$arrayref[1..$#$arrayref]; - ... + my ($arrayref, $filename) = @_; + # Retrieve $x, $y, ... + my @parameters = @$arrayref[1..$#$arrayref]; + ... } If the hook is an object, it must provide an INC method that will be @@ -4670,14 +4994,14 @@ into package C
.) Here is a typical code layout: package Foo; sub new { ... } sub Foo::INC { - my ($self, $filename) = @_; - ... + my ($self, $filename) = @_; + ... } # In the main program - push @INC, new Foo(...); + push @INC, Foo->new(...); -Note that these hooks are also permitted to set the %INC entry +These hooks are also permitted to set the %INC entry corresponding to the files they have loaded. See L. For a yet-more-powerful import facility, see L and L. @@ -4692,17 +5016,17 @@ variables and reset C searches so that they work again. The expression is interpreted as a list of single characters (hyphens allowed for ranges). All variables and arrays beginning with one of those letters are reset to their pristine state. If the expression is -omitted, one-match searches (C) are reset to match again. Resets -only variables or searches in the current package. Always returns +omitted, one-match searches (C) are reset to match again. +Only resets variables or searches in the current package. Always returns 1. Examples: - reset 'X'; # reset all X variables - reset 'a-z'; # reset lower case variables - reset; # just reset ?one-time? searches + reset 'X'; # reset all X variables + reset 'a-z'; # reset lower case variables + reset; # just reset ?one-time? searches Resetting C<"A-Z"> is not recommended because you'll wipe out your C<@ARGV> and C<@INC> arrays and your C<%ENV> hash. Resets only package -variables--lexical variables are unaffected, but they clean themselves +variables; lexical variables are unaffected, but they clean themselves up on scope exit anyway, so you'll probably want to use them instead. See L. @@ -4716,10 +5040,10 @@ given in EXPR. Evaluation of EXPR may be in list, scalar, or void context, depending on how the return value will be used, and the context may vary from one execution to the next (see C). If no EXPR is given, returns an empty list in list context, the undefined value in -scalar context, and (of course) nothing at all in a void context. 
+scalar context, and (of course) nothing at all in void context. -(Note that in the absence of an explicit C, a subroutine, eval, -or do FILE will automatically return the value of the last expression +(In the absence of an explicit C, a subroutine, eval, +or do FILE automatically returns the value of the last expression evaluated.) =item reverse LIST @@ -4740,13 +5064,17 @@ Used without arguments in scalar context, reverse() reverses C<$_>. print reverse; # No output, list context print scalar reverse; # Hello, world +Note that reversing an array to itself (as in C<@a = reverse @a>) will +preserve non-existent elements whenever possible, i.e., for non magical +arrays or tied arrays with C and C methods. + This operator is also handy for inverting a hash, although there are some caveats. If a value is duplicated in the original hash, only one of those can be represented as a key in the inverted hash. Also, this has to unwind one hash and build a whole new one, which may take some time on a large hash, such as from a DBM file. - %by_name = reverse %by_address; # Invert the hash + %by_name = reverse %by_address; # Invert the hash =item rewinddir DIRHANDLE X @@ -4772,7 +5100,7 @@ Deletes the directory specified by FILENAME if that directory is empty. If it succeeds it returns true, otherwise it returns false and sets C<$!> (errno). If FILENAME is omitted, uses C<$_>. -To remove a directory tree recursively (C on unix) look at +To remove a directory tree recursively (C on Unix) look at the C function of the L module. =item s/// @@ -4790,7 +5118,7 @@ Just like C, but implicitly appends a newline. C is simply an abbreviation for C<{ local $\ = "\n"; print LIST }>. -This keyword is only available when the "say" feature is +This keyword is available only when the "say" feature is enabled: see L. =item scalar EXPR @@ -4807,19 +5135,19 @@ needed. 
If you really wanted to do so, however, you could use the construction C<@{[ (some expression) ]}>, but usually a simple C<(some expression)> suffices. -Because C is unary operator, if you accidentally use for EXPR a +Because C is a unary operator, if you accidentally use for EXPR a parenthesized list, this behaves as a scalar comma expression, evaluating all but the last element in void context and returning the final element evaluated in scalar context. This is seldom what you want. The following single statement: - print uc(scalar(&foo,$bar)),$baz; + print uc(scalar(&foo,$bar)),$baz; is the moral equivalent of these two: - &foo; - print(uc($bar),$baz); + &foo; + print(uc($bar),$baz); See L for more details on unary operators and the comma operator. @@ -4833,7 +5161,7 @@ I to POSITION, C<1> to set it to the current position plus POSITION, and C<2> to set it to EOF plus POSITION (typically negative). For WHENCE you may use the constants C, C, and C (start of the file, current position, end -of the file) from the Fcntl module. Returns C<1> upon success, C<0> +of the file) from the Fcntl module. Returns C<1> on success, C<0> otherwise. Note the I: even if the filehandle has been set to @@ -4841,8 +5169,8 @@ operate on characters (for example by using the C<:encoding(utf8)> open layer), tell() will return byte offsets, not character offsets (because implementing that would render seek() and tell() rather slow). -If you want to position file for C or C, don't use -C--buffering makes its effect on the file's system position +If you want to position the file for C or C, don't use +C, because buffering makes its effect on the file's read-write position unpredictable and non-portable. Use C instead. Due to the rules and rigors of ANSI C, on some systems you have to do a @@ -4853,21 +5181,21 @@ A WHENCE of C<1> (C) is useful for not moving the file position: seek(TEST,0,1); This is also useful for applications emulating C. 
Once you hit -EOF on your read, and then sleep for a while, you might have to stick in a -seek() to reset things. The C doesn't change the current position, +EOF on your read and then sleep for a while, you (probably) have to stick in a +dummy seek() to reset things. The C doesn't change the position, but it I clear the end-of-file condition on the handle, so that the -next C<< >> makes Perl try again to read something. We hope. +next C<< >> makes Perl try again to read something. (We hope.) -If that doesn't work (some IO implementations are particularly -cantankerous), then you may need something more like this: +If that doesn't work (some I/O implementations are particularly +cantankerous), you might need something like this: for (;;) { - for ($curpos = tell(FILE); $_ = ; + for ($curpos = tell(FILE); $_ = ; $curpos = tell(FILE)) { - # search for some stuff and put it into files - } - sleep($for_a_while); - seek(FILE, $curpos, 0); + # search for some stuff and put it into files + } + sleep($for_a_while); + seek(FILE, $curpos, 0); } =item seekdir DIRHANDLE,POS @@ -4910,7 +5238,7 @@ methods, preferring to write the last example as: =item select RBITS,WBITS,EBITS,TIMEOUT X gets restarted after signals (say, SIGALRM) is implementation-dependent. See also L for notes on the portability of C behaves like the select(2) system call : it returns +On error, C

the pack function will gobble up -that many values from the LIST. A C<*> for the repeat count means to -use however many items are left, except for C<@>, C, C, where it -is equivalent to C<0>, for <.> where it means relative to string start -and C, where it is equivalent to 1 (or 45, which is the same). -A numeric repeat count may optionally be enclosed in brackets, as in -C. - -One can replace the numeric repeat count by a template enclosed in brackets; -then the packed length of this template in bytes is used as a count. -For example, C skips a long (it skips the number of bytes in a long); -the template C<$t X[$t] $t> unpack()s twice what $t unpacks. -If the template in brackets contains alignment commands (such as C), -its packed length is calculated as if the start of the template has the maximal -possible alignment. - -When used with C, C<*> results in the addition of a trailing null -byte (so the packed result will be one longer than the byte C -of the item). +Each letter may optionally be followed by a number indicating the repeat +count. A numeric repeat count may optionally be enclosed in brackets, as +in C. The repeat count gobbles that many values from +the LIST when used with all format types other than C, C, C, C, +C, C, C, C<@>, C<.>, C, C, and C