''' Beginning of part 4
''' $Header:,v 89/10/26 23:18:43 lwall Locked $
'''
''' $Log:,v $
''' Revision 89/10/26 23:18:43 lwall
''' patch1: documented the desirability of unnecessary parentheses
'''
''' Revision 3.0 89/10/18 15:21:55 lwall
''' 3.0 baseline
'''
.Sh "Precedence"
.I Perl
operators have the following associativity and precedence,
listed from lowest to highest:
.nf
nonassoc\h'|1i'print printf exec system sort reverse
\h'1.5i'chmod chown kill unlink utime die return
left\h'|1i',
right\h'|1i'= += \-= *= etc.
right\h'|1i'?:
nonassoc\h'|1i'.\|.
left\h'|1i'||
left\h'|1i'&&
left\h'|1i'| ^
left\h'|1i'&
nonassoc\h'|1i'== != eq ne
nonassoc\h'|1i'< > <= >= lt gt le ge
nonassoc\h'|1i'chdir exit eval reset sleep rand umask
nonassoc\h'|1i'\-r \-w \-x etc.
left\h'|1i'<< >>
left\h'|1i'+ \- .
left\h'|1i'* / % x
left\h'|1i'=~ !~
right\h'|1i'! ~ and unary minus
right\h'|1i'**
nonassoc\h'|1i'++ \-\|\-
left\h'|1i'(
.fi
.PP
As mentioned earlier, if any list operator (print, etc.) or
any unary operator (chdir, etc.)
is followed by a left parenthesis as the next token on the same line,
the operator and arguments within parentheses are taken to
be of highest precedence, just like a normal function call.
Examples:
.nf

    chdir $foo || die;          # (chdir $foo) || die
    chdir($foo) || die;         # (chdir $foo) || die
    chdir ($foo) || die;        # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

.fi
but, because * has higher precedence than ||:
.nf

    chdir $foo * 20;            # chdir ($foo * 20)
    chdir($foo) * 20;           # (chdir $foo) * 20
    chdir ($foo) * 20;          # (chdir $foo) * 20
    chdir +($foo) * 20;         # chdir ($foo * 20)

    rand 10 * 20;               # rand (10 * 20)
    rand(10) * 20;              # (rand 10) * 20
    rand (10) * 20;             # (rand 10) * 20
    rand +(10) * 20;            # rand (10 * 20)

.fi
.PP
In the absence of parentheses,
the precedence of list operators such as print, sort or chmod is
either very high or very low depending on whether you look at the left
side of the operator or the right side of it.
For example, in
.nf

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

.fi
the commas on the right of the sort are evaluated before the sort, but
the commas on the left are evaluated after.
In other words, list operators tend to gobble up all the arguments that
follow them, and then act like a simple term with regard to the preceding
expression.
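A small runnable sketch of the sort example, assuming a current Perl 5 interpreter (it has not been checked against this version of perl). The variable names are illustrative only; the second list shows how parentheses directly after sort stop it from gobbling the rest of the list.

```perl
# The commas to the right of sort belong to sort's argument list,
# so sort receives (4, 2) and the outer list becomes (1, 3, 2, 4).
@ary = (1, 3, sort 4, 2);
$joined = join('', @ary);            # "1324"

# Parentheses right after sort end its argument list early,
# so the trailing 0 stays in the outer list.
@ary2 = (1, 3, sort(4, 2), 0);
$joined2 = join('', @ary2);          # "13240"

print "$joined $joined2\n";
```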
.PP
Note that you have to be careful with parens:
.nf

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

.fi
Also note that
.nf

    print ($foo & 255) + 1, "\en";

.fi
probably doesn't do what you expect at first glance.
The parenthesized expression is taken as the complete argument list of print,
so only the value of ($foo & 255) is printed; the + 1 is then applied to
the value returned by print, and the result is thrown away.
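To see what actually gets printed, the following sketch captures the output of both forms. It assumes a Perl 5 interpreter, since lexical variables, anonymous subroutines and in-memory filehandles are not part of this manual, and the helper name capture is hypothetical.

```perl
# Hypothetical helper: run some code with the default output handle
# redirected into a string, and return what was printed.
sub capture {
    my ($code) = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "in-memory open failed: $!";
    my $old = select($fh);     # make $fh the default output handle
    $code->();
    select($old);              # restore the previous default handle
    close($fh);
    return $buf;
}

$foo = 300;                    # 300 & 255 is 44

# The parentheses become print's whole argument list; the "+ 1" and
# the newline are evaluated afterwards and discarded.
$surprise = capture(sub { print ($foo & 255) + 1, "\n"; });

# An extra pair of parentheses gives the intended result.
$intended = capture(sub { print(($foo & 255) + 1, "\n"); });
```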
.Sh "Subroutines"
A subroutine may be declared as follows:
.nf

    sub NAME BLOCK

.fi
.PP
Any arguments passed to the routine come in as array @_,
that is ($_[0], $_[1], .\|.\|.).
The array @_ is a local array, but its values are references to the
actual scalar parameters.
The return value of the subroutine is the value of the last expression
evaluated, and can be either an array value or a scalar value.
Alternatively, a return statement may be used to specify the returned value and
exit the subroutine.
To create local variables see the
.I local
operator.
.PP
A subroutine is called using the
.I do
operator or the & operator.
.nf

    sub MAX {
        local($max) = pop(@_);
        foreach $foo (@_) {
            $max = $foo \|if \|$max < $foo;
        }
        $max;
    }

    .\|.\|.
    $bestday = &MAX($mon,$tue,$wed,$thu,$fri);

    # get a line, combining continuation lines
    #  that start with whitespace
    sub get_line {
        $thisline = $lookahead;
        line: while ($lookahead = <STDIN>) {
            if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
                $thisline \|.= \|$lookahead;
            }
            else {
                last line;
            }
        }
        $thisline;
    }

    $lookahead = <STDIN>;       # get first line
    while ($_ = do get_line(\|)) {
        .\|.\|.
    }

.fi
Use array assignment to a local list to name your formal arguments:
.nf

    sub maybeset {
        local($key, $value) = @_;
        $foo{$key} = $value unless $foo{$key};
    }

.fi
This also has the effect of turning call-by-reference into call-by-value,
since the assignment copies the values.
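The difference between the two calling conventions can be sketched like this, assuming a Perl 5 interpreter; the subroutine names are hypothetical. Changing $_[0] reaches the caller's variable, while changing a local copy does not.

```perl
# @_ holds aliases to the caller's scalars, so changing $_[0]
# changes the caller's variable (call-by-reference).
sub bump_alias {
    $_[0]++;
}

# Copying @_ into a local list gives call-by-value: only the copy changes.
sub bump_copy {
    local($n) = @_;
    $n++;
    $n;                         # value of the last expression is returned
}

$count = 5;
&bump_alias($count);            # $count is now 6
$result = &bump_copy($count);   # returns 7, $count stays 6
```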
.PP
Subroutines may be called recursively.
If a subroutine is called using the & form, the argument list is optional.
If omitted, no @_ array is set up for the subroutine; the @_ array at the
time of the call is visible to the subroutine instead.
.nf

    do foo(1,2,3);      # pass three arguments
    &foo(1,2,3);        # the same

    do foo();           # pass a null list
    &foo();             # the same
    &foo;               # pass no arguments--more efficient

.fi
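A sketch of the difference between &foo; and &foo(), assuming a Perl 5 interpreter; the subroutine names are hypothetical. The bare & call lets the inner routine see the caller's @_, while an explicit empty list gives it a fresh, empty one.

```perl
# show() reports whatever arguments it can see in @_.
sub show {
    join(',', @_);
}

sub outer {
    # & form with no argument list: show() sees outer's own @_.
    &show;
}

sub outer_empty {
    # Explicit empty list: show() gets a new, empty @_.
    &show();
}

$shared = &outer(1, 2, 3);          # "1,2,3"
$empty  = &outer_empty(1, 2, 3);    # ""
```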
.Sh "Passing By Reference"
Sometimes you don't want to pass the value of an array to a subroutine but
rather the name of it, so that the subroutine can modify the global copy
of it rather than working with a local copy.
In perl you can refer to all the objects of a particular name by prefixing
the name with a star: *foo.
When evaluated, it produces a scalar value that represents all the objects
of that name.
When assigned to within a local() operation, it causes the name mentioned
to refer to whatever * value was assigned to it.
Example:
.nf

    sub doubleary {
        local(*someary) = @_;
        foreach $elem (@someary) {
            $elem *= 2;
        }
    }
    do doubleary(*foo);
    do doubleary(*bar);

.fi
Assignment to *name is currently recommended only inside a local().
You can actually assign to *name anywhere, but the previous referent of
*name may be stranded forever.
This may or may not bother you.
.PP
Note that scalars are already passed by reference, so you can modify scalar
arguments without using this mechanism by referring explicitly to the $_[nnn]
in question.
You can modify all the elements of an array by passing all the elements
as scalars, but you have to use the * mechanism to push, pop or change the
size of an array.
The * mechanism will probably be more efficient in any case.
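Since changing the size of an array is the case that needs the * mechanism, here is a minimal sketch of a subroutine that pushes onto the caller's array, assuming a Perl 5 interpreter; the name pushsquare is hypothetical.

```perl
# Push onto the caller's array through a *name argument.
sub pushsquare {
    local(*list, $n) = @_;   # *list now refers to the caller's objects
    push(@list, $n * $n);    # grows the caller's array
}

@foo = (1, 4);
&pushsquare(*foo, 3);        # @foo is now (1, 4, 9)
```

When the subroutine returns, local() restores the old meaning of *list, but the element pushed onto @foo stays.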
.PP
Since a *name value contains unprintable binary data, if it is used as
an argument in a print, or as a %s argument in a printf or sprintf, it
then has the value '*name', just so it prints out pretty.
.Sh "Regular Expressions"
The patterns used in pattern matching are regular expressions such as
those supplied in the Version 8 regexp routines.
(In fact, the routines are derived from Henry Spencer's freely redistributable
reimplementation of the V8 routines.)
In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
Word boundaries may be matched by \eb, and non-boundaries by \eB.
A whitespace character is matched by \es, non-whitespace by \eS.
A numeric character is matched by \ed, non-numeric by \eD.
You may use \ew, \es and \ed within character classes.
Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
Within character classes \eb represents backspace rather than a word boundary.
Alternatives may be separated by |.
The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
matches the digit'th substring, where digit can range from 1 to 9.
(Outside of the pattern, always use $ instead of \e in front of the digit.
The scope of $<digit> (and $\`, $& and $\')
extends to the end of the enclosing BLOCK or eval string, or to
the next pattern match with subexpressions.
The \e<digit> notation sometimes works outside the current pattern, but should
not be relied upon.)
$+ returns whatever the last bracket match matched.
$& returns the entire matched string.
($0 normally returns the same thing, but don't depend on it.)
$\` returns everything before the matched string.
$\' returns everything after the matched string.
Examples:
.nf

    s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/;    # swap first two words

    if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }

.fi
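The two examples, plus the $\`, $& and $' variables, can be tried on concrete strings. This sketch assumes a Perl 5 interpreter; the sample strings are made up for illustration.

```perl
# Swap the first two words.
$_ = 'hello world again';
s/^([^ ]*) *([^ ]*)/$2 $1/;          # $_ is now "world hello again"
$swapped = $_;

# Pull the fields out of a time stamp.
$_ = 'Time: 12:34:56';
if (/Time: (..):(..):(..)/) {
    $hours   = $1;
    $minutes = $2;
    $seconds = $3;
}

# $`, $& and $' describe the most recent successful match.
$_ = 'abcdefghi';
/def/;
$parts = "$`:$&:$'";                 # "abc:def:ghi"
```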
.PP
By default, the ^ character matches only the beginning of the string,
the $ character matches only at the end (or before the newline at the end)
and
.I perl
does certain optimizations with the assumption that the string contains
only one line.
You may, however, wish to treat a string as a multi-line buffer, such that
the ^ will match after any newline within the string, and $ will match
before any newline.
At the cost of a little more overhead, you can do this by setting the variable
$* to 1.
Setting it back to 0 makes
.I perl
revert to its old behavior.
.PP
To facilitate multi-line substitutions, the . character never matches a newline
(even when $* is 0).
In particular, the following leaves a newline on the $_ string:
.nf

    $_ = <STDIN>;
    s/.*(some_string).*/$1/;

.fi
If the newline is unwanted, try one of
.nf

    s/.*(some_string).*\en/$1/;
    s/.*(some_string)[^\e000]*/$1/;
    s/.*(some_string)(.|\en)*/$1/;
    chop; s/.*(some_string).*/$1/;
    /(some_string)/ && ($_ = $1);

.fi
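A quick check of the newline behaviour, assuming a Perl 5 interpreter. The string in $line is a made-up stand-in for a line read from STDIN.

```perl
$line = "xx some_string yy\n";     # stands in for a line read from <STDIN>

# . does not match the newline, so it survives the substitution.
$_ = $line;
s/.*(some_string).*/$1/;
$kept = $_;                        # "some_string\n"

# Matching the newline explicitly removes it.
$_ = $line;
s/.*(some_string).*\n/$1/;
$explicit = $_;                    # "some_string"

# Or chop the newline off first.
$_ = $line;
chop;
s/.*(some_string).*/$1/;
$chopped = $_;                     # "some_string"
```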
.PP
Any item of a regular expression may be followed with digits in curly brackets
of the form {n,m}, where n gives the minimum number of times to match the item
and m gives the maximum.
The form {n} is equivalent to {n,n} and matches exactly n times.
The form {n,} matches n or more times.
(If a curly bracket occurs in any other context, it is treated as a regular
character.)
The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
to {0,1}.
There is no limit to the size of n or m, but large numbers will chew up
more memory.
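A few matches showing the three brace forms, assuming a Perl 5 interpreter. The ZIP-code style pattern is only an illustration.

```perl
# {n}: exactly five digits, then an optional dash and exactly four more.
$zip_ok  = ('12345-6789' =~ /^\d{5}(-\d{4})?$/) ? 1 : 0;   # 1
$zip_bad = ('1234-6789'  =~ /^\d{5}(-\d{4})?$/) ? 1 : 0;   # 0

# {n,} means "n or more"; {n,m} caps the count at m.
$two_or_more = ('aaaa' =~ /^a{2,}$/)  ? 1 : 0;             # 1
$at_most_3   = ('aaaa' =~ /^a{1,3}$/) ? 1 : 0;             # 0
```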
.PP
You will note that all backslashed metacharacters in
.I perl
are alphanumeric,
such as \eb, \ew, \en.
Unlike some other regular expression languages, there are no backslashed
symbols that aren't alphanumeric.
So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
interpreted as a literal character, not a metacharacter.
This makes it simple to quote a string that you want to use for a pattern
but that you are afraid might contain metacharacters.
Simply quote all the non-alphanumeric characters:
.nf

    $pattern =~ s/(\eW)/\e\e$1/g;

.fi
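Here is the quoting trick applied to a pattern containing . and *, assuming a Perl 5 interpreter; the sample strings are made up. After quoting, the pattern only matches the literal text.

```perl
$pattern = 'a.b*c';                  # contains the metacharacters . and *

# Unquoted, . and * act as metacharacters.
$loose = ('aXbbbc' =~ /$pattern/) ? 1 : 0;      # 1

# Backslash every non-alphanumeric character.
($quoted = $pattern) =~ s/(\W)/\\$1/g;          # 'a\.b\*c'
$strict = ('aXbbbc'  =~ /$quoted/) ? 1 : 0;     # 0
$exact  = ('xa.b*cx' =~ /$quoted/) ? 1 : 0;     # 1
```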
.Sh "Formats"
Output record formats for use with the
.I write
operator may be declared as follows:
.nf

    format NAME =
    FORMLIST
    .

.fi
If NAME is omitted, format \*(L"STDOUT\*(R" is defined.
FORMLIST consists of a sequence of lines, each of which may be of one of three
types:
.Ip 1. 4
A comment.
.Ip 2. 4
A \*(L"picture\*(R" line giving the format for one output line.
.Ip 3. 4
An argument line supplying values to plug into a picture line.
.PP
Picture lines are printed exactly as they look, except for certain fields
that substitute values into the line.
Each picture field starts with either @ or ^.
The @ field (not to be confused with the array marker @) is the normal
case; ^ fields are used
to do rudimentary multi-line text block filling.
The length of the field is supplied by padding out the field
with multiple <, >, or | characters to specify, respectively, left justification,
right justification, or centering.
If any of the values supplied for these fields contains a newline, only
the text up to the newline is printed.
The special field @* can be used for printing multi-line values.
It should appear by itself on a line.
.PP
The values are specified on the following line, in the same order as
the picture fields.
The values should be separated by commas.
.PP
Picture fields that begin with ^ rather than @ are treated specially.
The value supplied must be a scalar variable name which contains a text
string.
.I Perl
puts as much text as it can into the field, and then chops off the front
of the string so that the next time the variable is referenced,
more of the text can be printed.
Normally you would use a sequence of fields in a vertical stack to print
out a block of text.
If you like, you can end the final field with .\|.\|., which will appear in the
output if the text was too long to appear in its entirety.
You can change which characters are legal to break on by changing the
variable $: to a list of the desired characters.
.PP
Since use of ^ fields can produce variable length records if the text to be
formatted is short, you can suppress blank lines by putting the tilde (~)
character anywhere in the line.
(Normally you should put it in the front if possible, for visibility.)
The tilde will be translated to a space upon output.
If you put a second tilde contiguous to the first, the line will be repeated
until all the fields on the line are exhausted.
(If you use a field of the @ variety, the expression you supply had better
not give the same value every time forever!)
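A small sketch of a ^ field on a repeating ~~ line, assuming a Perl 5 interpreter (writing to an in-memory filehandle is a later Perl feature). The handle and format name WRAP and the sample text are hypothetical. Each output line takes as many whole words as fit in the 11-character field, and $text is chopped from the front until it is used up.

```perl
# One 11-character ^ field; ~~ repeats the line until $text is exhausted.
format WRAP =
^<<<<<<<<<<~~
$text
.

$text = 'the quick brown fox jumps';
$out = '';
open(WRAP, '>', \$out) || die "Can't open in-memory handle: $!";
write(WRAP);        # uses the format with the same name as the handle
close(WRAP);

# Split the captured report into its output lines.
@lines = split(/\n/, $out);
```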
.PP
.lg 0
.cs R 25
.ft C
.nf
# a report on the /etc/passwd file
format top =
\& Passwd File
Name Login Office Uid Gid Home
\&.
format STDOUT =
@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
$name, $login, $office,$uid,$gid, $home
\&.

# a report from a bug report form
format top =
\& Bug Reports
@<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
$system, $%, $date
\&.
format STDOUT =
Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\& $subject
Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\& $index, $description
Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\& $priority, $date, $description
From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\& $from, $description
Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\& $programmer, $description
\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\& $description
\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\& $description
\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\& $description
\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\& $description
\&~ ^<<<<<<<<<<<<<<<<<<<<<<<...
\& $description
\&.
.fi
.ft R
.cs R
.lg
.PP
It is possible to intermix prints with writes on the same output channel,
but you'll have to handle $\- (lines left on the page) yourself.
.PP
If you are printing lots of fields that are usually blank, you should consider
using the reset operator between records.
Not only is it more efficient, but it can prevent the bug of adding another
field and forgetting to zero it.
.Sh "Interprocess Communication"
The IPC facilities of perl are built on the Berkeley socket mechanism.
If you don't have sockets, you can ignore this section.
The calls have the same names as the corresponding system calls,
but the arguments tend to differ, for two reasons.
First, perl file handles work differently than C file descriptors.
Second, perl already knows the length of its strings, so you don't need
to pass that information.
Here is a sample client (untested):
.nf

    ($them,$port) = @ARGV;
    $port = 2345 unless $port;
    $them = 'localhost' unless $them;

    $SIG{'INT'} = 'dokill';
    sub dokill { kill 9,$child if $child; }

    do 'sys/socket.h' || die "Can't do sys/socket.h: $@";

    $sockaddr = 'S n a4 x8';
    chop($hostname = `hostname`);

    ($name, $aliases, $proto) = getprotobyname('tcp');
    ($name, $aliases, $port) = getservbyname($port, 'tcp')
        unless $port =~ /^\ed+$/;
    ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
    ($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);

    $this = pack($sockaddr, &AF_INET, 0, $thisaddr);
    $that = pack($sockaddr, &AF_INET, $port, $thataddr);

    socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
    bind(S, $this) || die "bind: $!";
    connect(S, $that) || die "connect: $!";

    select(S); $| = 1; select(stdout);

    if ($child = fork) {
        while (<>) {
            print S;
        }
        sleep 3;
        do dokill();
    }
    else {
        while (<S>) {
            print;
        }
    }

.fi
And here's a server:
.nf

    ($port) = @ARGV;
    $port = 2345 unless $port;

    do 'sys/socket.h' || die "Can't do sys/socket.h: $@";

    $sockaddr = 'S n a4 x8';

    ($name, $aliases, $proto) = getprotobyname('tcp');
    ($name, $aliases, $port) = getservbyname($port, 'tcp')
        unless $port =~ /^\ed+$/;

    $this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");

    select(NS); $| = 1; select(stdout);

    socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
    bind(S, $this) || die "bind: $!";
    listen(S, 5) || die "listen: $!";

    select(S); $| = 1; select(stdout);

    for (;;) {
        print "Listening again\en";
        ($addr = accept(NS,S)) || die $!;
        print "accept ok\en";

        ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
        @inetaddr = unpack('C4',$inetaddr);
        print "$af $port @inetaddr\en";

        while (<NS>) {
            print;
            print NS;
        }
    }

.fi
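The sockaddr template used by both scripts can be exercised without a network. This sketch assumes a Perl 5 interpreter and hard-codes AF_INET as 2, which is true on most BSD-derived systems but is an assumption here; the real scripts take the value from sys/socket.h.

```perl
# The 'S n a4 x8' template:
#   S  = address family (native unsigned short)
#   n  = port number (16 bits, network byte order)
#   a4 = 4-byte IPv4 address
#   x8 = 8 bytes of zero padding
$sockaddr = 'S n a4 x8';
$AF_INET = 2;                       # assumption, see the note above

$that = pack($sockaddr, $AF_INET, 2345, pack('C4', 127, 0, 0, 1));

# Unpack with the same template, as the server does with accept's result.
($af, $port, $inetaddr) = unpack($sockaddr, $that);
@inetaddr = unpack('C4', $inetaddr);
```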
.Sh "Predefined Names"
The following names have special meaning to
.IR perl .
I could have used alphabetic symbols for some of these, but I didn't want
to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
out.
You'll just have to suffer along with these silly symbols.
Most of them have reasonable mnemonics, or analogues in one of the shells.
.Ip $_ 8
The default input and pattern-searching space.
The following pairs are equivalent:
.nf

    while (<>) {\|.\|.\|.       # only equivalent in while!
    while ($_ = <>) {\|.\|.\|.

    /\|^Subject:/
    $_ \|=~ \|/\|^Subject:/

    y/a\-z/A\-Z/
    $_ =~ y/a\-z/A\-Z/

    chop
    chop($_)

.fi
(Mnemonic: underline is understood in certain operations.)
.Ip $. 8
The current input line number of the last filehandle that was read.
Remember that only an explicit close on the filehandle resets the line number.
Since <> never does an explicit close, line numbers increase across ARGV files
(but see examples under eof).
(Mnemonic: many programs use . to mean the current line number.)
.Ip $/ 8
The input record separator, newline by default.
Works like
.IR awk 's
RS variable, including treating blank lines as delimiters
if set to the null string.
If set to a value longer than one character, only the first character is used.
(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
.Ip $, 8
The output field separator for the print operator.
Ordinarily the print operator simply prints out the comma separated fields
you specify.
In order to get behavior more like
.IR awk ,
set this variable as you would set
.IR awk 's
OFS variable to specify what is printed between fields.
(Mnemonic: what is printed when there is a , in your print statement.)
.Ip $"" 8
This is like $, except that it applies to array values interpolated into
a double-quoted string (or similar interpreted string).
Default is a space.
(Mnemonic: obvious, I think.)
.Ip $\e 8
The output record separator for the print operator.
Ordinarily the print operator simply prints out the comma separated fields
you specify, with no trailing newline or record separator assumed.
In order to get behavior more like
.IR awk ,
set this variable as you would set
.IR awk 's
ORS variable to specify what is printed at the end of the print.
(Mnemonic: you set $\e instead of adding \en at the end of the print.
Also, it's just like /, but it's what you get \*(L"back\*(R" from
.IR perl .)
.Ip $# 8
The output format for printed numbers.
This variable is a half-hearted attempt to emulate
.IR awk 's
OFMT variable.
There are times, however, when
.I awk
and
.I perl
have differing notions of what
is in fact numeric.
Also, the initial value is %.20g rather than %.6g, so you need to set $#
explicitly to get
.IR awk 's
value.
(Mnemonic: # is the number sign.)
.Ip $% 8
The current page number of the currently selected output channel.
(Mnemonic: % is page number in nroff.)
.Ip $= 8
The current page length (printable lines) of the currently selected output
channel.
Default is 60.
(Mnemonic: = has horizontal lines.)
.Ip $\- 8
The number of lines left on the page of the currently selected output channel.
(Mnemonic: lines_on_page \- lines_printed.)
.Ip $~ 8
The name of the current report format for the currently selected output
channel.
(Mnemonic: brother to $^.)
.Ip $^ 8
The name of the current top-of-page format for the currently selected output
channel.
(Mnemonic: points to top of page.)
.Ip $| 8
If set to nonzero, forces a flush after every write or print on the currently
selected output channel.
Default is 0.
Note that
.I STDOUT
will typically be line buffered if output is to the
terminal and block buffered otherwise.
Setting this variable is useful primarily when you are outputting to a pipe,
such as when you are running a
.I perl
script under rsh and want to see the
output as it's happening.
(Mnemonic: when you want your pipes to be piping hot.)
.Ip $$ 8
The process number of the
.I perl
running this script.
(Mnemonic: same as shells.)
.Ip $? 8
The status returned by the last pipe close, backtick (\`\`) command or
.I system
operator.
Note that this is the status word returned by the wait() system
call, so the exit value of the subprocess is actually ($? >> 8).
$? & 255 gives which signal, if any, the process died from, and whether
there was a core dump.
(Mnemonic: similar to sh and ksh.)
.Ip $& 8 4
The string matched by the last pattern match (not counting any matches hidden
within a BLOCK or eval enclosed by the current BLOCK).
(Mnemonic: like & in some editors.)
.Ip $\` 8 4
The string preceding whatever was matched by the last pattern match
(not counting any matches hidden within a BLOCK or eval enclosed by the current
BLOCK).
(Mnemonic: \` often precedes a quoted string.)
.Ip $\' 8 4
The string following whatever was matched by the last pattern match
(not counting any matches hidden within a BLOCK or eval enclosed by the current
BLOCK).
(Mnemonic: \' often follows a quoted string.)
Example:
.nf

    $_ = \'abcdefghi\';
    /def/;
    print "$\`:$&:$\'\en";      # prints abc:def:ghi

.fi
.Ip $+ 8 4
The last bracket matched by the last search pattern.
This is useful if you don't know which of a set of alternative patterns
matched.
For example:
.nf

    /Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);

.fi
(Mnemonic: be positive and forward looking.)
.Ip $* 8 2
Set to 1 to do multiline matching within a string, 0 to tell
.I perl
that it can assume that strings contain a single line, for the purpose
of optimizing pattern matches.
Pattern matches on strings containing multiple newlines can produce confusing
results when $* is 0.
Default is 0.
(Mnemonic: * matches multiple things.)
.Ip $0 8
Contains the name of the file containing the
.I perl
script being executed.
The value should be copied elsewhere before any pattern matching happens, which
clobbers $0.
(Mnemonic: same as sh and ksh.)
.Ip $<digit> 8
Contains the subpattern from the corresponding set of parentheses in the last
pattern matched, not counting patterns matched in nested blocks that have
been exited already.
(Mnemonic: like \edigit.)
.Ip $[ 8 2
The index of the first element in an array, and of the first character in
a substring.
Default is 0, but you could set it to 1 to make
.I perl
behave more like
.I awk
(or Fortran)
when subscripting and when evaluating the index() and substr() functions.
(Mnemonic: [ begins subscripts.)
.Ip $] 8 2
The string printed out when you say \*(L"perl -v\*(R".
It can be used to determine at the beginning of a script whether the perl
interpreter executing the script is in the right range of versions.
Example:
.nf

    # see if getc is available
    ($version,$patchlevel) =
        $] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
    print STDERR "(No filename completion available.)\en"
        if $version * 1000 + $patchlevel < 2016;

.fi
(Mnemonic: Is this version of perl in the right bracket?)
.Ip $; 8 2
The subscript separator for multi-dimensional array emulation.
If you refer to an associative array element as
.nf
    $foo{$a,$b,$c}
.fi
it really means
.nf
    $foo{join($;, $a, $b, $c)}
.fi
But don't put
.nf
    @foo{$a,$b,$c}      # a slice--note the @
.fi
which means
.nf
    ($foo{$a},$foo{$b},$foo{$c})
.fi
Default is "\e034", the same as SUBSEP in
.IR awk .
Note that if your keys contain binary data there might not be any safe
value for $;.
(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
Yeah, I know, it's pretty lame, but $, is already taken for something more
important.)
.Ip $! 8 2
If used in a numeric context, yields the current value of errno, with all the
usual caveats.
If used in a string context, yields the corresponding system error string.
You can assign to $! in order to set errno
if, for instance, you want $! to return the string for error n, or you want
to set the exit value for the die operator.
(Mnemonic: What just went bang?)
.Ip $@ 8 2
The error message from the last eval command.
If null, the last eval parsed and executed correctly.
(Mnemonic: Where was the syntax error \*(L"at\*(R"?)
.Ip $< 8 2
The real uid of this process.
(Mnemonic: it's the uid you came FROM, if you're running setuid.)
.Ip $> 8 2
The effective uid of this process.
Example:
.nf

    $< = $>;            # set real uid to the effective uid
    ($<,$>) = ($>,$<);  # swap real and effective uid

.fi
(Mnemonic: it's the uid you went TO, if you're running setuid.)
Note: $< and $> can only be swapped on machines supporting setreuid().
.Ip $( 8 2
The real gid of this process.
If you are on a machine that supports membership in multiple groups
simultaneously, gives a space separated list of groups you are in.
The first number is the one returned by getgid(), and the subsequent ones
by getgroups(), one of which may be the same as the first number.
(Mnemonic: parentheses are used to GROUP things.
The real gid is the group you LEFT, if you're running setgid.)
.Ip $) 8 2
The effective gid of this process.
If you are on a machine that supports membership in multiple groups
simultaneously, gives a space separated list of groups you are in.
The first number is the one returned by getegid(), and the subsequent ones
by getgroups(), one of which may be the same as the first number.
(Mnemonic: parentheses are used to GROUP things.
The effective gid is the group that's RIGHT for you, if you're running setgid.)
Note: $<, $>, $( and $) can only be set on machines that support the
corresponding set[re][ug]id() routine.
$( and $) can only be swapped on machines supporting setregid().
.Ip $: 8 2
The current set of characters after which a string may be broken to
fill continuation fields (starting with ^) in a format.
Default is "\ \en-", to break on whitespace or hyphens.
(Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
.Ip @ARGV 8 3
The array ARGV contains the command line arguments intended for the script.
Note that $#ARGV is generally the number of arguments minus one, since
$ARGV[0] is the first argument, NOT the command name.
See $0 for the command name.
.Ip @INC 8 3
The array INC contains the list of places to look for
.I perl
scripts to be
evaluated by the \*(L"do EXPR\*(R" command.
It initially consists of the arguments to any
.B \-I
command line switches, followed
by the default
.I perl
library, probably \*(L"/usr/local/lib/perl\*(R".
.Ip $ENV{expr} 8 2
The associative array ENV contains your current environment.
Setting a value in ENV changes the environment for child processes.
.Ip $SIG{expr} 8 2
The associative array SIG is used to set signal handlers for various signals.
Example:
.nf

    sub handler {       # 1st argument is signal name
        local($sig) = @_;
        print "Caught a SIG$sig\-\|\-shutting down\en";
        close(LOG);
        exit(0);
    }

    $SIG{\'INT\'} = \'handler\';
    $SIG{\'QUIT\'} = \'handler\';
    .\|.\|.
    $SIG{\'INT\'} = \'DEFAULT\';    # restore default action
    $SIG{\'QUIT\'} = \'IGNORE\';    # ignore SIGQUIT

.fi
The SIG array only contains values for the signals actually set within
the perl script.
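Several of the predefined names above can be observed together in one short script. This is a sketch assuming a Perl 5 interpreter: in-memory filehandles and $^X (the path of the running perl, used here so the subprocess needs no shell) are later Perl features, and the errno value 2 is only an example whose message text varies by system.

```perl
# Capture printed output in a string.
$out = '';
open(OUT, '>', \$out) || die "Can't open in-memory handle: $!";
{
    local $, = '-';                  # printed between print's fields
    local $\ = "\n";                 # printed after the last field
    print OUT 'a', 'b', 'c';         # writes "a-b-c\n"
}
close(OUT);

# $" goes between array elements interpolated into a string.
@list = (1, 2, 3);
{
    local $" = ':';
    $joined = "@list";               # "1:2:3"
}

# $; joins the subscripts of a multi-dimensional element.
$grid{1, 2} = 'x';
$found = exists $grid{join($;, 1, 2)} ? 1 : 0;      # 1

# $+ is whichever alternative's bracket matched.
$_ = 'Revision: 3.0';
/Version: (.*)|Revision: (.*)/ && ($rev = $+);      # "3.0"

# The exit value of a subprocess is $? >> 8.
system($^X, '-e', 'exit 3');
$exit_value = $? >> 8;               # 3

# $! in numeric and in string context.
$! = 2;
$errno  = $! + 0;                    # 2
$errstr = "$!";                      # the system's message for errno 2
```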
.Sh "Packages"
Perl provides a mechanism for alternate namespaces to protect packages from
stomping on each other's variables.
By default, a perl script starts compiling into the package known as \*(L"main\*(R".
By use of the
.I package
declaration, you can switch namespaces.
The scope of the package declaration is from the declaration itself to the end
of the enclosing block (the same scope as the local() operator).
Typically it would be the first declaration in a file to be included by
the \*(L"do FILE\*(R" operator.
You can switch into a package in more than one place; it merely influences
which symbol table is used by the compiler for the rest of that block.
You can refer to variables in other packages by prefixing the name with
the package name and a single quote.
If the package name is null, the \*(L"main\*(R" package is assumed.
Eval'ed strings are compiled in the package in which the eval itself was
compiled.
(Assignments to $SIG{}, however, assume the signal handler specified is in the
main package.
Qualify the signal handler name if you wish to have a signal handler in
a package.)
For an example, examine the debugger in the perl library.
It initially switches to the DB package so that the debugger doesn't interfere
with variables in the script you are trying to debug.
At various points, however, it temporarily switches back to the main package
to evaluate various expressions in the context of the main package.
.PP
The symbol table for a package happens to be stored in the associative array
of that name prepended with an underscore.
The value in each entry of the associative array is
what you are referring to when you use the *name notation.
In fact, the following have the same effect (in package main, anyway),
though the first is more
efficient because it does the symbol table lookups at compile time:
.nf

    local(*foo) = *bar;
    local($_main{'foo'}) = $_main{'bar'};

.fi
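A sketch of separate package namespaces and of local(*foo) = *bar, assuming a Perl 5 interpreter. The package name Counter is hypothetical, and the sketch uses the :: separator that later Perl 5 releases prefer to the single quote described above.

```perl
package Counter;                # hypothetical package name
$count = 0;                     # this is $Counter::count
sub bump { $count++; }          # compiled into package Counter

package main;
$count = 100;                   # a different variable: $main::count
&Counter::bump;
&Counter::bump;

# local(*foo) = *bar makes every foo name refer to the bar objects.
$bar = 'bar value';
@bar = (1, 2);
{
    local(*foo) = *bar;
    $seen_scalar = $foo;        # "bar value"
    $seen_count  = @foo;        # 2
}
```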
924You can use this to print out all the variables in a package, for instance.
925Here is from the perl library: 11
928 package dumpvar;
930 sub main'dumpvar {
931 \& ($package) = @_;
932 \& local(*stab) = eval("*_$package");
933 \& while (($key,$val) = each(%stab)) {
934 \& {
935 \& local(*entry) = $val;
936 \& if (defined $entry) {
937 \& print "\e$$key = '$entry'\en";
938 \& } 7
940 \& if (defined @entry) {
941 \& print "\e@$key = (\en";
942 \& foreach $num ($[ .. $#entry) {
943 \& print " $num\et'",$entry[$num],"'\en";
944 \& }
945 \& print ")\en";
946 \& } 10
948 \& if ($key ne "_$package" && defined %entry) {
949 \& print "\e%$key = (\en";
950 \& foreach $key (sort keys(%entry)) {
951 \& print " $key\et'",$entry{$key},"'\en";
952 \& }
953 \& print ")\en";
954 \& }
955 \& }
956 \& }
957 }
960Note that, even though the subroutine is compiled in package dumpvar, the
961name of the subroutine is qualified so that it's name is inserted into package
.Sh "Style"
Each programmer will, of course, have his or her own preferences in regard
to formatting, but there are some general guidelines that will make your
programs easier to read.
.Ip 1. 4 4
Just because you CAN do something a particular way doesn't mean that
you SHOULD do it that way.
.I Perl
is designed to give you several ways to do anything, so consider picking
the most readable one.
For instance,
.nf

    open(FOO,$foo) || die "Can't open $foo: $!";

.fi
is better than
.nf

    die "Can't open $foo: $!" unless open(FOO,$foo);

.fi
because the second way hides the main point of the statement in a
modifier.
On the other hand,
.nf

    print "Starting analysis\en" if $verbose;

.fi
is better than
.nf

    $verbose && print "Starting analysis\en";

.fi
since the main point isn't whether the user typed \-v or not.

Similarly, just because an operator lets you assume default arguments
doesn't mean that you have to make use of the defaults.
The defaults are there for lazy systems programmers writing one-shot
scripts.
If you want your program to be readable, consider supplying the argument.

Along the same lines, just because you
.I can
omit parentheses in many places doesn't mean that you ought to:
.nf

    return print reverse sort num values %array;
    return print(reverse(sort num (values(%array))));

.fi
When in doubt, parenthesize.
At the very least it will let some poor schmuck bounce on the % key in vi.
.Ip 2. 4 4
Don't go through silly contortions to exit a loop at the top or the
bottom, when
.I perl
provides the "last" operator so you can exit in the middle.
Just outdent it a little to make it more visible:
.nf

    line:
        for (;;) {
            statements;
        last line if $foo;
            next line if /^#/;
            statements;
        }

.fi
.Ip 3. 4 4
Don't be afraid to use loop labels\*(--they're there to enhance readability as
well as to allow multi-level loop breaks.
See the previous example.
.Ip 4. 4 4
For portability, when using features that may not be implemented on every
machine, test the construct in an eval to see if it fails.
If you know in which version or patchlevel a particular feature was implemented,
you can test $] to see if it will be there.
.Ip 5. 4 4
Choose mnemonic identifiers.
.Ip 6. 4 4
Be consistent.
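Here is a sketch of the eval feature test from the portability item. It assumes only that a missing or unimplemented built-in makes the evaluated code die. The helper name works and the call to the undefined subroutine no_such_feature are hypothetical. Because the code is passed to eval as a string, it is compiled only when the test runs, so a perl lacking the feature reports the failure in $@ instead of refusing to compile the whole script.

```perl
# Hypothetical helper: run a snippet of code at run time and report
# whether it completed without a fatal error.
sub works {
    local($code) = @_;
    eval $code;                 # any fatal error is trapped and left in $@
    return $@ eq '' ? 1 : 0;
}

# Does this machine implement symlink()?  A call with empty names simply
# fails where symlink exists, but dies where it is unimplemented.
$has_symlink = &works('symlink("", "");');

# A construct that is certainly unavailable, for comparison.
$has_bogus = &works('&no_such_feature();');
```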
.Sh "Debugging"
If you invoke
.I perl
with a
.B \-d
switch, your script will be run under a debugging monitor.
It will halt before the first executable statement and ask you for a
command, such as:
.Ip "h" 12 4
Prints out a help message.
.Ip "s" 12 4
Single step.
Executes until it reaches the beginning of another statement.
.Ip "c" 12 4
Continue.
Executes until the next breakpoint is reached.
.Ip "<CR>" 12 4
Repeat the last s or c.
.Ip "n" 12 4
Single step around subroutine call.
.Ip "l min+incr" 12 4
List incr+1 lines starting at min.
If min is omitted, starts where last listing left off.
If incr is omitted, previous value of incr is used.
.Ip "l min-max" 12 4
List lines in the indicated range.
.Ip "l line" 12 4
List just the indicated line.
.Ip "l" 12 4
List incr+1 more lines after last printed line.
.Ip "l subname" 12 4
List subroutine.
If it's a long subroutine it just lists the beginning.
Use \*(L"l\*(R" to list more.
.Ip "L" 12 4
List lines that have breakpoints or actions.
.Ip "t" 12 4
Toggle trace mode on or off.
.Ip "b line" 12 4
Set a breakpoint.
If line is omitted, sets a breakpoint on the
line that is about to be executed.
Breakpoints may only be set on lines that begin an executable statement.
.Ip "b subname" 12 4
Set breakpoint at first executable line of subroutine.
.Ip "S" 12 4
Lists the names of all subroutines.
.Ip "d line" 12 4
Delete breakpoint.
If line is omitted, deletes the breakpoint on the
line that is about to be executed.
.Ip "D" 12 4
Delete all breakpoints.
.Ip "A" 12 4
Delete all line actions.
.Ip "V package" 12 4
List all variables in package.
Default is main package.
.Ip "a line command" 12 4
Set an action for line.
A multi-line command may be entered by backslashing the newlines.
.Ip "< command" 12 4
Set an action to happen before every debugger prompt.
A multi-line command may be entered by backslashing the newlines.
.Ip "> command" 12 4
Set an action to happen after the prompt when you've just given a command
to return to executing the script.
A multi-line command may be entered by backslashing the newlines.
.Ip "! number" 12 4
Redo a debugging command.
If number is omitted, redoes the previous command.
.Ip "! -number" 12 4
Redo the command that was that many commands ago.
.Ip "H -number" 12 4
Display the last number commands.
Only commands longer than one character are listed.
If number is omitted, lists them all.
.Ip "q or ^D" 12 4
Quit.
.Ip "command" 12 4
Execute command as a perl statement.
A missing semicolon will be supplied.
.Ip "p expr" 12 4
Same as \*(L"print DB'OUT expr\*(R".
The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
may be redirected to.
.PP
If you want to modify the debugger, copy perldb.pl from the perl library
to your current directory and modify it as necessary.
You can do some customization by setting up a .perldb file which contains
initialization code.
For instance, you could make aliases like these:
.nf

    $DBalias{'len'} = 's/^len(.*)/p length(\e$1)/';
    $DBalias{'stop'} = 's/^stop (at|in)/b/';
    $DBalias{'.'} =
      's/^./p "\e$DBsub(\e$DBline):\et\e$DBline[\e$DBline]"/';

.fi
.Sh "Setuid Scripts"
.I Perl
is designed to make it easy to write secure setuid and setgid scripts.
Unlike shells, which are based on multiple substitution passes on each line
of the script,
.I perl
uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
Additionally, since the language has more built-in functionality, it
has to rely less upon external (and possibly untrustworthy) programs to
accomplish its purposes.
.PP
In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
insecure, but this kernel feature can be disabled.
If it is,
.I perl
can emulate the setuid and setgid mechanism when it notices the otherwise
useless setuid/gid bits on perl scripts.
If the kernel feature isn't disabled,
.I perl
will complain loudly that your setuid script is insecure.
You'll need to either disable the kernel setuid script feature, or put
a C wrapper around the script.
.PP
When perl is executing a setuid script, it takes special precautions to
prevent you from falling into any obvious traps.
(In some ways, a perl script is more secure than the corresponding
C program.)
Any command line argument, environment variable, or input is marked as
\*(L"tainted\*(R", and may not be used, directly or indirectly, in any
command that invokes a subshell, or in any command that modifies files,
directories or processes.
Any variable that is set within an expression that has previously referenced
a tainted value also becomes tainted (even if it is logically impossible
for the tainted value to influence the variable).
For example:
.nf

    $foo = shift;               # $foo is tainted
    $bar = $foo,\'bar\';        # $bar is also tainted
    $xxx = <>;                  # Tainted
    $path = $ENV{\'PATH\'};     # Tainted, but see below
    $abc = \'abc\';             # Not tainted

    system "echo $foo";         # Insecure
    system "echo", $foo;        # Secure (doesn't use sh)
    system "echo $bar";         # Insecure
    system "echo $abc";         # Insecure until PATH set

    $ENV{\'PATH\'} = \'/bin:/usr/bin\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

    $path = $ENV{\'PATH\'};     # Not tainted
    system "echo $abc";         # Is secure now!

    open(FOO,"$foo");           # OK
    open(FOO,">$foo");          # Not OK

    open(FOO,"echo $foo|");     # Not OK, but...
    open(FOO,"-|") || exec \'echo\', $foo;    # OK

    $zzz = `echo $foo`;         # Insecure, $zzz tainted

    unlink $abc,$foo;           # Insecure
    umask $foo;                 # Insecure

    exec "echo $foo";           # Insecure
    exec "echo", $foo;          # Secure (doesn't use sh)
    exec "sh", \'-c\', $foo;    # Considered secure, alas

.fi
.PP
The taintedness is associated with each scalar value, so some elements
of an array can be tainted, and others not.
.PP
If you try to do something insecure, you will get a fatal error saying
something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
Note that you can still write an insecure system call or exec,
but only by explicitly doing something like the last example above.
You can also bypass the tainting mechanism by referencing subpatterns;
.I perl
presumes that if you reference a substring using $1, $2, etc, you knew
what you were doing when you wrote the pattern:
.nf

    $ARGV[0] =~ /^\-P(\ew+)$/;
    $printer = $1;              # Not tainted

.fi
This is fairly secure since \ew+ doesn't match shell metacharacters.
Use of .+ would have been insecure, but
.I perl
doesn't check for that, so you must be careful with your patterns.
This is the ONLY mechanism for untainting user supplied filenames if you
want to do file operations on them (unless you make $> equal to $<).
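One trap in the example above is that the match result is not checked. If the match fails, $1 may still hold the value from an earlier successful match. A slightly safer sketch checks the match before using $1 and refuses anything else. The subroutine name printer_from_arg is hypothetical, and the pattern is the same word-character pattern used above.

```perl
# Hypothetical helper: accept only "-P" followed by word characters,
# and return the captured name; anything else is rejected outright.
sub printer_from_arg {
    local($arg) = @_;
    if ($arg =~ /^-P(\w+)$/) {
        return $1;              # set by this very match, so it is not stale
    }
    die "Bad printer argument: $arg\n";
}

$printer = &printer_from_arg('-Plp0');     # $printer is "lp0"
```

An argument such as "-Pfoo;rm" fails the match and dies instead of silently passing along an old $1.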
.PP
It's also possible to get into trouble with other operations that don't care
whether they use tainted values.
Make judicious use of the file tests in dealing with any user-supplied
filenames.
When possible, do opens and such after setting $> = $<.
.I Perl
doesn't prevent you from opening tainted filenames for reading, so be
careful what you print out.
The tainting mechanism is intended to prevent stupid mistakes, not to remove
the need for thought.
.SH ENVIRONMENT
.I Perl
uses PATH in executing subprocesses, and in finding the script if \-S
is used.
HOME or LOGDIR are used if chdir has no argument.
.PP
Apart from these,
.I perl
uses no environment variables, except to make them available
to the script being executed, and to child processes.
However, scripts running setuid would do well to execute the following lines
before doing anything else, just to keep people honest:
.nf

    $ENV{\'PATH\'} = \'/bin:/usr/bin\';    # or whatever you need
    $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

.fi
.SH AUTHOR
Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
.SH FILES
/tmp/perl\-eXXXXXX	temporary file for
.B \-e
commands.
.SH SEE ALSO
a2p	awk to perl translator
.br
s2p	sed to perl translator
.SH DIAGNOSTICS
Compilation errors will tell you the line number of the error, with an
indication of the next token or token type that was to be examined.
(In the case of a script passed to
.I perl
via
.B \-e
switches, each
.B \-e
is counted as one line.)
.PP
Setuid scripts have additional constraints that can produce error messages
such as \*(L"Insecure dependency\*(R".
See the section on setuid scripts.
.SH TRAPS
Accomplished
.I awk
users should take special note of the following:
.Ip * 4 2
Semicolons are required after all simple statements in
.IR perl .
Newline
is not a statement delimiter.
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Arrays index from 0 unless you set $[.
Likewise string positions in substr() and index().
.Ip * 4 2
You have to decide whether your array has numeric or string indices.
.Ip * 4 2
Associative array values do not spring into existence upon mere reference.
.Ip * 4 2
You have to decide whether you want to use string or numeric comparisons.
.Ip * 4 2
Reading an input line does not split it for you.
You get to split it into an array yourself.
And the
.I split
operator has different arguments.
.Ip * 4 2
The current input line is normally in $_, not $0.
It generally does not have the newline stripped.
($0 is initially the name of the program executed, then the last matched
.Ip * 4 2
$<digit> does not refer to fields\*(--it refers to substrings matched by the last
match pattern.
.Ip * 4 2
The
.I print
statement does not add field and record separators unless you set
$, and $\e.
.Ip * 4 2
You must open your files before you print to them.
.Ip * 4 2
The range operator is \*(L".\|.\*(R", not comma.
(The comma operator works as in C.)
.Ip * 4 2
The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
(\*(L"~\*(R" is the one's complement operator, as in C.)
.Ip * 4 2
The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
(\*(L"^\*(R" is the XOR operator, as in C.)
.Ip * 4 2
The concatenation operator is \*(L".\*(R", not the null string.
(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
since the third slash would be interpreted as a division operator\*(--the
tokener is in fact slightly context sensitive for operators like /, ?, and <.
And in fact, . itself can be the beginning of a number.)
.Ip * 4 2
.IR Next ,
.I exit
and
.I continue
work differently.
.Ip * 4 2
The following variables work differently:
.nf

      Awk\h'|2.5i'Perl
      ARGC\h'|2.5i'$#ARGV
      ARGV[0]\h'|2.5i'$0
      FILENAME\h'|2.5i'$ARGV
      FNR\h'|2.5i'$. \- something
      FS\h'|2.5i'(whatever you like)
      NF\h'|2.5i'$#Fld, or some such
      NR\h'|2.5i'$.
      OFMT\h'|2.5i'$#
      OFS\h'|2.5i'$,
      ORS\h'|2.5i'$\e
      RLENGTH\h'|2.5i'length($&)
      RS\h'|2.5i'$/
      RSTART\h'|2.5i'length($\`)
      SUBSEP\h'|2.5i'$;

.fi
.Ip * 4 2
When in doubt, run the
.I awk
construct through a2p and see what it gives you.
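For example, where awk hands you $1, $2 and NF for free, a perl script splits the line itself. This is a minimal sketch that assumes whitespace-separated fields with no leading blanks. The array name @Fld simply follows the table above, and the sample line is hypothetical.

```perl
# awk:   { print $2, NF }
# perl:  split the line yourself; fields index from 0 by default,
#        so awk's $2 is $Fld[1] and NF is $#Fld + 1.
$_ = "alpha beta gamma\n";      # stands in for a line read with <>
chop;                           # the newline is not stripped for you
@Fld = split(/\s+/, $_);        # assumes no leading whitespace
$second = $Fld[1];              # "beta"
$nf = $#Fld + 1;                # 3
```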
.PP
Cerebral C programmers should take note of the following:
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
.Ip * 4 2
.I Break
and
.I continue
become
.I last
and
.IR next ,
respectively.
.Ip * 4 2
There's no switch statement.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Printf does not implement *.
.Ip * 4 2
Comments begin with #, not /*.
.Ip * 4 2
You can't take the address of anything.
.Ip * 4 2
ARGV must be capitalized.
.Ip * 4 2
The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
.Ip * 4 2
Signal handlers deal with signal names, not numbers.
.Ip * 4 2
You can't subscript array values, only arrays (no $x = (1,2,3)[2];).
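Since printf has no * for a run-time field width, one workaround is to build the width into the format string before calling sprintf or printf. This sketch assumes the width is a plain integer you control. The variable names are hypothetical.

```perl
# C:    printf("%*s|\n", width, str);
# Perl: concatenate the width into the format instead of using *.
$width = 8;
$str = 'abc';
$fmt = '%' . $width . 's|';     # the format "%8s|"
$line = sprintf($fmt, $str);    # "abc" right-justified in 8 columns, then "|"
```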
.PP
Seasoned
.I sed
programmers should take note of the following:
.Ip * 4 2
Backreferences in substitutions use $ rather than \e.
.Ip * 4 2
The pattern matching metacharacters (, ), and | do not have backslashes in front.
.Ip * 4 2
The range operator is .\|. rather than comma.
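For instance, a sed substitution that swaps two captured words translates as follows. The sample string is hypothetical.

```perl
# sed:  s/\([a-z]*\) \([a-z]*\)/\2 \1/
# perl: no backslashes on the grouping parentheses, and $1, $2 in the replacement.
$_ = 'hello world';
s/([a-z]*) ([a-z]*)/$2 $1/;     # $_ is now "world hello"
```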
.PP
Sharp shell programmers should take note of the following:
.Ip * 4 2
The backtick operator does variable interpretation without regard to the
presence of single quotes in the command.
.Ip * 4 2
The backtick operator does no translation of the return value, unlike csh.
.Ip * 4 2
Shells (especially csh) do several levels of substitution on each command line.
.I Perl
does substitution only in certain constructs such as double quotes,
backticks, angle brackets and search patterns.
.Ip * 4 2
Shells interpret scripts a little bit at a time.
.I Perl
compiles the whole program before executing it.
.Ip * 4 2
The arguments are available via @ARGV, not $1, $2, etc.
.Ip * 4 2
The environment is not automatically made available as variables.
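A small sketch of the two backtick points, assuming a Unix /bin/sh with echo. The $name inside the single quotes is still interpolated by perl before the shell ever sees the command, and the captured output keeps its trailing newline. The variable name is hypothetical.

```perl
# Hypothetical variable; perl interpolates it even inside the single quotes,
# so the shell receives the command:  echo 'hi there'
$name = 'hi there';
$out = `echo '$name'`;
# Unlike csh, nothing is split into words or stripped:
# $out is "hi there" followed by a newline.
```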
.SH BUGS
.I Perl
is at the mercy of your machine's definitions of various operations
such as type casting, atof() and sprintf().
.PP
If your stdio requires a seek or eof between reads and writes on a particular
stream, so does
.IR perl .
.PP
While none of the built-in data types have any arbitrary size limits (apart
from memory size), there are still a few arbitrary limits:
a given identifier may not be longer than 255 characters;
sprintf is limited on many machines to 128 characters per field (unless the format
specifier is exactly %s);
and no component of your PATH may be longer than 255 characters if you use \-S.
.PP
.I Perl
actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
anyone I said that.
.rn }` ''