''' perl 3.0 patch #4 Patch #2 continued
''' Beginning of part 4
''' $Header:,v 89/11/11 04:46:40 lwall Locked $
''' $Log:,v $
''' Revision 89/11/11 04:46:40 lwall
''' patch2: made some line breaks depend on troff vs. nroff
''' patch2: clarified operation of ^ and $ when $* is false
''' Revision 89/10/26 23:18:43 lwall
''' patch1: documented the desirability of unnecessary parentheses
''' Revision 3.0 89/10/18 15:21:55 lwall
''' 3.0 baseline
.Sh "Precedence"
.I Perl
operators have the following associativity and precedence:
.nf
nonassoc\h'|1i'print printf exec system sort reverse
\h'1.5i'chmod chown kill unlink utime die return
left\h'|1i',
right\h'|1i'= += \-= *= etc.
right\h'|1i'?:
nonassoc\h'|1i'.\|.
left\h'|1i'||
left\h'|1i'&&
left\h'|1i'| ^
left\h'|1i'&
nonassoc\h'|1i'== != eq ne
nonassoc\h'|1i'< > <= >= lt gt le ge
nonassoc\h'|1i'chdir exit eval reset sleep rand umask
nonassoc\h'|1i'\-r \-w \-x etc.
left\h'|1i'<< >>
left\h'|1i'+ \- .
left\h'|1i'* / % x
left\h'|1i'=~ !~
right\h'|1i'! ~ and unary minus
right\h'|1i'**
nonassoc\h'|1i'++ \-\|\-
left\h'|1i'\*(L"(\*(R"
.fi
As mentioned earlier, if any list operator (print, etc.) or
any unary operator (chdir, etc.)
is followed by a left parenthesis as the next token on the same line,
the operator and arguments within parentheses are taken to
be of highest precedence, just like a normal function call.
Examples:
.nf
	chdir $foo || die;		# (chdir $foo) || die
	chdir($foo) || die;		# (chdir $foo) || die
	chdir ($foo) || die;		# (chdir $foo) || die
	chdir +($foo) || die;		# (chdir $foo) || die
.fi
but, because * is higher precedence than ||:
.nf
	chdir $foo * 20;		# chdir ($foo * 20)
	chdir($foo) * 20;		# (chdir $foo) * 20
	chdir ($foo) * 20;		# (chdir $foo) * 20
	chdir +($foo) * 20;		# chdir ($foo * 20)

	rand 10 * 20;			# rand (10 * 20)
	rand(10) * 20;			# (rand 10) * 20
	rand (10) * 20;			# (rand 10) * 20
	rand +(10) * 20;		# rand (10 * 20)
.fi
In the absence of parentheses,
the precedence of list operators such as print, sort or chmod is
either very high or very low depending on whether you look at the left
side of the operator or the right side of it.
For example, in
.nf
	@ary = (1, 3, sort 4, 2);
	print @ary;		# prints 1324
.fi
the commas on the right of the sort are evaluated before the sort, but
the commas on the left are evaluated after.
In other words, list operators tend to gobble up all the arguments that
follow them, and then act like a simple term with regard to the preceding
expression.
Note that you have to be careful with parens:
.nf
	# These evaluate exit before doing the print:
	print($foo, exit);	# Obviously not what you want.
	print $foo, exit;	# Nor is this.

	# These do the print before evaluating exit:
	(print $foo), exit;	# This is what you want.
	print($foo), exit;	# Or this.
	print ($foo), exit;	# Or even this.
.fi
Also note that
.nf
	print ($foo & 255) + 1, "\en";
.fi
probably doesn't do what you expect at first glance.
The parenthesis directly after print makes print ($foo & 255) a complete
function call, so the + 1 and the newline are applied to its return value
rather than being printed.
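A minimal sketch of the grouping at work in that last case, assuming a Perl 5 interpreter. The subroutine show and the array @seen are hypothetical stand-ins: show records the arguments it received, so you can inspect what a list operator followed by a parenthesis actually gets.

```perl
# Hypothetical stand-in for print: remember the arguments, return 1.
sub show { @seen = @_; return 1; }

$foo = 511;                              # 511 & 255 is 255

# Grouped as (show($foo & 255)) + 1, "\n": show receives only 255.
@result = (show ($foo & 255) + 1, "\n");
$first  = join(',', @seen);              # "255"

# An extra pair of parentheses hands the whole list to show.
show(($foo & 255) + 1, "\n");
$second = scalar(@seen);                 # 2 arguments: 256 and the newline

print "first call saw: $first\n";
print "second call saw $second arguments\n";
```

In the first call @result holds show's return value plus one, followed by the newline string, which is the same thing that happens to print's return value in the example above.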
.Sh "Subroutines"
A subroutine may be declared as follows:
.nf
	sub NAME BLOCK
.fi
Any arguments passed to the routine come in as array @_,
that is ($_[0], $_[1], .\|.\|.).
The array @_ is a local array, but its values are references to the
actual scalar parameters.
The return value of the subroutine is the value of the last expression
evaluated, and can be either an array value or a scalar value.
Alternatively, a return statement may be used to specify the returned value and
exit the subroutine.
To create local variables see the
.I local
operator.
A subroutine is called using the
.I do
operator or the & operator.
.nf
	sub MAX {
		local($max) = pop(@_);
		foreach $foo (@_) {
			$max = $foo \|if \|$max < $foo;
		}
		$max;
	}

	.\|.\|.
	$bestday = &MAX($mon,$tue,$wed,$thu,$fri);

	# get a line, combining continuation lines
	# that start with whitespace
	sub get_line {
		$thisline = $lookahead;
		line: while ($lookahead = <STDIN>) {
			if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
				$thisline \|.= \|$lookahead;
			}
			else {
				last line;
			}
		}
		$thisline;
	}

	$lookahead = <STDIN>;	# get first line
	while ($_ = do get_line(\|)) {
		.\|.\|.
	}
.fi
Use array assignment to a local list to name your formal arguments:
.nf
	sub maybeset {
		local($key, $value) = @_;
		$foo{$key} = $value unless $foo{$key};
	}
.fi
This also has the effect of turning call-by-reference into call-by-value,
since the assignment copies the values.
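The difference between the two calling styles can be sketched as follows, assuming a Perl 5 interpreter. The names bump_in_place and bump_copy are hypothetical and chosen only for this illustration.

```perl
# Changes the caller's variable: $_[0] refers to the actual argument.
sub bump_in_place { $_[0] += 1; }

# Works on a copy: the list assignment copies the value out of @_ first.
sub bump_copy {
    local($n) = @_;
    $n += 1;
    $n;
}

$x = 10;
&bump_in_place($x);          # $x is now 11

$y = 10;
$r = &bump_copy($y);         # $y is still 10, $r is 11

print "$x $y $r\n";          # prints 11 10 11
```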
Subroutines may be called recursively.
If a subroutine is called using the & form, the argument list is optional.
If omitted, no @_ array is set up for the subroutine; the @_ array at the
time of the call is visible to the subroutine instead.
.nf
	do foo(1,2,3);		# pass three arguments
	&foo(1,2,3);		# the same

	do foo();		# pass a null list
	&foo();			# the same
	&foo;			# pass no arguments--more efficient
.fi
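As an illustration of the last form, here is a small sketch, assuming a Perl 5 interpreter, in which one subroutine hands its own @_ to another by calling it with & and no argument list. The names total and average are hypothetical.

```perl
# Sum whatever is in @_.
sub total {
    local($sum) = 0;
    foreach $n (@_) { $sum += $n; }
    $sum;
}

# &total with no argument list sees average's own @_ unchanged.
sub average {
    return 0 unless @_;
    local($t) = &total;
    $t / @_;                 # @_ in numeric context is the argument count
}

$avg = &average(2, 4, 6);
print "$avg\n";              # prints 4
```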
.Sh "Passing By Reference"
Sometimes you don't want to pass the value of an array to a subroutine but
rather the name of it, so that the subroutine can modify the global copy
of it rather than working with a local copy.
In perl you can refer to all the objects of a particular name by prefixing
the name with a star: *foo.
When evaluated, it produces a scalar value that represents all the objects
of that name.
When assigned to within a local() operation, it causes the name mentioned
to refer to whatever * value was assigned to it.
.nf
	sub doubleary {
		local(*someary) = @_;
		foreach $elem (@someary) {
			$elem *= 2;
		}
	}
	do doubleary(*foo);
	do doubleary(*bar);
.fi
Assignment to *name is currently recommended only inside a local().
You can actually assign to *name anywhere, but the previous referent of
*name may be stranded forever.
This may or may not bother you.
Note that scalars are already passed by reference, so you can modify scalar
arguments without using this mechanism by referring explicitly to the $_[nnn]
in question.
You can modify all the elements of an array by passing all the elements
as scalars, but you have to use the * mechanism to push, pop or change the
size of an array.
The * mechanism will probably be more efficient in any case.
Since a *name value contains unprintable binary data, if it is used as
an argument in a print, or as a %s argument in a printf or sprintf, it
then has the value '*name', just so it prints out pretty.
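A short sketch of the case the text singles out, changing the size of the caller's array, assuming a Perl 5 interpreter. The subroutine name append_to and the array @queue are hypothetical.

```perl
# Append to the caller's array through a *name value.
sub append_to {
    local(*ary) = $_[0];     # @ary now refers to the caller's array
    push(@ary, $_[1]);
}

@queue = ('a', 'b');
&append_to(*queue, 'c');

print join(',', @queue), "\n";   # prints a,b,c
```

Passing @queue itself would flatten it into @_, and a push onto @_ inside the subroutine would not lengthen the caller's array.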
.Sh "Regular Expressions"
The patterns used in pattern matching are regular expressions such as
those supplied in the Version 8 regexp routines.
(In fact, the routines are derived from Henry Spencer's freely redistributable
reimplementation of the V8 routines.)
In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
Word boundaries may be matched by \eb, and non-boundaries by \eB.
A whitespace character is matched by \es, non-whitespace by \eS.
A numeric character is matched by \ed, non-numeric by \eD.
You may use \ew, \es and \ed within character classes.
Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
Within character classes \eb represents backspace rather than a word boundary.
Alternatives may be separated by |.
The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
matches the digit'th substring, where digit can range from 1 to 9.
(Outside of the pattern, always use $ instead of \e in front of the digit.
The scope of $<digit> (and $\`, $& and $\')
extends to the end of the enclosing BLOCK or eval string, or to
the next pattern match with subexpressions.
The \e<digit> notation sometimes works outside the current pattern, but should
not be relied upon.)
$+ returns whatever the last bracket match matched.
$& returns the entire matched string.
($0 normally returns the same thing, but don't depend on it.)
$\` returns everything before the matched string.
$\' returns everything after the matched string.
.nf
	s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/;	# swap first two words

	if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
		$hours = $1;
		$minutes = $2;
		$seconds = $3;
	}
.fi
By default, the ^ character is only guaranteed to match at the beginning
of the string,
the $ character only at the end (or before the newline at the end)
and
.I perl
does certain optimizations with the assumption that the string contains
only one line.
The behavior of ^ and $ on embedded newlines will be inconsistent.
You may, however, wish to treat a string as a multi-line buffer, such that
the ^ will match after any newline within the string, and $ will match
before any newline.
At the cost of a little more overhead, you can do this by setting the variable
$* to 1.
Setting it back to 0 makes
.I perl
revert to its old behavior.
To facilitate multi-line substitutions, the . character never matches a newline
(even when $* is 0).
In particular, the following leaves a newline on the $_ string:
.nf
	$_ = <STDIN>;
	s/.*(some_string).*/$1/;
.fi
If the newline is unwanted, try one of
.nf
	s/.*(some_string).*\en/$1/;
	s/.*(some_string)[^\e000]*/$1/;
	s/.*(some_string)(.|\en)*/$1/;
	chop; s/.*(some_string).*/$1/;
	/(some_string)/ && ($_ = $1);
.fi
Any item of a regular expression may be followed with digits in curly brackets
of the form {n,m}, where n gives the minimum number of times to match the item
and m gives the maximum.
The form {n} is equivalent to {n,n} and matches exactly n times.
The form {n,} matches n or more times.
(If a curly bracket occurs in any other context, it is treated as a regular
character.)
The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
to {0,1}.
There is no limit to the size of n or m, but large numbers will chew up
more memory.
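The three curly-bracket forms can be compared side by side. This is a small sketch, assuming a Perl 5 interpreter; the word list is made up for the illustration.

```perl
@words = ('ab', 'aab', 'aaab', 'aaaab');

@two_or_three = grep(/^a{2,3}b$/, @words);   # aab, aaab
@exactly_two  = grep(/^a{2}b$/,   @words);   # aab
@two_or_more  = grep(/^a{2,}b$/,  @words);   # aab, aaab, aaaab

print "{2,3}: @two_or_three\n";
print "{2}:   @exactly_two\n";
print "{2,}:  @two_or_more\n";
```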
You will note that all backslashed metacharacters in
.I perl
are alphanumeric,
such as \eb, \ew, \en.
Unlike some other regular expression languages, there are no backslashed
symbols that aren't alphanumeric.
So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
interpreted as a literal character, not a metacharacter.
This makes it simple to quote a string that you want to use for a pattern
but that you are afraid might contain metacharacters.
Simply quote all the non-alphanumeric characters:
.nf
	$pattern =~ s/(\eW)/\e\e$1/g;
.fi
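To see the effect of that substitution, here is a brief sketch, assuming a Perl 5 interpreter. The strings are invented for the illustration; the point is that the quoted pattern matches only the literal text.

```perl
$literal = 'a.b*c';

# Copy the string, then backslash every non-alphanumeric character.
($pattern = $literal) =~ s/(\W)/\\$1/g;      # $pattern is a\.b\*c

$hit  = ('xa.b*cx' =~ /$pattern/) ? 1 : 0;   # 1: the literal text is present
$miss = ('aXbbbc'  =~ /$pattern/) ? 1 : 0;   # 0: no literal "a.b*c" here
$raw  = ('aXbbbc'  =~ /$literal/) ? 1 : 0;   # 1: unquoted, . and * are magic

print "$pattern $hit $miss $raw\n";
```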
.Sh "Formats"
Output record formats for use with the
.I write
operator may be declared as follows:
.nf
	format NAME =
	FORMLIST
	.
.fi
If name is omitted, format \*(L"STDOUT\*(R" is defined.
FORMLIST consists of a sequence of lines, each of which may be of one of three
types:
.Ip 1. 4
A comment.
.Ip 2. 4
A \*(L"picture\*(R" line giving the format for one output line.
.Ip 3. 4
An argument line supplying values to plug into a picture line.
.PP
Picture lines are printed exactly as they look, except for certain fields
that substitute values into the line.
Each picture field starts with either @ or ^.
The @ field (not to be confused with the array marker @) is the normal
case; ^ fields are used
to do rudimentary multi-line text block filling.
The length of the field is supplied by padding out the field
with multiple <, >, or | characters to specify, respectively, left justification,
right justification, or centering.
If any of the values supplied for these fields contains a newline, only
the text up to the newline is printed.
The special field @* can be used for printing multi-line values.
It should appear by itself on a line.
The values are specified on the following line, in the same order as
the picture fields.
The values should be separated by commas.
Picture fields that begin with ^ rather than @ are treated specially.
The value supplied must be a scalar variable name which contains a text
string.
.I Perl
puts as much text as it can into the field, and then chops off the front
of the string so that the next time the variable is referenced,
more of the text can be printed.
Normally you would use a sequence of fields in a vertical stack to print
out a block of text.
If you like, you can end the final field with .\|.\|., which will appear in the
output if the text was too long to appear in its entirety.
You can change which characters are legal to break on by changing the
variable $: to a list of the desired characters.
Since use of ^ fields can produce variable length records if the text to be
formatted is short, you can suppress blank lines by putting the tilde (~)
character anywhere in the line.
(Normally you should put it in the front if possible, for visibility.)
The tilde will be translated to a space upon output.
If you put a second tilde contiguous to the first, the line will be repeated
until all the fields on the line are exhausted.
(If you use a field of the @ variety, the expression you supply had better
not give the same value every time forever!)
.nf
.lg 0
.cs R 25
.ft C
# a report on the /etc/passwd file
format top =
\&                        Passwd File
Name                Login    Office   Uid   Gid Home
\&.
format STDOUT =
@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
$name,              $login,  $office,$uid,$gid, $home
\&.

# a report from a bug report form
format top =
\&                        Bug Reports
@<<<<<<<<<<<<<<<<<<<<<<<     @|||         @>>>>>>>>>>>>>>>>>>>>>>>
$system,                      $%,         $date
\&.
format STDOUT =
Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&         $subject
Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&       $index,                       $description
Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&          $priority,        $date,   $description
From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&      $from,                         $description
Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&             $programmer,            $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&                                     $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&                                     $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&                                     $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&                                     $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<...
\&                                     $description
\&.
.ft R
.cs R
.lg
.fi
It is possible to intermix prints with writes on the same output channel,
but you'll have to handle $\- (lines left on the page) yourself.
If you are printing lots of fields that are usually blank, you should consider
using the reset operator between records.
Not only is it more efficient, but it can prevent the bug of adding another
field and forgetting to zero it.
.Sh "Interprocess Communication"
The IPC facilities of perl are built on the Berkeley socket mechanism.
If you don't have sockets, you can ignore this section.
The calls have the same names as the corresponding system calls,
but the arguments tend to differ, for two reasons.
First, perl file handles work differently than C file descriptors.
Second, perl already knows the length of its strings, so you don't need
to pass that information.
Here is a sample client (untested):
.nf
	($them,$port) = @ARGV;
	$port = 2345 unless $port;
	$them = 'localhost' unless $them;

	$SIG{'INT'} = 'dokill';
	sub dokill { kill 9,$child if $child; }

	do 'sys/socket.h' || die "Can't do sys/socket.h: $@";

	$sockaddr = 'S n a4 x8';
	chop($hostname = `hostname`);

	($name, $aliases, $proto) = getprotobyname('tcp');
	($name, $aliases, $port) = getservbyname($port, 'tcp')
		unless $port =~ /^\ed+$/;
.ie t \{\
	($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
'br\}
.el \{\
	($name, $aliases, $type, $len, $thisaddr) =
					gethostbyname($hostname);
'br\}
	($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);

	$this = pack($sockaddr, &AF_INET, 0, $thisaddr);
	$that = pack($sockaddr, &AF_INET, $port, $thataddr);

	socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
	bind(S, $this) || die "bind: $!";
	connect(S, $that) || die "connect: $!";

	select(S); $| = 1; select(stdout);

	if ($child = fork) {
		while (<>) {
			print S;
		}
		sleep 3;
		do dokill();
	}
	else {
		while (<S>) {
			print;
		}
	}
.fi
And here's a server:
.nf
	($port) = @ARGV;
	$port = 2345 unless $port;

	do 'sys/socket.h' || die "Can't do sys/socket.h: $@";

	$sockaddr = 'S n a4 x8';

	($name, $aliases, $proto) = getprotobyname('tcp');
	($name, $aliases, $port) = getservbyname($port, 'tcp')
		unless $port =~ /^\ed+$/;

	$this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");

	select(NS); $| = 1; select(stdout);

	socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
	bind(S, $this) || die "bind: $!";
	listen(S, 5) || die "listen: $!";

	select(S); $| = 1; select(stdout);

	for (;;) {
		print "Listening again\en";
		($addr = accept(NS,S)) || die $!;
		print "accept ok\en";

		($af,$port,$inetaddr) = unpack($sockaddr,$addr);
		@inetaddr = unpack('C4',$inetaddr);
		print "$af $port @inetaddr\en";

		while (<NS>) {
			print;
			print NS;
		}
	}
.fi
.Sh "Predefined Names"
The following names have special meaning to
.IR perl .
I could have used alphabetic symbols for some of these, but I didn't want
to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
out.
You'll just have to suffer along with these silly symbols.
Most of them have reasonable mnemonics, or analogues in one of the shells.
.Ip $_ 8
The default input and pattern-searching space.
The following pairs are equivalent:
.nf
	while (<>) {\|.\|.\|.	# only equivalent in while!
	while ($_ = <>) {\|.\|.\|.

	/\|^Subject:/
	$_ \|=~ \|/\|^Subject:/

	y/a\-z/A\-Z/
	$_ =~ y/a\-z/A\-Z/

	chop
	chop($_)
.fi
(Mnemonic: underline is understood in certain operations.)
.Ip $. 8
The current input line number of the last filehandle that was read.
Remember that only an explicit close on the filehandle resets the line number.
Since <> never does an explicit close, line numbers increase across ARGV files
(but see examples under eof).
(Mnemonic: many programs use . to mean the current line number.)
.Ip $/ 8
The input record separator, newline by default.
Works like
.IR awk 's
RS variable, including treating blank lines as delimiters
if set to the null string.
If set to a value longer than one character, only the first character is used.
(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
.Ip $, 8
The output field separator for the print operator.
Ordinarily the print operator simply prints out the comma separated fields
you specify.
In order to get behavior more like
.IR awk ,
set this variable as you would set
.IR awk 's
OFS variable to specify what is printed between fields.
(Mnemonic: what is printed when there is a , in your print statement.)
.Ip $"" 8
This is like $, except that it applies to array values interpolated into
a double-quoted string (or similar interpreted string).
Default is a space.
(Mnemonic: obvious, I think.)
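A minimal sketch of the list separator in use, assuming a Perl 5 interpreter. The array @fields and its contents are made up for the illustration.

```perl
@fields = ('usr', 'local', 'lib');

$with_space = "@fields";             # default separator: usr local lib

{
    local($") = '/';                 # change the separator inside this block only
    $with_slash = "/@fields";        # /usr/local/lib
}

print "$with_space\n$with_slash\n";
```

Using local() confines the change to the enclosing block, so later interpolations get the default space again.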
.Ip $\e 8
The output record separator for the print operator.
Ordinarily the print operator simply prints out the comma separated fields
you specify, with no trailing newline or record separator assumed.
In order to get behavior more like
.IR awk ,
set this variable as you would set
.IR awk 's
ORS variable to specify what is printed at the end of the print.
(Mnemonic: you set $\e instead of adding \en at the end of the print.
Also, it's just like /, but it's what you get \*(L"back\*(R" from
.IR perl .)
.Ip $# 8
The output format for printed numbers.
This variable is a half-hearted attempt to emulate
.IR awk 's
OFMT variable.
There are times, however, when
.I awk
and
.I perl
have differing notions of what
is in fact numeric.
Also, the initial value is %.20g rather than %.6g, so you need to set $#
explicitly to get
.IR awk 's
value.
(Mnemonic: # is the number sign.)
.Ip $% 8
The current page number of the currently selected output channel.
(Mnemonic: % is page number in nroff.)
.Ip $= 8
The current page length (printable lines) of the currently selected output
channel.
Default is 60.
(Mnemonic: = has horizontal lines.)
.Ip $\- 8
The number of lines left on the page of the currently selected output channel.
(Mnemonic: lines_on_page \- lines_printed.)
.Ip $~ 8
The name of the current report format for the currently selected output
channel.
(Mnemonic: brother to $^.)
.Ip $^ 8
The name of the current top-of-page format for the currently selected output
channel.
(Mnemonic: points to top of page.)
.Ip $| 8
If set to nonzero, forces a flush after every write or print on the currently
selected output channel.
Default is 0.
Note that
.I stdout
will typically be line buffered if output is to the
terminal and block buffered otherwise.
Setting this variable is useful primarily when you are outputting to a pipe,
such as when you are running a
.I perl
script under rsh and want to see the
output as it's happening.
(Mnemonic: when you want your pipes to be piping hot.)
.Ip $$ 8
The process number of the
.I perl
running this script.
(Mnemonic: same as shells.)
.Ip $? 8
The status returned by the last pipe close, backtick (\`\`) command or
.I system
operator.
Note that this is the status word returned by the wait() system
call, so the exit value of the subprocess is actually ($? >> 8).
$? & 255 gives which signal, if any, the process died from, and whether
there was a core dump.
(Mnemonic: similar to sh and ksh.)
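The arithmetic on the status word can be sketched as below, assuming a Perl 5 interpreter and the usual Unix layout of the low byte (signal number in the low seven bits, core-dump flag in bit 128). That layout is an assumption about the host system, and decode_status is a hypothetical helper name.

```perl
# Split a wait() status word into exit value, signal number and core flag.
sub decode_status {
    local($status) = @_;
    local($exit)   = $status >> 8;              # exit value of the subprocess
    local($signal) = $status & 127;             # signal number, 0 if none
    local($core)   = ($status & 128) ? 1 : 0;   # 1 if a core dump was made
    ($exit, $signal, $core);
}

@normal  = &decode_status(256);     # (1, 0, 0): exited with value 1
@killed  = &decode_status(9);       # (0, 9, 0): died from signal 9
@crashed = &decode_status(139);     # (0, 11, 1): signal 11 with core dump

print "@normal | @killed | @crashed\n";
```

In a script you would typically call it as &decode_status($?) right after a system call or a pipe close.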
.Ip $& 8 4
The string matched by the last pattern match (not counting any matches hidden
within a BLOCK or eval enclosed by the current BLOCK).
(Mnemonic: like & in some editors.)
.Ip $\` 8 4
The string preceding whatever was matched by the last pattern match
(not counting any matches hidden within a BLOCK or eval enclosed by the current
BLOCK).
(Mnemonic: \` often precedes a quoted string.)
.Ip $\' 8 4
The string following whatever was matched by the last pattern match
(not counting any matches hidden within a BLOCK or eval enclosed by the current
BLOCK).
(Mnemonic: \' often follows a quoted string.)
.nf
	$_ = \'abcdefghi\';
	/def/;
	print "$\`:$&:$\'\en";		# prints abc:def:ghi
.fi
.Ip $+ 8 4
The last bracket matched by the last search pattern.
This is useful if you don't know which of a set of alternative patterns
matched.
For example:
.nf
	/Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);
.fi
(Mnemonic: be positive and forward looking.)
.Ip $* 8 2
Set to 1 to do multiline matching within a string, 0 to tell
.I perl
that it can assume that strings contain a single line, for the purpose
of optimizing pattern matches.
Pattern matches on strings containing multiple newlines can produce confusing
results when $* is 0.
Default is 0.
(Mnemonic: * matches multiple things.)
.Ip $0 8
Contains the name of the file containing the
.I perl
script being executed.
The value should be copied elsewhere before any pattern matching happens, which
clobbers $0.
(Mnemonic: same as sh and ksh.)
.Ip $<digit> 8
Contains the subpattern from the corresponding set of parentheses in the last
pattern matched, not counting patterns matched in nested blocks that have
been exited already.
(Mnemonic: like \edigit.)
.Ip $[ 8 2
The index of the first element in an array, and of the first character in
a substring.
Default is 0, but you could set it to 1 to make
.I perl
behave more like
.I awk
(or Fortran)
when subscripting and when evaluating the index() and substr() functions.
(Mnemonic: [ begins subscripts.)
.Ip $] 8 2
The string printed out when you say \*(L"perl -v\*(R".
It can be used to determine at the beginning of a script whether the perl
interpreter executing the script is in the right range of versions.
.nf
	# see if getc is available
	($version,$patchlevel) =
		$] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
	print STDERR "(No filename completion available.)\en"
		if $version * 1000 + $patchlevel < 2016;
.fi
(Mnemonic: Is this version of perl in the right bracket?)
.Ip $; 8 2
The subscript separator for multi-dimensional array emulation.
If you refer to an associative array element as
.nf
	$foo{$a,$b,$c}
.fi
it really means
.nf
	$foo{join($;, $a, $b, $c)}
.fi
But don't put
.nf
	@foo{$a,$b,$c}		# a slice--note the @
.fi
which means
.nf
	($foo{$a},$foo{$b},$foo{$c})
.fi
Default is "\e034", the same as SUBSEP in
.IR awk .
Note that if your keys contain binary data there might not be any safe
value for $;.
(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
Yeah, I know, it's pretty lame, but $, is already taken for something more
important.)
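The equivalence between the comma form and the joined key can be checked directly. A small sketch, assuming a Perl 5 interpreter; the array name %grid and the stored value are invented for the illustration.

```perl
$grid{1, 2, 3} = 'x';                 # multi-dimensional emulation

$key   = join($;, 1, 2, 3);           # the single key actually stored
$same  = $grid{$key};                 # 'x'
@parts = split(/$;/, $key);           # recover the subscripts: (1, 2, 3)
$count = scalar(keys(%grid));         # 1: only one element exists

print "$same @parts $count\n";        # prints x 1 2 3 1
```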
.Ip $! 8 2
If used in a numeric context, yields the current value of errno, with all the
usual caveats.
If used in a string context, yields the corresponding system error string.
You can assign to $! in order to set errno
if, for instance, you want $! to return the string for error n, or you want
to set the exit value for the die operator.
(Mnemonic: What just went bang?)
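A minimal sketch of the two contexts, assuming a Perl 5 interpreter. The choice of error number 2 is only an example; the message text it maps to depends on the host system, so none is shown here.

```perl
$! = 2;                      # set errno by hand (2 is an arbitrary example)

$number = $! + 0;            # numeric context: 2
$text   = "$!";              # string context: the system's message for errno 2

print "$number: $text\n";
```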
.Ip $@ 8 2
The error message from the last eval command.
If null, the last eval parsed and executed correctly.
(Mnemonic: Where was the syntax error \*(L"at\*(R"?)
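A short sketch of checking the variable after an eval, assuming a Perl 5 interpreter. The evaluated strings are invented for the illustration.

```perl
$sum = eval '1 + 2';         # parses and runs
$ok_message = $@;            # null string: nothing went wrong

$bad = eval '1 +';           # syntax error inside the eval
$bad_message = $@;           # non-null: describes the error

print "sum is $sum\n";
print "eval failed: $bad_message" if $bad_message;
```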
.Ip $< 8 2
The real uid of this process.
(Mnemonic: it's the uid you came FROM, if you're running setuid.)
.Ip $> 8 2
The effective uid of this process.
.nf
	$< = $>;		# set real uid to the effective uid
	($<,$>) = ($>,$<);	# swap real and effective uid
.fi
(Mnemonic: it's the uid you went TO, if you're running setuid.)
Note: $< and $> can only be swapped on machines supporting setreuid().
.Ip $( 8 2
The real gid of this process.
If you are on a machine that supports membership in multiple groups
simultaneously, gives a space separated list of groups you are in.
The first number is the one returned by getgid(), and the subsequent ones
by getgroups(), one of which may be the same as the first number.
(Mnemonic: parentheses are used to GROUP things.
The real gid is the group you LEFT, if you're running setgid.)
.Ip $) 8 2
The effective gid of this process.
If you are on a machine that supports membership in multiple groups
simultaneously, gives a space separated list of groups you are in.
The first number is the one returned by getegid(), and the subsequent ones
by getgroups(), one of which may be the same as the first number.
(Mnemonic: parentheses are used to GROUP things.
The effective gid is the group that's RIGHT for you, if you're running setgid.)
Note: $<, $>, $( and $) can only be set on machines that support the
corresponding set[re][ug]id() routine.
$( and $) can only be swapped on machines supporting setregid().
.Ip $: 8 2
The current set of characters after which a string may be broken to
fill continuation fields (starting with ^) in a format.
Default is "\ \en-", to break on whitespace or hyphens.
(Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
.Ip @ARGV 8 3
The array ARGV contains the command line arguments intended for the script.
Note that $#ARGV is generally the number of arguments minus one, since
$ARGV[0] is the first argument, NOT the command name.
See $0 for the command name.
.Ip @INC 8 3
The array INC contains the list of places to look for
.I perl
scripts to be
evaluated by the \*(L"do EXPR\*(R" command.
It initially consists of the arguments to any
.B \-I
command line switches, followed
by the default
.I perl
library, probably \*(L"/usr/local/lib/perl\*(R".
.Ip $ENV{expr} 8 2
The associative array ENV contains your current environment.
Setting a value in ENV changes the environment for child processes.
.Ip $SIG{expr} 8 2
The associative array SIG is used to set signal handlers for various signals.
.nf
	sub handler {	# 1st argument is signal name
		local($sig) = @_;
		print "Caught a SIG$sig\-\|\-shutting down\en";
		close(LOG);
		exit(0);
	}

	$SIG{\'INT\'} = \'handler\';
	$SIG{\'QUIT\'} = \'handler\';
	.\|.\|.
	$SIG{\'INT\'} = \'DEFAULT\';	# restore default action
	$SIG{\'QUIT\'} = \'IGNORE\';	# ignore SIGQUIT
.fi
The SIG array only contains values for the signals actually set within
the perl script.
.Sh "Packages"
Perl provides a mechanism for alternate namespaces to protect packages from
stomping on each other's variables.
By default, a perl script starts compiling into the package known as \*(L"main\*(R".
By use of the
.I package
declaration, you can switch namespaces.
The scope of the package declaration is from the declaration itself to the end
of the enclosing block (the same scope as the local() operator).
Typically it would be the first declaration in a file to be included by
the \*(L"do FILE\*(R" operator.
You can switch into a package in more than one place; it merely influences
which symbol table is used by the compiler for the rest of that block.
You can refer to variables in other packages by prefixing the name with
the package name and a single quote.
If the package name is null, the \*(L"main\*(R" package is assumed.
Eval'ed strings are compiled in the package in which the eval was compiled
in.
(Assignments to $SIG{}, however, assume the signal handler specified is in the
main package.
Qualify the signal handler name if you wish to have a signal handler in
a package.)
For an example, examine in the perl library.
It initially switches to the DB package so that the debugger doesn't interfere
with variables in the script you are trying to debug.
At various points, however, it temporarily switches back to the main package
to evaluate various expressions in the context of the main package.
The symbol table for a package happens to be stored in the associative array
of that name prepended with an underscore.
The value in each entry of the associative array is
what you are referring to when you use the *name notation.
In fact, the following have the same effect (in package main, anyway),
though the first is more
efficient because it does the symbol table lookups at compile time:
.nf
	local(*foo) = *bar;
	local($_main{'foo'}) = $_main{'bar'};
.fi
You can use this to print out all the variables in a package, for instance.
Here is from the perl library:
.nf
	package dumpvar;

	sub main'dumpvar {
	\&    ($package) = @_;
	\&    local(*stab) = eval("*_$package");
	\&    while (($key,$val) = each(%stab)) {
	\&        {
	\&            local(*entry) = $val;
	\&            if (defined $entry) {
	\&                print "\e$$key = '$entry'\en";
	\&            }

	\&            if (defined @entry) {
	\&                print "\e@$key = (\en";
	\&                foreach $num ($[ .. $#entry) {
	\&                    print "  $num\et'",$entry[$num],"'\en";
	\&                }
	\&                print ")\en";
	\&            }

	\&            if ($key ne "_$package" && defined %entry) {
	\&                print "\e%$key = (\en";
	\&                foreach $key (sort keys(%entry)) {
	\&                    print "  $key\et'",$entry{$key},"'\en";
	\&                }
	\&                print ")\en";
	\&            }
	\&        }
	\&    }
	}
.fi
Note that, even though the subroutine is compiled in package dumpvar, the
name of the subroutine is qualified so that its name is inserted into package
\*(L"main\*(R".
976Each programmer will, of course, have his or her own preferences in regards
977to formatting, but there are some general guidelines that will make your
978programs easier to read.
979.Ip 1. 4 4
980Just because you CAN do something a particular way doesn't mean that
981you SHOULD do it that way.
982.I Perl
983is designed to give you several ways to do anything, so consider picking
984the most readable one.
985For instance
987 open(FOO,$foo) || die "Can't open $foo: $!";
989is better than
991 die "Can't open $foo: $!" unless open(FOO,$foo);
because the second way hides the main point of the statement in a modifier.
995On the other hand
997 print "Starting analysis\en" if $verbose;
999is better than
1001 $verbose && print "Starting analysis\en";
1003since the main point isn't whether the user typed -v or not.
1005Similarly, just because an operator lets you assume default arguments
1006doesn't mean that you have to make use of the defaults.
The defaults are there for lazy systems programmers writing one-shot programs.
1009If you want your program to be readable, consider supplying the argument.
1011Along the same lines, just because you
1012.I can
1013omit parentheses in many places doesn't mean that you ought to:
1016 return print reverse sort num values array;
1017 return print(reverse(sort num (values(%array))));
1020When in doubt, parenthesize.
1021At the very least it will let some poor schmuck bounce on the % key in vi.
1022.Ip 2. 4 4
1023Don't go through silly contortions to exit a loop at the top or the
1024bottom, when
1025.I perl
1026provides the "last" operator so you can exit in the middle.
1027Just outdent it a little to make it more visible:
1031 line:
1032 for (;;) {
1033 statements;
1034 last line if $foo;
1035 next line if /^#/;
1036 statements;
1037 }
1040.Ip 3. 4 4
1041Don't be afraid to use loop labels\*(--they're there to enhance readability as
1042well as to allow multi-level loop breaks.
1043See last example.
.Ip 4. 4 4
For portability, when using features that may not be implemented on every
machine, test the construct in an eval to see if it fails.
If you know the version or patchlevel in which a particular feature was
implemented, you can test $] to see if it will be there.
.Ip 5. 4 4
Choose mnemonic identifiers.
.Ip 6. 4 4
Be consistent.
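The portability guideline above (test a construct in an eval) might be sketched as follows. The helper name construct_works is made up for illustration, and the code is written for a present-day Perl 5, so treat it as one possible shape rather than the prescribed one.

```perl
use strict;
use warnings;

# Return 1 if the code in $snippet compiles and runs without dying, else 0.
sub construct_works {
    my ($snippet) = @_;
    my $ok = eval "$snippet; 1";    # eval traps both syntax and runtime errors
    return $ok ? 1 : 0;
}

# symlink() dies with an "unimplemented" error on systems that lack it,
# but merely returns false when it is implemented and the call fails.
print "symlink is ",
      (construct_works('symlink("", "")') ? "" : "not "),
      "implemented here\n";

# $] holds the version of the running interpreter.
print "running perl version $]\n";
```

The trailing "1" in the evaluated string makes success distinguishable from a construct that simply returns a false value.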
1053.Sh "Debugging"
1054If you invoke
1055.I perl
1056with a
1057.B \-d
1058switch, your script will be run under a debugging monitor.
1059It will halt before the first executable statement and ask you for a
1060command, such as:
1061.Ip "h" 12 4
1062Prints out a help message.
1063.Ip "s" 12 4
1064Single step.
1065Executes until it reaches the beginning of another statement.
1066.Ip "c" 12 4
Continue.
1068Executes until the next breakpoint is reached.
1069.Ip "<CR>" 12 4
1070Repeat last s or c.
1071.Ip "n" 12 4
1072Single step around subroutine call.
1073.Ip "l min+incr" 12 4
1074List incr+1 lines starting at min.
1075If min is omitted, starts where last listing left off.
1076If incr is omitted, previous value of incr is used.
1077.Ip "l min-max" 12 4
1078List lines in the indicated range.
1079.Ip "l line" 12 4
1080List just the indicated line.
1081.Ip "l" 12 4
1082List incr+1 more lines after last printed line.
1083.Ip "l subname" 12 4
1084List subroutine.
1085If it's a long subroutine it just lists the beginning.
1086Use \*(L"l\*(R" to list more.
1087.Ip "L" 12 4
1088List lines that have breakpoints or actions.
1089.Ip "t" 12 4
1090Toggle trace mode on or off.
1091.Ip "b line" 12 4
1092Set a breakpoint.
If line is omitted, sets a breakpoint on the
line that is about to be executed.
1095Breakpoints may only be set on lines that begin an executable statement.
1096.Ip "b subname" 12 4
1097Set breakpoint at first executable line of subroutine.
1098.Ip "S" 12 4
1099Lists the names of all subroutines.
1100.Ip "d line" 12 4
1101Delete breakpoint.
If line is omitted, deletes the breakpoint on the
line that is about to be executed.
1104.Ip "D" 12 4
1105Delete all breakpoints.
1106.Ip "A" 12 4
1107Delete all line actions.
1108.Ip "V package" 12 4
1109List all variables in package.
1110Default is main package.
1111.Ip "a line command" 12 4
1112Set an action for line.
1113A multi-line command may be entered by backslashing the newlines.
1114.Ip "< command" 12 4
1115Set an action to happen before every debugger prompt.
1116A multi-line command may be entered by backslashing the newlines.
1117.Ip "> command" 12 4
1118Set an action to happen after the prompt when you've just given a command
1119to return to executing the script.
1120A multi-line command may be entered by backslashing the newlines.
1121.Ip "! number" 12 4
1122Redo a debugging command.
1123If number is omitted, redoes the previous command.
1124.Ip "! -number" 12 4
1125Redo the command that was that many commands ago.
1126.Ip "H -number" 12 4
1127Display last n commands.
1128Only commands longer than one character are listed.
1129If number is omitted, lists them all.
1130.Ip "q or ^D" 12 4
Quit.
1132.Ip "command" 12 4
1133Execute command as a perl statement.
1134A missing semicolon will be supplied.
1135.Ip "p expr" 12 4
1136Same as \*(L"print DB'OUT expr\*(R".
1137The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
1138may be redirected to.
If you want to modify the debugger, copy its source from the perl library
to your current directory and modify it as necessary.
1142You can do some customization by setting up a .perldb file which contains
1143initialization code.
1144For instance, you could make aliases like these:
1147 $DBalias{'len'} = 's/^len(.*)/p length(\e$1)/';
1148 $DBalias{'stop'} = 's/^stop (at|in)/b/';
1149 $DBalias{'.'} =
1150 's/^./p "\e$DBsub(\e$DBline):\et\e$DBline[\e$DBline]"/';
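Each alias value above is the source text of a substitution. A minimal sketch of how such a table could be applied to a command line follows; it assumes the substitution is simply evaluated against the command, which is a guess at the mechanism and not taken from the debugger itself. The names %alias and expand_alias are hypothetical, and the code targets a present-day Perl 5.

```perl
use strict;
use warnings;

# Alias table in the style of the .perldb example: each value is the
# source text of a substitution to run against the command line.
my %alias = (
    'stop' => 's/^stop (at|in)/b/',
);

# Rewrite $cmd through the alias named by its first word, if there is one.
sub expand_alias {
    my ($cmd) = @_;
    my ($word) = $cmd =~ /^(\S+)/;
    return $cmd unless defined $word && exists $alias{$word};
    local $_ = $cmd;
    eval $alias{$word};             # runs the stored s/// against $_
    die $@ if $@;
    return $_;
}

print expand_alias('stop at 42'), "\n";
```

Storing the substitution as text keeps the alias table plain data that an initialization file can fill in.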
1153.Sh "Setuid Scripts"
1154.I Perl
1155is designed to make it easy to write secure setuid and setgid scripts.
1156Unlike shells, which are based on multiple substitution passes on each line
1157of the script,
1158.I perl
1159uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
1160Additionally, since the language has more built-in functionality, it
1161has to rely less upon external (and possibly untrustworthy) programs to
1162accomplish its purposes.
1164In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
1165insecure, but this kernel feature can be disabled.
1166If it is,
1167.I perl
1168can emulate the setuid and setgid mechanism when it notices the otherwise
1169useless setuid/gid bits on perl scripts.
1170If the kernel feature isn't disabled,
1171.I perl
1172will complain loudly that your setuid script is insecure.
1173You'll need to either disable the kernel setuid script feature, or put
1174a C wrapper around the script.
1176When perl is executing a setuid script, it takes special precautions to
1177prevent you from falling into any obvious traps.
1178(In some ways, a perl script is more secure than the corresponding
1179C program.)
1180Any command line argument, environment variable, or input is marked as
1181\*(L"tainted\*(R", and may not be used, directly or indirectly, in any
1182command that invokes a subshell, or in any command that modifies files,
1183directories or processes.
1184Any variable that is set within an expression that has previously referenced
1185a tainted value also becomes tainted (even if it is logically impossible
1186for the tainted value to influence the variable).
1187For example:
    $foo = shift;                   # $foo is tainted
    $bar = $foo,\'bar\';              # $bar is also tainted
    $xxx = <>;                      # Tainted
    $path = $ENV{\'PATH\'};           # Tainted, but see below
    $abc = \'abc\';                   # Not tainted

    system "echo $foo";             # Insecure
    system "echo", $foo;            # Secure (doesn't use sh)
    system "echo $bar";             # Insecure
    system "echo $abc";             # Insecure until PATH set

    $ENV{\'PATH\'} = \'/bin:/usr/bin\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

    $path = $ENV{\'PATH\'};           # Not tainted
    system "echo $abc";             # Is secure now!

    open(FOO,"$foo");               # OK
    open(FOO,">$foo");              # Not OK

    open(FOO,"echo $foo|");         # Not OK, but...
    open(FOO,"-|") || exec \'echo\', $foo;    # OK

    $zzz = `echo $foo`;             # Insecure, zzz tainted

    unlink $abc,$foo;               # Insecure
    umask $foo;                     # Insecure

    exec "echo $foo";               # Insecure
    exec "echo", $foo;              # Secure (doesn't use sh)
    exec "sh", \'-c\', $foo;          # Considered secure, alas
1228The taintedness is associated with each scalar value, so some elements
1229of an array can be tainted, and others not.
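The cases marked "Secure (doesn't use sh)" above depend on passing the command and its arguments as a list, so that no shell ever parses the argument. The following sketch shows the effect with a present-day Perl 5 (the list form of a piped open needs Perl 5.8 or later on a Unix-like system, and an echo program on the PATH). The helper name echo_without_shell is made up for illustration.

```perl
use strict;
use warnings;

# Run echo with $arg as a single argument, bypassing the shell, and
# return whatever echo printed.
sub echo_without_shell {
    my ($arg) = @_;
    open(my $fh, '-|', 'echo', $arg) or die "Can't run echo: $!";
    my $out = do { local $/; <$fh> };   # slurp all output
    close($fh);
    return $out;
}

# The semicolon reaches echo as ordinary text; no second command runs.
print echo_without_shell('hello; echo injected');
```

Had the same string been interpolated into a single command string, a shell would have treated the semicolon as a command separator.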
1231If you try to do something insecure, you will get a fatal error saying
1232something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
1233Note that you can still write an insecure system call or exec,
but only by explicitly doing something like the last example above.
You can also bypass the tainting mechanism by referencing subpatterns;
1237.I perl
1238presumes that if you reference a substring using $1, $2, etc, you knew
1239what you were doing when you wrote the pattern:
1242 $ARGV[0] =~ /^\-P(\ew+)$/;
1243 $printer = $1; # Not tainted
1246This is fairly secure since \ew+ doesn't match shell metacharacters.
1247Use of .+ would have been insecure, but
1248.I perl
1249doesn't check for that, so you must be careful with your patterns.
1250This is the ONLY mechanism for untainting user supplied filenames if you
1251want to do file operations on them (unless you make $> equal to $<).
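A slightly more defensive form of the pattern above checks that the match succeeded before trusting $1. This is a sketch for a present-day Perl 5; the helper name printer_from_arg is hypothetical.

```perl
use strict;
use warnings;

# Return the printer name from an argument such as "-Plaser", or undef
# if the argument does not have exactly that shape.
sub printer_from_arg {
    my ($arg) = @_;
    return $arg =~ /^-P(\w+)$/ ? $1 : undef;
}

foreach my $arg ('-Plaser', '-Plp;reboot') {
    my $printer = printer_from_arg($arg);
    print defined $printer ? "printer: $printer\n" : "rejected: $arg\n";
}
```

Because \w+ is anchored at both ends, an argument containing a shell metacharacter is rejected outright rather than partially matched.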
1253It's also possible to get into trouble with other operations that don't care
1254whether they use tainted values.
Make judicious use of the file tests in dealing with any user-supplied filenames.
1257When possible, do opens and such after setting $> = $<.
1258.I Perl
1259doesn't prevent you from opening tainted filenames for reading, so be
1260careful what you print out.
1261The tainting mechanism is intended to prevent stupid mistakes, not to remove
1262the need for thought.
.SH ENVIRONMENT
1264.I Perl
1265uses PATH in executing subprocesses, and in finding the script if \-S
1266is used.
1267HOME or LOGDIR are used if chdir has no argument.
1269Apart from these,
1270.I perl
1271uses no environment variables, except to make them available
1272to the script being executed, and to child processes.
1273However, scripts running setuid would do well to execute the following lines
1274before doing anything else, just to keep people honest:
1278 $ENV{\'PATH\'} = \'/bin:/usr/bin\'; # or whatever you need
1279 $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
1280 $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
.SH AUTHOR
Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
.SH FILES
/tmp/perl\-eXXXXXX	temporary file for
.B \-e
commands.
.SH SEE ALSO
a2p	awk to perl translator
.br
s2p	sed to perl translator
.SH DIAGNOSTICS
Compilation errors will tell you the line number of the error, with an
1295indication of the next token or token type that was to be examined.
(In the case of a script passed to
.I perl
via
.B \-e
switches, each
1301.B \-e
1302is counted as one line.)
1304Setuid scripts have additional constraints that can produce error messages
1305such as \*(L"Insecure dependency\*(R".
1306See the section on setuid scripts.
.SH TRAPS
.IR awk
1310users should take special note of the following:
1311.Ip * 4 2
Semicolons are required after all simple statements in
.IR perl .
Newline is not a statement delimiter.
1316.Ip * 4 2
1317Curly brackets are required on ifs and whiles.
1318.Ip * 4 2
1319Variables begin with $ or @ in
1320.IR perl .
1321.Ip * 4 2
1322Arrays index from 0 unless you set $[.
1323Likewise string positions in substr() and index().
1324.Ip * 4 2
1325You have to decide whether your array has numeric or string indices.
1326.Ip * 4 2
1327Associative array values do not spring into existence upon mere reference.
1328.Ip * 4 2
1329You have to decide whether you want to use string or numeric comparisons.
1330.Ip * 4 2
1331Reading an input line does not split it for you. You get to split it yourself
1332to an array.
1333And the
1334.I split
1335operator has different arguments.
1336.Ip * 4 2
1337The current input line is normally in $_, not $0.
1338It generally does not have the newline stripped.
($0 is initially the name of the program executed, then the last matched
string.)
1341.Ip * 4 2
1342$<digit> does not refer to fields\*(--it refers to substrings matched by the last
1343match pattern.
1344.Ip * 4 2
The
.I print
1347statement does not add field and record separators unless you set
1348$, and $\e.
1349.Ip * 4 2
1350You must open your files before you print to them.
1351.Ip * 4 2
1352The range operator is \*(L".\|.\*(R", not comma.
1353(The comma operator works as in C.)
1354.Ip * 4 2
1355The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
1356(\*(L"~\*(R" is the one's complement operator, as in C.)
1357.Ip * 4 2
1358The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
1359(\*(L"^\*(R" is the XOR operator, as in C.)
1360.Ip * 4 2
1361The concatenation operator is \*(L".\*(R", not the null string.
1362(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
1363since the third slash would be interpreted as a division operator\*(--the
1364tokener is in fact slightly context sensitive for operators like /, ?, and <.
1365And in fact, . itself can be the beginning of a number.)
1366.Ip * 4 2
.IR Next ,
.I exit
and
.I continue
work differently.
1372.Ip * 4 2
The following variables work differently:

 Awk     \h'|2.5i'Perl
 ARGC    \h'|2.5i'$#ARGV
 ARGV[0] \h'|2.5i'$0
 FILENAME\h'|2.5i'$ARGV
 FNR     \h'|2.5i'$. \- something
 FS      \h'|2.5i'(whatever you like)
 NF      \h'|2.5i'$#Fld, or some such
 NR      \h'|2.5i'$.
 OFMT    \h'|2.5i'$#
 OFS     \h'|2.5i'$,
 ORS     \h'|2.5i'$\e
 RLENGTH \h'|2.5i'length($&)
 RS      \h'|2.5i'$\/
 RSTART  \h'|2.5i'length($\`)
 SUBSEP  \h'|2.5i'$;
1393.Ip * 4 2
1394When in doubt, run the
1395.I awk
1396construct through a2p and see what it gives you.
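Several of the awk points above (no automatic field splitting, the newline left in place, NF computed from the array) come together in a sketch like the following, a rough Perl counterpart of awk '{ print NF, $2 }' for a single line. It is written for a present-day Perl 5 with the default index base of 0; the helper name count_and_second is made up, and the array name @Fld follows the table above.

```perl
use strict;
use warnings;

# Return the field count and the second field of one input line.
sub count_and_second {
    my ($line) = @_;
    chomp $line;                    # the newline is not stripped for you
    my @Fld = split ' ', $line;     # explicit split on whitespace, as awk does
    my $nf  = scalar @Fld;          # NF; with index base 0, $#Fld is NF - 1
    return ($nf, $Fld[1]);          # awk's $2 is element 1 here
}

my ($nf, $second) = count_and_second("alpha beta gamma\n");
print "$nf $second\n";
```

Splitting on the string ' ' also discards leading whitespace, which is the closest match to awk's default field splitting.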
1398Cerebral C programmers should take note of the following:
1399.Ip * 4 2
1400Curly brackets are required on ifs and whiles.
1401.Ip * 4 2
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
1403.Ip * 4 2
.I Break
and
.I continue
become
.I last
and
.IR next ,
respectively.
1412.Ip * 4 2
1413There's no switch statement.
1414.Ip * 4 2
1415Variables begin with $ or @ in
1416.IR perl .
1417.Ip * 4 2
1418Printf does not implement *.
1419.Ip * 4 2
1420Comments begin with #, not /*.
1421.Ip * 4 2
1422You can't take the address of anything.
1423.Ip * 4 2
1424ARGV must be capitalized.
1425.Ip * 4 2
1426The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
1427.Ip * 4 2
1428Signal handlers deal with signal names, not numbers.
1429.Ip * 4 2
1430You can't subscript array values, only arrays (no $x = (1,2,3)[2];).
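For a C programmer, the points above about curly brackets, elsif, last and next can be seen together in a short loop. This is only a sketch for a present-day Perl 5; the function name sum_until_zero is invented for the example.

```perl
use strict;
use warnings;

# Sum the positive numbers in the argument list, stopping at the first zero.
sub sum_until_zero {
    my $total = 0;
    foreach my $n (@_) {
        if ($n == 0) {              # curly brackets are required here
            last;                   # where C would say break
        }
        elsif ($n < 0) {            # elsif, not else if
            next;                   # where C would say continue
        }
        else {
            $total += $n;
        }
    }
    return $total;
}

print sum_until_zero(3, -1, 4, 0, 5), "\n";
```

The negative value is skipped by next, and the 5 after the zero is never reached because last leaves the loop.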
1433.I sed
1434programmers should take note of the following:
1435.Ip * 4 2
1436Backreferences in substitutions use $ rather than \e.
1437.Ip * 4 2
1438The pattern matching metacharacters (, ), and | do not have backslashes in front.
1439.Ip * 4 2
1440The range operator is .\|. rather than comma.
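The first two sed points above can be illustrated with a single substitution. The sketch below is for a present-day Perl 5; the helper name swap_first_two_words is hypothetical, and the sed command in the comment is given only for comparison.

```perl
use strict;
use warnings;

# Swap the first two space-separated words of $text.
sub swap_first_two_words {
    my ($text) = @_;
    # sed would write:  s/\([a-z]*\) \([a-z]*\)/\2 \1/
    # perl groups with bare parentheses and refers back with $1 and $2.
    $text =~ s/(\w+) (\w+)/$2 $1/;
    return $text;
}

print swap_first_two_words("hello world"), "\n";
```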
1442Sharp shell programmers should take note of the following:
1443.Ip * 4 2
1444The backtick operator does variable interpretation without regard to the
1445presence of single quotes in the command.
1446.Ip * 4 2
1447The backtick operator does no translation of the return value, unlike csh.
1448.Ip * 4 2
1449Shells (especially csh) do several levels of substitution on each command line.
1450.I Perl
1451does substitution only in certain constructs such as double quotes,
1452backticks, angle brackets and search patterns.
1453.Ip * 4 2
1454Shells interpret scripts a little bit at a time.
1455.I Perl
1456compiles the whole program before executing it.
1457.Ip * 4 2
1458The arguments are available via @ARGV, not $1, $2, etc.
1459.Ip * 4 2
1460The environment is not automatically made available as variables.
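The last two shell points above (arguments in @ARGV, environment only through %ENV) might look like this in practice. The sketch is for a present-day Perl 5, and the helper name env_or_default is made up for illustration.

```perl
use strict;
use warnings;

# Return the value of environment variable $name, or $default if it is unset.
sub env_or_default {
    my ($name, $default) = @_;
    return defined $ENV{$name} ? $ENV{$name} : $default;
}

# Command line arguments live in @ARGV; $1, $2, etc. belong to pattern matching.
print "first argument: ", (@ARGV ? $ARGV[0] : '(none)'), "\n";

# HOME is not a variable of its own; it has to be fetched from %ENV.
print "home: ", env_or_default('HOME', '(unset)'), "\n";
```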
1461.SH BUGS
1463.I Perl
1464is at the mercy of your machine's definitions of various operations
1465such as type casting, atof() and sprintf().
If your stdio requires a seek or eof between reads and writes on a particular
1468stream, so does
1469.IR perl .
1471While none of the built-in data types have any arbitrary size limits (apart
1472from memory size), there are still a few arbitrary limits:
1473a given identifier may not be longer than 255 characters;
1474sprintf is limited on many machines to 128 characters per field (unless the format
1475specifier is exactly %s);
1476and no component of your PATH may be longer than 255 if you use \-S.
1478.I Perl
1479actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
1480anyone I said that.
1481.rn }` ''