''' Beginning of part 4
''' $Header:,v 3.0 89/10/18 15:21:55 lwall Locked $
''' $Log:,v $
''' Revision 3.0 89/10/18 15:21:55 lwall
''' 3.0 baseline
.Sh "Precedence"
.I Perl
operators have the following associativity and precedence:
nonassoc\h'|1i'print printf exec system sort reverse
\h'1.5i'chmod chown kill unlink utime die return
right\h'|1i'= += \-= *= etc.
left\h'|1i'| ^
nonassoc\h'|1i'== != eq ne
nonassoc\h'|1i'< > <= >= lt gt le ge
nonassoc\h'|1i'chdir exit eval reset sleep rand umask
nonassoc\h'|1i'\-r \-w \-x etc.
left\h'|1i'<< >>
left\h'|1i'+ \- .
left\h'|1i'* / % x
left\h'|1i'=~ !~
right\h'|1i'! ~ and unary minus
nonassoc\h'|1i'++ \-\|\-
As mentioned earlier, if any list operator (print, etc.) or
any unary operator (chdir, etc.)
is followed by a left parenthesis as the next token on the same line,
the operator and arguments within parentheses are taken to
be of highest precedence, just like a normal function call.
    chdir $foo || die;          # (chdir $foo) || die
    chdir($foo) || die;         # (chdir $foo) || die
    chdir ($foo) || die;        # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die
but, because * is higher precedence than ||:
    chdir $foo * 20;            # chdir ($foo * 20)
    chdir($foo) * 20;           # (chdir $foo) * 20
    chdir ($foo) * 20;          # (chdir $foo) * 20
    chdir +($foo) * 20;         # chdir ($foo * 20)

    rand 10 * 20;               # rand (10 * 20)
    rand(10) * 20;              # (rand 10) * 20
    rand (10) * 20;             # (rand 10) * 20
    rand +(10) * 20;            # rand (10 * 20)
In the absence of parentheses,
the precedence of list operators such as print, sort or chmod is
either very high or very low depending on whether you look at the left
side of the operator or the right side of it.
For example, in
    @ary = (1, 3, sort 4, 2);
    print @ary;                 # prints 1324
the commas on the right of the sort are evaluated before the sort, but
the commas on the left are evaluated after.
In other words, list operators tend to gobble up all the arguments that
follow them, and then act like a simple term with regard to the preceding
expression.
Note that you have to be careful with parens:
    # These evaluate exit before doing the print:
    print($foo, exit);          # Obviously not what you want.
    print $foo, exit;           # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit;         # This is what you want.
    print($foo), exit;          # Or this.
    print ($foo), exit;         # Or even this.
Also note that
    print ($foo & 255) + 1, "\en";
probably doesn't do what you expect at first glance.
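The trap in that last statement is that the parenthesized list right after print is taken as the complete argument list of print, so the addition and the newline are left outside the call.
Here is a minimal sketch of one way around it, assuming the intent was to print the sum followed by a newline.
The value 511 and the variable $line are made up for illustration; $line only makes the result easy to inspect.

```perl
$foo = 511;

# The statement below is parsed as (print($foo & 255)) + 1, "\n".
# It would print 255 with no newline and throw the addition away.
#     print ($foo & 255) + 1, "\n";

# An extra pair of parentheses makes the whole list the argument of print.
$line = join('', ($foo & 255) + 1, "\n");
print(($foo & 255) + 1, "\n");
```

Either the extra parentheses or a leading + in front of the first parenthesis keeps the list operator from stopping at the first closing parenthesis.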
.Sh "Subroutines"
A subroutine may be declared as follows:
    sub NAME BLOCK
Any arguments passed to the routine come in as array @_,
that is ($_[0], $_[1], .\|.\|.).
The array @_ is a local array, but its values are references to the
actual scalar parameters.
The return value of the subroutine is the value of the last expression
evaluated, and can be either an array value or a scalar value.
Alternatively, a return statement may be used to specify the returned value and
exit the subroutine.
To create local variables see the
.I local
operator.
A subroutine is called using the
.I do
operator or the & operator.
    sub MAX {
        local($max) = pop(@_);
        foreach $foo (@_) {
            $max = $foo \|if \|$max < $foo;
        }
        $max;
    }
    .\|.\|.
    $bestday = &MAX($mon,$tue,$wed,$thu,$fri);

    # get a line, combining continuation lines
    #  that start with whitespace
    sub get_line {
        $thisline = $lookahead;
        line: while ($lookahead = <STDIN>) {
            if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
                $thisline \|.= \|$lookahead;
            }
            else {
                last line;
            }
        }
        $thisline;
    }
    $lookahead = <STDIN>;       # get first line
    while ($_ = do get_line(\|)) {
        .\|.\|.
    }
Use array assignment to a local list to name your formal arguments:
    sub maybeset {
        local($key, $value) = @_;
        $foo{$key} = $value unless $foo{$key};
    }
This also has the effect of turning call-by-reference into call-by-value,
since the assignment copies the values.
Subroutines may be called recursively.
If a subroutine is called using the & form, the argument list is optional.
If omitted, no @_ array is set up for the subroutine; the @_ array at the
time of the call is visible to the subroutine instead.
    do foo(1,2,3);              # pass three arguments
    &foo(1,2,3);                # the same

    do foo();                   # pass a null list
    &foo();                     # the same
    &foo;                       # pass no arguments--more efficient
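The difference between the last two forms is easy to miss, so here is a small sketch of it.
The subroutine names show, with_list and without_list are invented for the illustration, and only the & form is used so that the sketch does not depend on the do form.

```perl
sub show {
    join(',', @_);              # reports whatever @_ it can see
}

sub with_list {
    &show();                    # explicit null list: show gets an empty @_
}

sub without_list {
    &show;                      # no list at all: show sees the caller's @_
}

$empty  = &with_list(1, 2, 3);
$shared = &without_list(1, 2, 3);
print "with list: '$empty'\nwithout list: '$shared'\n";
```

With the null list the inner routine sees nothing; without any list it sees the three arguments of its caller.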
.Sh "Passing By Reference"
Sometimes you don't want to pass the value of an array to a subroutine but
rather the name of it, so that the subroutine can modify the global copy
of it rather than working with a local copy.
In perl you can refer to all the objects of a particular name by prefixing
the name with a star: *foo.
When evaluated, it produces a scalar value that represents all the objects
of that name.
When assigned to within a local() operation, it causes the name mentioned
to refer to whatever * value was assigned to it.
    sub doubleary {
        local(*someary) = @_;
        foreach $elem (@someary) {
            $elem *= 2;
        }
    }
    do doubleary(*foo);
    do doubleary(*bar);
Assignment to *name is currently recommended only inside a local().
You can actually assign to *name anywhere, but the previous referent of
*name may be stranded forever.
This may or may not bother you.
Note that scalars are already passed by reference, so you can modify scalar
arguments without using this mechanism by referring explicitly to the $_[nnn]
in question.
You can modify all the elements of an array by passing all the elements
as scalars, but you have to use the * mechanism to push, pop or change the
size of an array.
The * mechanism will probably be more efficient in any case.
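The two cases just described can be put side by side.
This is only a sketch: the names bump, append, $count and @list are invented for the illustration.

```perl
sub bump {
    $_[0]++;                    # $_[0] is the caller's own scalar
}

sub append {
    local(*ary) = @_;           # @ary is now another name for the caller's array
    push(@ary, 'new');          # so the push changes the caller's array
}

$count = 1;
&bump($count);

@list = ('a', 'b');
&append(*list);

print "$count @list\n";         # prints 2 a b new
```

The scalar needed nothing special, but changing the size of the array required passing *list rather than @list.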
Since a *name value contains unprintable binary data, if it is used as
an argument in a print, or as a %s argument in a printf or sprintf, it
then has the value '*name', just so it prints out pretty.
.Sh "Regular Expressions"
The patterns used in pattern matching are regular expressions such as
those supplied in the Version 8 regexp routines.
(In fact, the routines are derived from Henry Spencer's freely redistributable
reimplementation of the V8 routines.)
In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
Word boundaries may be matched by \eb, and non-boundaries by \eB.
A whitespace character is matched by \es, non-whitespace by \eS.
A numeric character is matched by \ed, non-numeric by \eD.
You may use \ew, \es and \ed within character classes.
Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
Within character classes \eb represents backspace rather than a word boundary.
Alternatives may be separated by |.
The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
matches the digit'th substring, where digit can range from 1 to 9.
(Outside of the pattern, always use $ instead of \e in front of the digit.
The scope of $<digit> (and $\`, $& and $\')
extends to the end of the enclosing BLOCK or eval string, or to
the next pattern match with subexpressions.
The \e<digit> notation sometimes works outside the current pattern, but should
not be relied upon.)
$+ returns whatever the last bracket match matched.
$& returns the entire matched string.
($0 normally returns the same thing, but don't depend on it.)
$\` returns everything before the matched string.
$\' returns everything after the matched string.
    s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/;     # swap first two words

    if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }
By default, the ^ character matches only the beginning of the string,
the $ character matches only at the end (or before the newline at the end)
and
.I perl
does certain optimizations with the assumption that the string contains
only one line.
You may, however, wish to treat a string as a multi-line buffer, such that
the ^ will match after any newline within the string, and $ will match
before any newline.
At the cost of a little more overhead, you can do this by setting the variable
$* to 1.
Setting it back to 0 makes
.I perl
revert to its old behavior.
To facilitate multi-line substitutions, the . character never matches a newline
(even when $* is 0).
In particular, the following leaves a newline on the $_ string:
    $_ = <STDIN>;
    s/.*(some_string).*/$1/;
If the newline is unwanted, try one of
    s/.*(some_string).*\en/$1/;
    s/.*(some_string)[^\e000]*/$1/;
    s/.*(some_string)(.|\en)*/$1/;
    chop; s/.*(some_string).*/$1/;
    /(some_string)/ && ($_ = $1);
Any item of a regular expression may be followed with digits in curly brackets
of the form {n,m}, where n gives the minimum number of times to match the item
and m gives the maximum.
The form {n} is equivalent to {n,n} and matches exactly n times.
The form {n,} matches n or more times.
(If a curly bracket occurs in any other context, it is treated as a regular
character.)
The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
to {0,1}.
There is no limit to the size of n or m, but large numbers will chew up
more memory.
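A short sketch of the three curly bracket forms may help.
The timestamp is the one from the revision header of this file; the variable names are invented for the illustration.

```perl
$stamp = '89/10/18 15:21:55';

# {2} is exactly two digits; {1,2} is one or two.
if ($stamp =~ /^(\d{2})\/(\d{1,2})\/(\d{1,2}) /) {
    ($year, $month, $day) = ($1, $2, $3);
}

# {3,} is three or more, in the same way that + is {1,}.
$long  = ('aaaa' =~ /^a{3,}$/) ? 1 : 0;
$short = ('aa'   =~ /^a{3,}$/) ? 1 : 0;

print "$year $month $day $long $short\n";       # prints 89 10 18 1 0
```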
You will note that all backslashed metacharacters in
.I perl
are alphanumeric,
such as \eb, \ew, \en.
Unlike some other regular expression languages, there are no backslashed
symbols that aren't alphanumeric.
So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
interpreted as a literal character, not a metacharacter.
This makes it simple to quote a string that you want to use for a pattern
but that you are afraid might contain metacharacters.
Simply quote all the non-alphanumeric characters:
    $pattern =~ s/(\eW)/\e\e$1/g;
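To see the quoting at work, here is a minimal sketch that quotes a string and then uses it as a pattern.
The sample text 'a.b*c' and the variables $literal and $loose are hypothetical.

```perl
$pattern = 'a.b*c';             # text that should be matched literally
$pattern =~ s/(\W)/\\$1/g;      # every non-alphanumeric gets a backslash

# The quoted pattern finds the literal text ...
$literal = ('xa.b*cx' =~ /$pattern/) ? 1 : 0;
# ... and no longer treats . and * as metacharacters.
$loose   = ('aXbbc'   =~ /$pattern/) ? 1 : 0;

print "$pattern $literal $loose\n";     # prints a\.b\*c 1 0
```

Without the quoting step the second string would have matched, since . would stand for any character and b* for any run of b's.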
.Sh "Formats"
Output record formats for use with the
.I write
operator may be declared as follows:
    format NAME =
    FORMLIST
    .
If name is omitted, format \*(L"STDOUT\*(R" is defined.
FORMLIST consists of a sequence of lines, each of which may be of one of three
types:
.Ip 1. 4
A comment.
.Ip 2. 4
A \*(L"picture\*(R" line giving the format for one output line.
.Ip 3. 4
An argument line supplying values to plug into a picture line.
Picture lines are printed exactly as they look, except for certain fields
that substitute values into the line.
Each picture field starts with either @ or ^.
The @ field (not to be confused with the array marker @) is the normal
case; ^ fields are used
to do rudimentary multi-line text block filling.
The length of the field is supplied by padding out the field
with multiple <, >, or | characters to specify, respectively, left justification,
right justification, or centering.
If any of the values supplied for these fields contains a newline, only
the text up to the newline is printed.
The special field @* can be used for printing multi-line values.
It should appear by itself on a line.
The values are specified on the following line, in the same order as
the picture fields.
The values should be separated by commas.
Picture fields that begin with ^ rather than @ are treated specially.
The value supplied must be a scalar variable name which contains a text
string.
.I Perl
puts as much text as it can into the field, and then chops off the front
of the string so that the next time the variable is referenced,
more of the text can be printed.
Normally you would use a sequence of fields in a vertical stack to print
out a block of text.
If you like, you can end the final field with .\|.\|., which will appear in the
output if the text was too long to appear in its entirety.
You can change which characters are legal to break on by changing the
variable $: to a list of the desired characters.
Since use of ^ fields can produce variable length records if the text to be
formatted is short, you can suppress blank lines by putting the tilde (~)
character anywhere in the line.
(Normally you should put it in the front if possible, for visibility.)
The tilde will be translated to a space upon output.
If you put a second tilde contiguous to the first, the line will be repeated
until all the fields on the line are exhausted.
(If you use a field of the @ variety, the expression you supply had better
not give the same value every time forever!)
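The double tilde is easier to appreciate in a small case.
The following is a sketch only: the format name WRAPPED and the variables $text and $report are invented, and it assumes a perl recent enough (5.8 or later) to open a filehandle onto a scalar, which is used here purely so the output can be looked at afterwards.
On an older perl you would open an ordinary file or write to STDOUT instead.

```perl
# "~~" repeats the picture line until $text is used up; each pass chops
# off the front of $text as much as fits in the ^ field.
format WRAPPED =
~~^<<<<<<<<<<<<<<<<<<
$text
.

$text = 'The quick brown fox jumps over the lazy dog';

# Assumption: an in-memory filehandle stands in for a real output channel.
$report = '';
open(WRAPPED, '>', \$report) || die "Can't open in-memory file: $!";
write(WRAPPED);                 # uses format WRAPPED by default
close(WRAPPED);

print $report;
```

A single picture line is enough here; without the second tilde you would need one ^ line for every output line you expect, as in the bug report example.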
.lg 0
.cs R 25
.ft C
# a report on the /etc/passwd file
format top =
\&                        Passwd File
Name                Login    Office   Uid   Gid Home
\&.
format STDOUT =
@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
$name,              $login,  $office,$uid,$gid, $home
\&.

# a report from a bug report form
format top =
\&                        Bug Reports
@<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
$system,                 $%,  $date
\&.
format STDOUT =
Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&         $subject
Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&       $index,                       $description
Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&          $priority,        $date,   $description
From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&      $from,                         $description
Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&             $programmer,            $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&                                     $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&                                     $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&                                     $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
\&                                     $description
\&~                                    ^<<<<<<<<<<<<<<<<<<<<<<<...
\&                                     $description
\&.
.ft R
.cs R
It is possible to intermix prints with writes on the same output channel,
but you'll have to handle $\- (lines left on the page) yourself.
If you are printing lots of fields that are usually blank, you should consider
using the reset operator between records.
Not only is it more efficient, but it can prevent the bug of adding another
field and forgetting to zero it.
.Sh "Interprocess Communication"
The IPC facilities of perl are built on the Berkeley socket mechanism.
If you don't have sockets, you can ignore this section.
The calls have the same names as the corresponding system calls,
but the arguments tend to differ, for two reasons.
First, perl file handles work differently than C file descriptors.
Second, perl already knows the length of its strings, so you don't need
to pass that information.
Here is a sample client (untested):
    ($them,$port) = @ARGV;
    $port = 2345 unless $port;
    $them = 'localhost' unless $them;

    $SIG{'INT'} = 'dokill';
    sub dokill { kill 9,$child if $child; }

    do 'sys/socket.h' || die "Can't do sys/socket.h: $@";

    $sockaddr = 'S n a4 x8';
    chop($hostname = `hostname`);

    ($name, $aliases, $proto) = getprotobyname('tcp');
    ($name, $aliases, $port) = getservbyname($port, 'tcp')
        unless $port =~ /^\ed+$/;
    ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
    ($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);

    $this = pack($sockaddr, &AF_INET, 0, $thisaddr);
    $that = pack($sockaddr, &AF_INET, $port, $thataddr);

    socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
    bind(S, $this) || die "bind: $!";
    connect(S, $that) || die "connect: $!";

    select(S); $| = 1; select(stdout);

    if ($child = fork) {
        while (<>) {
            print S;
        }
        sleep 3;
        do dokill();
    }
    else {
        while (<S>) {
            print;
        }
    }
And here's a server:
    ($port) = @ARGV;
    $port = 2345 unless $port;

    do 'sys/socket.h' || die "Can't do sys/socket.h: $@";

    $sockaddr = 'S n a4 x8';

    ($name, $aliases, $proto) = getprotobyname('tcp');
    ($name, $aliases, $port) = getservbyname($port, 'tcp')
        unless $port =~ /^\ed+$/;

    $this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");

    select(NS); $| = 1; select(stdout);

    socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
    bind(S, $this) || die "bind: $!";
    listen(S, 5) || die "listen: $!";

    select(S); $| = 1; select(stdout);

    for (;;) {
        print "Listening again\en";
        ($addr = accept(NS,S)) || die $!;
        print "accept ok\en";

        # unpack with the same template that was used to pack $this
        ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
        @inetaddr = unpack('C4',$inetaddr);
        print "$af $port @inetaddr\en";

        while (<NS>) {
            print;
            print NS;
        }
    }
.Sh "Predefined Names"
The following names have special meaning to
.IR perl .
I could have used alphabetic symbols for some of these, but I didn't want
to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
out.
You'll just have to suffer along with these silly symbols.
Most of them have reasonable mnemonics, or analogues in one of the shells.
.Ip $_ 8
The default input and pattern-searching space.
The following pairs are equivalent:
    while (<>) {\|.\|.\|.       # only equivalent in while!
    while ($_ = <>) {\|.\|.\|.

    /\|^Subject:/
    $_ \|=~ \|/\|^Subject:/

    y/a\-z/A\-Z/
    $_ =~ y/a\-z/A\-Z/

    chop
    chop($_)
(Mnemonic: underline is understood in certain operations.)
.Ip $. 8
The current input line number of the last filehandle that was read.
Remember that only an explicit close on the filehandle resets the line number.
Since <> never does an explicit close, line numbers increase across ARGV files
(but see examples under eof).
(Mnemonic: many programs use . to mean the current line number.)
.Ip $/ 8
The input record separator, newline by default.
Works like
.IR awk 's
RS variable, including treating blank lines as delimiters
if set to the null string.
If set to a value longer than one character, only the first character is used.
(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
.Ip $, 8
The output field separator for the print operator.
Ordinarily the print operator simply prints out the comma separated fields
you specify.
In order to get behavior more like
.IR awk ,
set this variable as you would set
.IR awk 's
OFS variable to specify what is printed between fields.
(Mnemonic: what is printed when there is a , in your print statement.)
.Ip $"" 8
This is like $, except that it applies to array values interpolated into
a double-quoted string (or similar interpreted string).
Default is a space.
(Mnemonic: obvious, I think.)
.Ip $\e 8
The output record separator for the print operator.
Ordinarily the print operator simply prints out the comma separated fields
you specify, with no trailing newline or record separator assumed.
In order to get behavior more like
.IR awk ,
set this variable as you would set
.IR awk 's
ORS variable to specify what is printed at the end of the print.
(Mnemonic: you set $\e instead of adding \en at the end of the print.
Also, it's just like /, but it's what you get \*(L"back\*(R" from
.IR perl .)
.Ip $# 8
The output format for printed numbers.
This variable is a half-hearted attempt to emulate
.IR awk 's
OFMT variable.
There are times, however, when
.I awk
and
.I perl
have differing notions of what
is in fact numeric.
Also, the initial value is %.20g rather than %.6g, so you need to set $#
explicitly to get
.IR awk 's
value.
(Mnemonic: # is the number sign.)
.Ip $% 8
The current page number of the currently selected output channel.
(Mnemonic: % is page number in nroff.)
.Ip $= 8
The current page length (printable lines) of the currently selected output
channel.
Default is 60.
(Mnemonic: = has horizontal lines.)
.Ip $\- 8
The number of lines left on the page of the currently selected output channel.
(Mnemonic: lines_on_page \- lines_printed.)
.Ip $~ 8
The name of the current report format for the currently selected output
channel.
(Mnemonic: brother to $^.)
.Ip $^ 8
The name of the current top-of-page format for the currently selected output
channel.
(Mnemonic: points to top of page.)
.Ip $| 8
If set to nonzero, forces a flush after every write or print on the currently
selected output channel.
Default is 0.
Note that
.I stdout
will typically be line buffered if output is to the
terminal and block buffered otherwise.
Setting this variable is useful primarily when you are outputting to a pipe,
such as when you are running a
.I perl
script under rsh and want to see the
output as it's happening.
(Mnemonic: when you want your pipes to be piping hot.)
.Ip $$ 8
The process number of the
.I perl
running this script.
(Mnemonic: same as shells.)
.Ip $? 8
The status returned by the last pipe close, backtick (\`\`) command or
.I system
operator.
Note that this is the status word returned by the wait() system
call, so the exit value of the subprocess is actually ($? >> 8).
$? & 255 gives which signal, if any, the process died from, and whether
there was a core dump.
(Mnemonic: similar to sh and ksh.)
.Ip $& 8 4
The string matched by the last pattern match (not counting any matches hidden
within a BLOCK or eval enclosed by the current BLOCK).
(Mnemonic: like & in some editors.)
.Ip $\` 8 4
The string preceding whatever was matched by the last pattern match
(not counting any matches hidden within a BLOCK or eval enclosed by the current
BLOCK).
(Mnemonic: \` often precedes a quoted string.)
.Ip $\' 8 4
The string following whatever was matched by the last pattern match
(not counting any matches hidden within a BLOCK or eval enclosed by the current
BLOCK).
(Mnemonic: \' often follows a quoted string.)
    $_ = \'abcdefghi\';
    /def/;
    print "$\`:$&:$\'\en";      # prints abc:def:ghi
.Ip $+ 8 4
The last bracket matched by the last search pattern.
This is useful if you don't know which of a set of alternative patterns
matched.
For example:
    /Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);
(Mnemonic: be positive and forward looking.)
.Ip $* 8 2
Set to 1 to do multiline matching within a string, 0 to tell
.I perl
that it can assume that strings contain a single line, for the purpose
of optimizing pattern matches.
Pattern matches on strings containing multiple newlines can produce confusing
results when $* is 0.
Default is 0.
(Mnemonic: * matches multiple things.)
.Ip $0 8
Contains the name of the file containing the
.I perl
script being executed.
The value should be copied elsewhere before any pattern matching happens, which
clobbers $0.
(Mnemonic: same as sh and ksh.)
.Ip $<digit> 8
Contains the subpattern from the corresponding set of parentheses in the last
pattern matched, not counting patterns matched in nested blocks that have
been exited already.
(Mnemonic: like \edigit.)
.Ip $[ 8 2
The index of the first element in an array, and of the first character in
a substring.
Default is 0, but you could set it to 1 to make
.I perl
behave more like
.I awk
(or Fortran)
when subscripting and when evaluating the index() and substr() functions.
(Mnemonic: [ begins subscripts.)
.Ip $] 8 2
The string printed out when you say \*(L"perl -v\*(R".
It can be used to determine at the beginning of a script whether the perl
interpreter executing the script is in the right range of versions.
    # see if getc is available
    ($version,$patchlevel) =
        $] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
    print STDERR "(No filename completion available.)\en"
        if $version * 1000 + $patchlevel < 2016;
(Mnemonic: Is this version of perl in the right bracket?)
.Ip $; 8 2
The subscript separator for multi-dimensional array emulation.
If you refer to an associative array element as
    $foo{$a,$b,$c}
it really means
    $foo{join($;, $a, $b, $c)}
But don't put
    @foo{$a,$b,$c}              # a slice--note the @
which means
    ($foo{$a},$foo{$b},$foo{$c})
Default is "\e034", the same as SUBSEP in
.IR awk .
Note that if your keys contain binary data there might not be any safe
value for $;.
(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
Yeah, I know, it's pretty lame, but $, is already taken for something more
important.)
.Ip $! 8 2
If used in a numeric context, yields the current value of errno, with all the
usual caveats.
If used in a string context, yields the corresponding system error string.
You can assign to $! in order to set errno
if, for instance, you want $! to return the string for error n, or you want
to set the exit value for the die operator.
(Mnemonic: What just went bang?)
.Ip $@ 8 2
The error message from the last eval command.
If null, the last eval parsed and executed correctly.
(Mnemonic: Where was the syntax error \*(L"at\*(R"?)
.Ip $< 8 2
The real uid of this process.
(Mnemonic: it's the uid you came FROM, if you're running setuid.)
.Ip $> 8 2
The effective uid of this process.
    $< = $>;                    # set real uid to the effective uid
    ($<,$>) = ($>,$<);          # swap real and effective uid
(Mnemonic: it's the uid you went TO, if you're running setuid.)
Note: $< and $> can only be swapped on machines supporting setreuid().
.Ip $( 8 2
The real gid of this process.
If you are on a machine that supports membership in multiple groups
simultaneously, gives a space separated list of groups you are in.
The first number is the one returned by getgid(), and the subsequent ones
by getgroups(), one of which may be the same as the first number.
(Mnemonic: parentheses are used to GROUP things.
The real gid is the group you LEFT, if you're running setgid.)
.Ip $) 8 2
The effective gid of this process.
If you are on a machine that supports membership in multiple groups
simultaneously, gives a space separated list of groups you are in.
The first number is the one returned by getegid(), and the subsequent ones
by getgroups(), one of which may be the same as the first number.
(Mnemonic: parentheses are used to GROUP things.
The effective gid is the group that's RIGHT for you, if you're running setgid.)
Note: $<, $>, $( and $) can only be set on machines that support the
corresponding set[re][ug]id() routine.
$( and $) can only be swapped on machines supporting setregid().
.Ip $: 8 2
The current set of characters after which a string may be broken to
fill continuation fields (starting with ^) in a format.
Default is "\ \en-", to break on whitespace or hyphens.
(Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
.Ip @ARGV 8 3
The array ARGV contains the command line arguments intended for the script.
Note that $#ARGV is generally the number of arguments minus one, since
$ARGV[0] is the first argument, NOT the command name.
See $0 for the command name.
.Ip @INC 8 3
The array INC contains the list of places to look for
.I perl
scripts to be
evaluated by the \*(L"do EXPR\*(R" command.
It initially consists of the arguments to any
.B \-I
command line switches, followed
by the default
.I perl
library, probably \*(L"/usr/local/lib/perl\*(R".
.Ip $ENV{expr} 8 2
The associative array ENV contains your current environment.
Setting a value in ENV changes the environment for child processes.
.Ip $SIG{expr} 8 2
The associative array SIG is used to set signal handlers for various signals.
    sub handler {               # 1st argument is signal name
        local($sig) = @_;
        print "Caught a SIG$sig\-\|\-shutting down\en";
        close(LOG);
        exit(0);
    }

    $SIG{\'INT\'} = \'handler\';
    $SIG{\'QUIT\'} = \'handler\';
    .\|.\|.
    $SIG{\'INT\'} = \'DEFAULT\';        # restore default action
    $SIG{\'QUIT\'} = \'IGNORE\';        # ignore SIGQUIT
The SIG array only contains values for the signals actually set within
the perl script.
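Of the names above, $? is the one most often decoded wrongly, so here is a minimal sketch of pulling it apart.
It assumes a Bourne shell called sh is on your path, which is used only to produce a known exit status; the split of the low byte into a signal number (low seven bits) and a core dump flag (the 128 bit) is the usual wait() convention rather than something spelled out in the entry itself.

```perl
# Assumption: "sh" exists; it is asked to exit with status 3.
system('sh', '-c', 'exit 3');

$exit   = $? >> 8;              # exit value of the subprocess
$signal = $? & 127;             # signal that killed it, if any
$core   = $? & 128;             # nonzero if there was a core dump

print "exit=$exit signal=$signal core=$core\n";
```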
.Sh "Packages"
Perl provides a mechanism for alternate namespaces to protect packages from
stomping on each other's variables.
By default, a perl script starts compiling into the package known as \*(L"main\*(R".
By use of the
.I package
declaration, you can switch namespaces.
The scope of the package declaration is from the declaration itself to the end
of the enclosing block (the same scope as the local() operator).
Typically it would be the first declaration in a file to be included by
the \*(L"do FILE\*(R" operator.
You can switch into a package in more than one place; it merely influences
which symbol table is used by the compiler for the rest of that block.
You can refer to variables in other packages by prefixing the name with
the package name and a single quote.
If the package name is null, the \*(L"main\*(R" package is assumed.
Eval'ed strings are compiled in the package in which the eval was compiled
in.
(Assignments to $SIG{}, however, assume the signal handler specified is in the
main package.
Qualify the signal handler name if you wish to have a signal handler in
a package.)
For an example, examine in the perl library.
It initially switches to the DB package so that the debugger doesn't interfere
with variables in the script you are trying to debug.
At various points, however, it temporarily switches back to the main package
to evaluate various expressions in the context of the main package.
The symbol table for a package happens to be stored in the associative array
of that name prepended with an underscore.
The value in each entry of the associative array is
what you are referring to when you use the *name notation.
In fact, the following have the same effect (in package main, anyway),
though the first is more
efficient because it does the symbol table lookups at compile time:
    local(*foo) = *bar;
    local($_main{'foo'}) = $_main{'bar'};
You can use this to print out all the variables in a package, for instance.
Here is from the perl library:
    package dumpvar;
    sub main'dumpvar {
    \&    ($package) = @_;
    \&    local(*stab) = eval("*_$package");
    \&    while (($key,$val) = each(%stab)) {
    \&        {
    \&            local(*entry) = $val;
    \&            if (defined $entry) {
    \&                print "\e$$key = '$entry'\en";
    \&            }
    \&            if (defined @entry) {
    \&                print "\e@$key = (\en";
    \&                foreach $num ($[ .. $#entry) {
    \&                    print "  $num\et'",$entry[$num],"'\en";
    \&                }
    \&                print ")\en";
    \&            }
    \&            if ($key ne "_$package" && defined %entry) {
    \&                print "\e%$key = (\en";
    \&                foreach $key (sort keys(%entry)) {
    \&                    print "  $key\et'",$entry{$key},"'\en";
    \&                }
    \&                print ")\en";
    \&            }
    \&        }
    \&    }
    }
Note that, even though the subroutine is compiled in package dumpvar, the
name of the subroutine is qualified so that its name is inserted into package
main.
961Each programmer will, of course, have his or her own preferences in regards
962to formatting, but there are some general guidelines that will make your
963programs easier to read.
964.Ip 1. 4 4
965Just because you CAN do something a particular way doesn't mean that
966you SHOULD do it that way.
967.I Perl
968is designed to give you several ways to do anything, so consider picking
969the most readable one.
For instance
.nf

	open(FOO,$foo) || die "Can't open $foo: $!";

.fi
is better than
.nf

	die "Can't open $foo: $!" unless open(FOO,$foo);

.fi
because the second way hides the main point of the statement in a
modifier.
On the other hand
.nf

	print "Starting analysis\en" if $verbose;

.fi
is better than
.nf

	$verbose && print "Starting analysis\en";

.fi
since the main point isn't whether the user typed -v or not.
990Similarly, just because an operator lets you assume default arguments
991doesn't mean that you have to make use of the defaults.
The defaults are there for lazy systems programmers writing one-shot
programs.
994If you want your program to be readable, consider supplying the argument.
995.Ip 2. 4 4
996Don't go through silly contortions to exit a loop at the top or the
997bottom, when
998.I perl
999provides the "last" operator so you can exit in the middle.
1000Just outdent it a little to make it more visible:
.nf
    line:
	for (;;) {
	    statements;
	last line if $foo;
	    next line if /^#/;
	    statements;
	}
.fi
1013.Ip 3. 4 4
1014Don't be afraid to use loop labels\*(--they're there to enhance readability as
1015well as to allow multi-level loop breaks.
See the previous example.
.Ip 4. 4 4
For portability, when using features that may not be implemented on every
machine, test the construct in an eval to see if it fails.
.Ip 5. 4 4
Choose mnemonic identifiers.
.Ip 6. 4 4
Be consistent.
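Two of the guidelines above, the labeled loop with an outdented "last" and the eval portability test, can be sketched together.
This is only an illustration, assuming a current Perl 5 interpreter ("my" variables and the block form of eval are not part of the perl this manual describes); collect and have_symlink are hypothetical names.

```perl
# Hypothetical helper: keep lines up to an "__END__" marker,
# skipping comment lines along the way.
sub collect {
    my @kept;
  line:
    for my $text (@_) {
    last line if $text eq '__END__';    # outdented so the exit stands out
        next line if $text =~ /^#/;
        push(@kept, $text);
    }
    return @kept;
}

# Portability probe: run the construct inside eval.  On a machine
# without symlinks the call dies and eval yields undef; elsewhere
# the call merely fails and the block still returns 1.
sub have_symlink {
    my $ok = eval { symlink('', ''); 1 };
    return $ok ? 1 : 0;
}

print join(',', collect('a', '# note', 'b', '__END__', 'c')), "\n";
print "symlink is ", (have_symlink() ? "available" : "missing"), "\n";
```

The probe deliberately passes empty names, so it never creates anything; only the question of whether the call exists is being answered.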
1024.Sh "Debugging"
1025If you invoke
1026.I perl
1027with a
1028.B \-d
1029switch, your script will be run under a debugging monitor.
1030It will halt before the first executable statement and ask you for a
1031command, such as:
1032.Ip "h" 12 4
1033Prints out a help message.
1034.Ip "s" 12 4
1035Single step.
1036Executes until it reaches the beginning of another statement.
1037.Ip "c" 12 4
Continue.
Executes until the next breakpoint is reached.
1040.Ip "<CR>" 12 4
1041Repeat last s or c.
1042.Ip "n" 12 4
1043Single step around subroutine call.
1044.Ip "l min+incr" 12 4
1045List incr+1 lines starting at min.
1046If min is omitted, starts where last listing left off.
1047If incr is omitted, previous value of incr is used.
1048.Ip "l min-max" 12 4
1049List lines in the indicated range.
1050.Ip "l line" 12 4
1051List just the indicated line.
1052.Ip "l" 12 4
1053List incr+1 more lines after last printed line.
1054.Ip "l subname" 12 4
1055List subroutine.
1056If it's a long subroutine it just lists the beginning.
1057Use \*(L"l\*(R" to list more.
1058.Ip "L" 12 4
1059List lines that have breakpoints or actions.
1060.Ip "t" 12 4
1061Toggle trace mode on or off.
1062.Ip "b line" 12 4
1063Set a breakpoint.
If line is omitted, sets a breakpoint on the
line that is about to be executed.
1066Breakpoints may only be set on lines that begin an executable statement.
1067.Ip "b subname" 12 4
1068Set breakpoint at first executable line of subroutine.
1069.Ip "S" 12 4
1070Lists the names of all subroutines.
1071.Ip "d line" 12 4
1072Delete breakpoint.
If line is omitted, deletes the breakpoint on the
line that is about to be executed.
1075.Ip "D" 12 4
1076Delete all breakpoints.
1077.Ip "A" 12 4
1078Delete all line actions.
1079.Ip "V package" 12 4
1080List all variables in package.
1081Default is main package.
1082.Ip "a line command" 12 4
1083Set an action for line.
1084A multi-line command may be entered by backslashing the newlines.
1085.Ip "< command" 12 4
1086Set an action to happen before every debugger prompt.
1087A multi-line command may be entered by backslashing the newlines.
1088.Ip "> command" 12 4
1089Set an action to happen after the prompt when you've just given a command
1090to return to executing the script.
1091A multi-line command may be entered by backslashing the newlines.
1092.Ip "! number" 12 4
1093Redo a debugging command.
1094If number is omitted, redoes the previous command.
1095.Ip "! -number" 12 4
1096Redo the command that was that many commands ago.
1097.Ip "H -number" 12 4
1098Display last n commands.
1099Only commands longer than one character are listed.
1100If number is omitted, lists them all.
1101.Ip "q or ^D" 12 4
Quit.
1103.Ip "command" 12 4
1104Execute command as a perl statement.
1105A missing semicolon will be supplied.
1106.Ip "p expr" 12 4
1107Same as \*(L"print DB'OUT expr\*(R".
1108The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
1109may be redirected to.
If you want to modify the debugger, copy its source from the perl library
to your current directory and modify it as necessary.
1113You can do some customization by setting up a .perldb file which contains
1114initialization code.
1115For instance, you could make aliases like these:
.nf
    $DBalias{'len'} = 's/^len(.*)/p length(\e$1)/';
    $DBalias{'stop'} = 's/^stop (at|in)/b/';
    $DBalias{'.'} =
      's/^./p "\e$DBsub(\e$DBline):\et\e$DBline[\e$DBline]"/';
.fi
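Each alias above is a substitution stored as a string, so applying one amounts to evaluating that string against the command just typed.
The following sketch shows one way this could work; it is not the debugger's actual code, it assumes a current Perl 5 interpreter, and %alias and expand are hypothetical names.

```perl
# Hypothetical alias table in the same style as %DBalias.
my %alias = (
    'stop' => 's/^stop (at|in)/b/',
);

# Rewrite a command if its first word names an alias.
sub expand {
    my ($cmd) = @_;
    my ($word) = $cmd =~ /^(\S+)/;
    if (defined $word && exists $alias{$word}) {
        local $_ = $cmd;
        eval $alias{$word};     # runs the stored s/// against $_
        return $_;
    }
    return $cmd;
}

print expand('stop at 42'), "\n";
print expand('n'), "\n";
```

With this table, "stop at 42" is rewritten to the breakpoint command "b 42", and commands without an alias pass through unchanged.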
1124.Sh "Setuid Scripts"
1125.I Perl
1126is designed to make it easy to write secure setuid and setgid scripts.
1127Unlike shells, which are based on multiple substitution passes on each line
1128of the script,
1129.I perl
1130uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
1131Additionally, since the language has more built-in functionality, it
1132has to rely less upon external (and possibly untrustworthy) programs to
1133accomplish its purposes.
1135In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
1136insecure, but this kernel feature can be disabled.
1137If it is,
1138.I perl
1139can emulate the setuid and setgid mechanism when it notices the otherwise
1140useless setuid/gid bits on perl scripts.
1141If the kernel feature isn't disabled,
1142.I perl
1143will complain loudly that your setuid script is insecure.
1144You'll need to either disable the kernel setuid script feature, or put
1145a C wrapper around the script.
1147When perl is executing a setuid script, it takes special precautions to
1148prevent you from falling into any obvious traps.
1149(In some ways, a perl script is more secure than the corresponding
1150C program.)
1151Any command line argument, environment variable, or input is marked as
1152\*(L"tainted\*(R", and may not be used, directly or indirectly, in any
1153command that invokes a subshell, or in any command that modifies files,
1154directories or processes.
1155Any variable that is set within an expression that has previously referenced
1156a tainted value also becomes tainted (even if it is logically impossible
1157for the tainted value to influence the variable).
1158For example:
.nf
	$foo = shift;			# $foo is tainted
	$bar = $foo,\'bar\';		# $bar is also tainted
	$xxx = <>;			# Tainted
	$path = $ENV{\'PATH\'};	# Tainted, but see below
	$abc = \'abc\';			# Not tainted

	system "echo $foo";		# Insecure
	system "echo", $foo;	# Secure (doesn't use sh)
	system "echo $bar";		# Insecure
	system "echo $abc";		# Insecure until PATH set

	$ENV{\'PATH\'} = \'/bin:/usr/bin\';
	$ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

	$path = $ENV{\'PATH\'};	# Not tainted
	system "echo $abc";		# Is secure now!

	open(FOO,"$foo");		# OK
	open(FOO,">$foo"); 		# Not OK

	open(FOO,"echo $foo|");	# Not OK, but...
	open(FOO,"-|") || exec \'echo\', $foo;	# OK

	$zzz = `echo $foo`;		# Insecure, zzz tainted

	unlink $abc,$foo;		# Insecure
	umask $foo;			# Insecure

	exec "echo $foo";		# Insecure
	exec "echo", $foo;		# Secure (doesn't use sh)
	exec "sh", \'-c\', $foo;	# Considered secure, alas
.fi
1199The taintedness is associated with each scalar value, so some elements
1200of an array can be tainted, and others not.
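The examples above mark the list forms of system and exec as secure because they do not go through sh.
The following sketch shows the same idea with a piped open; it assumes a current Perl 5 interpreter (the list form of a piped open is a later addition, not something this manual describes) and an echo program on the PATH. The name echo_literally is hypothetical.

```perl
# Run echo directly, with $text as a single argument.
# No shell sees the string, so metacharacters in it are inert.
sub echo_literally {
    my ($text) = @_;
    open(my $out, '-|', 'echo', $text) || die "Can't run echo: $!";
    my $line = <$out>;
    close($out);
    chomp($line) if defined $line;
    return $line;
}

print echo_literally('a; echo b'), "\n";
```

Had the same string been interpolated into a single command string, the shell would have treated the semicolon as a command separator.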
1202If you try to do something insecure, you will get a fatal error saying
1203something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
Note that you can still write an insecure system call or exec,
but only by explicitly doing something like the last example above.
You can also bypass the tainting mechanism by referencing subpatterns:
.I perl
1209presumes that if you reference a substring using $1, $2, etc, you knew
1210what you were doing when you wrote the pattern:
.nf
	$ARGV[0] =~ /^\-P(\ew+)$/;
	$printer = $1;		# Not tainted
.fi
1217This is fairly secure since \ew+ doesn't match shell metacharacters.
1218Use of .+ would have been insecure, but
1219.I perl
1220doesn't check for that, so you must be careful with your patterns.
1221This is the ONLY mechanism for untainting user supplied filenames if you
1222want to do file operations on them (unless you make $> equal to $<).
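A slightly more defensive form of the untainting idiom checks that the match succeeded before trusting $1, since after a failed match $1 still holds whatever an earlier match left there.
This is a minimal sketch, assuming a current Perl 5 interpreter; printer_from_arg is a hypothetical name.

```perl
# Return the printer name from an argument such as "-Plaser",
# or undef if the argument does not have exactly that shape.
sub printer_from_arg {
    my ($arg) = @_;
    return undef unless $arg =~ /^-P(\w+)$/;
    return $1;      # only word characters can reach this point
}

for my $arg ('-Plaser', '-Plp;rm') {
    my $printer = printer_from_arg($arg);
    print "$arg: ", (defined $printer ? $printer : 'rejected'), "\n";
}
```

The pattern is anchored at both ends, so an argument carrying shell metacharacters is rejected rather than partially accepted.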
1224It's also possible to get into trouble with other operations that don't care
1225whether they use tainted values.
Make judicious use of the file tests in dealing with any user-supplied
filenames.
1228When possible, do opens and such after setting $> = $<.
1229.I Perl
1230doesn't prevent you from opening tainted filenames for reading, so be
1231careful what you print out.
1232The tainting mechanism is intended to prevent stupid mistakes, not to remove
1233the need for thought.
1235.I Perl
1236uses PATH in executing subprocesses, and in finding the script if \-S
1237is used.
1238HOME or LOGDIR are used if chdir has no argument.
1240Apart from these,
1241.I perl
1242uses no environment variables, except to make them available
1243to the script being executed, and to child processes.
1244However, scripts running setuid would do well to execute the following lines
1245before doing anything else, just to keep people honest:
.nf
    $ENV{\'PATH\'} = \'/bin:/usr/bin\';    # or whatever you need
    $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
.fi
.SH AUTHOR
Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
.SH FILES
/tmp/perl\-eXXXXXX	temporary file for
.B \-e
commands.
.SH SEE ALSO
a2p	awk to perl translator
.br
s2p	sed to perl translator
.SH DIAGNOSTICS
Compilation errors will tell you the line number of the error, with an
indication of the next token or token type that was to be examined.
(In the case of a script passed to
.I perl
via
.B \-e
switches, each
.B \-e
is counted as one line.)
1275Setuid scripts have additional constraints that can produce error messages
1276such as \*(L"Insecure dependency\*(R".
1277See the section on setuid scripts.
.SH TRAPS
1280.IR awk
1281users should take special note of the following:
1282.Ip * 4 2
1283Semicolons are required after all simple statements in
1284.IR perl .
Newline is not a statement delimiter.
1287.Ip * 4 2
1288Curly brackets are required on ifs and whiles.
1289.Ip * 4 2
1290Variables begin with $ or @ in
1291.IR perl .
1292.Ip * 4 2
1293Arrays index from 0 unless you set $[.
1294Likewise string positions in substr() and index().
1295.Ip * 4 2
1296You have to decide whether your array has numeric or string indices.
1297.Ip * 4 2
1298Associative array values do not spring into existence upon mere reference.
1299.Ip * 4 2
1300You have to decide whether you want to use string or numeric comparisons.
1301.Ip * 4 2
1302Reading an input line does not split it for you. You get to split it yourself
1303to an array.
1304And the
1305.I split
1306operator has different arguments.
1307.Ip * 4 2
1308The current input line is normally in $_, not $0.
1309It generally does not have the newline stripped.
($0 is initially the name of the program executed, then the last matched
string.)
1312.Ip * 4 2
1313$<digit> does not refer to fields\*(--it refers to substrings matched by the last
1314match pattern.
1315.Ip * 4 2
The
.I print
1318statement does not add field and record separators unless you set
1319$, and $\e.
1320.Ip * 4 2
1321You must open your files before you print to them.
1322.Ip * 4 2
1323The range operator is \*(L".\|.\*(R", not comma.
1324(The comma operator works as in C.)
1325.Ip * 4 2
1326The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
1327(\*(L"~\*(R" is the one's complement operator, as in C.)
1328.Ip * 4 2
1329The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
1330(\*(L"^\*(R" is the XOR operator, as in C.)
1331.Ip * 4 2
1332The concatenation operator is \*(L".\*(R", not the null string.
1333(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
1334since the third slash would be interpreted as a division operator\*(--the
1335tokener is in fact slightly context sensitive for operators like /, ?, and <.
1336And in fact, . itself can be the beginning of a number.)
1337.Ip * 4 2
1338.IR Next ,
.I exit
and
1341.I continue
1342work differently.
1343.Ip * 4 2
The following variables work differently:
.nf

	  Awk	\h'|2.5i'Perl
	  ARGC	\h'|2.5i'$#ARGV
	  ARGV[0]	\h'|2.5i'$0
	  FILENAME\h'|2.5i'$ARGV
	  FNR	\h'|2.5i'$. \- something
	  FS	\h'|2.5i'(whatever you like)
	  NF	\h'|2.5i'$#Fld, or some such
	  NR	\h'|2.5i'$.
	  OFMT	\h'|2.5i'$#
	  OFS	\h'|2.5i'$,
	  ORS	\h'|2.5i'$\e
	  RLENGTH	\h'|2.5i'length($&)
	  RS	\h'|2.5i'$/
	  RSTART	\h'|2.5i'length($\`)
	  SUBSEP	\h'|2.5i'$;

.fi
1364.Ip * 4 2
1365When in doubt, run the
1366.I awk
1367construct through a2p and see what it gives you.
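Several of the awk differences listed above can be seen in a few lines.
The following is a sketch only, assuming a current Perl 5 interpreter with the default array base of 0; the subroutine name fields is hypothetical, and @Fld simply follows the name used in the table above.

```perl
# Reading a line does not split it; strip the newline and split by hand.
sub fields {
    my ($line) = @_;
    chomp($line);                   # the newline is not stripped for you
    my @Fld = split(' ', $line);    # ' ' splits on runs of whitespace
    return @Fld;
}

my @Fld = fields("alpha beta  gamma\n");
print "fields: ", scalar(@Fld), ", last index: $#Fld\n";

# Exponentiation is **, while ^ is exclusive or.
print "2 ** 10 is ", 2 ** 10, ", 2 ^ 10 is ", 2 ^ 10, "\n";

# You choose the comparison: numeric (==) or string (eq).
print "numerically equal\n" if '10' == '10.0';
print "different as strings\n" if '10' ne '10.0';

# Merely reading an element does not create it.
my %seen;
my $probe = $seen{'nokey'};
print "still no entry\n" unless exists $seen{'nokey'};
```

With a zero array base, $#Fld is one less than the awk NF for the same line, which is presumably why the table hedges with "or some such".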
1369Cerebral C programmers should take note of the following:
1370.Ip * 4 2
1371Curly brackets are required on ifs and whiles.
1372.Ip * 4 2
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
1374.Ip * 4 2
.I Break
and
.I continue
become
.I last
and
.IR next ,
respectively.
1383.Ip * 4 2
1384There's no switch statement.
1385.Ip * 4 2
1386Variables begin with $ or @ in
1387.IR perl .
1388.Ip * 4 2
1389Printf does not implement *.
1390.Ip * 4 2
1391Comments begin with #, not /*.
1392.Ip * 4 2
1393You can't take the address of anything.
1394.Ip * 4 2
1395ARGV must be capitalized.
1396.Ip * 4 2
1397The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
1398.Ip * 4 2
1399Signal handlers deal with signal names, not numbers.
1400.Ip * 4 2
1401You can't subscript array values, only arrays (no $x = (1,2,3)[2];).
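For C programmers, the following sketch shows "elsif" in place of "else if", and "next" and "last" in the roles of continue and break.
It assumes a current Perl 5 interpreter; classify and first_even are hypothetical names.

```perl
# "elsif" is one word, and the curly brackets are always required.
sub classify {
    my ($n) = @_;
    my $kind;
    if ($n < 0) {
        $kind = 'negative';
    }
    elsif ($n == 0) {
        $kind = 'zero';
    }
    else {
        $kind = 'positive';
    }
    return $kind;
}

# "next" plays the role of C's continue, "last" the role of break.
sub first_even {
    my $found;
    for my $n (@_) {
        next if $n % 2;     # skip odd numbers
        $found = $n;
        last;               # stop at the first even one
    }
    return $found;
}

print classify(-3), " ", classify(0), " ", classify(7), "\n";
print first_even(3, 5, 8, 10), "\n";
```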
1404.I sed
1405programmers should take note of the following:
1406.Ip * 4 2
1407Backreferences in substitutions use $ rather than \e.
1408.Ip * 4 2
1409The pattern matching metacharacters (, ), and | do not have backslashes in front.
1410.Ip * 4 2
1411The range operator is .\|. rather than comma.
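The first two sed differences can be illustrated with a single substitution.
This is a small sketch, assuming a current Perl 5 interpreter; swap_first_two is a hypothetical name.

```perl
# Grouping parentheses are written without backslashes, and the
# replacement refers to the groups as $1 and $2 rather than \1 and \2.
sub swap_first_two {
    my ($text) = @_;
    $text =~ s/^(\w+) (\w+)/$2 $1/;
    return $text;
}

print swap_first_two('hello world again'), "\n";
```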
1413Sharp shell programmers should take note of the following:
1414.Ip * 4 2
1415The backtick operator does variable interpretation without regard to the
1416presence of single quotes in the command.
1417.Ip * 4 2
1418The backtick operator does no translation of the return value, unlike csh.
1419.Ip * 4 2
1420Shells (especially csh) do several levels of substitution on each command line.
1421.I Perl
1422does substitution only in certain constructs such as double quotes,
1423backticks, angle brackets and search patterns.
1424.Ip * 4 2
1425Shells interpret scripts a little bit at a time.
1426.I Perl
1427compiles the whole program before executing it.
1428.Ip * 4 2
1429The arguments are available via @ARGV, not $1, $2, etc.
1430.Ip * 4 2
1431The environment is not automatically made available as variables.
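The first two shell differences above can be checked directly.
The following sketch assumes a current Perl 5 interpreter and a Bourne-compatible shell with an echo command; quoted_echo is a hypothetical name, and it should only ever be given trusted text, since the string is handed to a shell.

```perl
# Perl interpolates $word before the shell sees the command, even
# though the shell's single quotes would have protected it from sh.
sub quoted_echo {
    my ($word) = @_;
    my $out = `echo '$word'`;
    return $out;    # returned as is, trailing newline included
}

my $out = quoted_echo('perl');
print "got: $out";
print "newline kept\n" if $out =~ /\n$/;

# The environment is reached through %ENV, not through plain variables.
print "PATH is set\n" if defined $ENV{'PATH'};
```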
1432.SH BUGS
1434.I Perl
1435is at the mercy of your machine's definitions of various operations
1436such as type casting, atof() and sprintf().
If your stdio requires a seek or eof between reads and writes on a particular
1439stream, so does
1440.IR perl .
1442While none of the built-in data types have any arbitrary size limits (apart
1443from memory size), there are still a few arbitrary limits:
1444a given identifier may not be longer than 255 characters;
1445sprintf is limited on many machines to 128 characters per field (unless the format
1446specifier is exactly %s);
1447and no component of your PATH may be longer than 255 if you use \-S.
1449.I Perl
1450actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
1451anyone I said that.
1452.rn }` ''