	1	=head1 NAME
	2
	3	perlsyn - Perl syntax
	4
	5	=head1 DESCRIPTION
	6
	7	A Perl script consists of a sequence of declarations and statements.
	8	The only things that need to be declared in Perl are report formats
	9	and subroutines. See the sections below for more information on those
	10	declarations. All uninitialized user-created objects are assumed to
	11	start with a C<null> or C<0> value until they are defined by some explicit
	12	operation such as assignment. (Though you can get warnings about the
	13	use of undefined values if you like.) The sequence of statements is
	14	executed just once, unlike in B<sed> and B<awk> scripts, where the
	15	sequence of statements is executed for each input line. While this means
	16	that you must explicitly loop over the lines of your input file (or
	17	files), it also means you have much more control over which files and
	18	which lines you look at. (Actually, I'm lying--it is possible to do an
	19	implicit loop with either the B<-n> or B<-p> switch. It's just not the
	20	mandatory default like it is in B<sed> and B<awk>.)
	21
	22	=head2 Declarations
	23
	24	Perl is, for the most part, a free-form language. (The only exception
	25	to this is format declarations, for obvious reasons.) Text from a
	26	C<"#"> character until the end of the line is a comment, and is
	27	ignored. If you attempt to use C</* */> C-style comments, it will be
	28	interpreted either as division or pattern matching, depending on the
	29	context, and C++ C<//> comments just look like a null regular
	30	expression, so don't do that.
	31
	32	A declaration can be put anywhere a statement can, but has no effect on
	33	the execution of the primary sequence of statements--declarations all
	34	take effect at compile time. Typically all the declarations are put at
	35	the beginning or the end of the script. However, if you're using
	36	lexically-scoped private variables created with C<my()>, you'll have to make sure
	37	your format or subroutine definition is within the same block scope
	38	as the C<my> if you expect to be able to access those private variables.
	39
	40	Declaring a subroutine allows a subroutine name to be used as if it were a
	41	list operator from that point forward in the program. You can declare a
	42	subroutine without defining it by saying C<sub name>, thus:
	43
	44	sub myname;
	45	$me = myname $0 or die "can't get myname";
	46
	47	Note that myname() functions as a list operator, not as a unary operator; so
	48	be careful to use C<or> instead of C<||> in this case. However, if
	49	you were to declare the subroutine as C<sub myname ($)>, then
	50	C<myname> would function as a unary operator, so either C<or> or
	51	C<||> would work.
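
The precedence difference can be seen in a small sketch. The subroutine
C<first_arg> is hypothetical and only returns its first argument; because
it is defined before it is used, it parses as a list operator.

```perl
use strict;
use warnings;

# Hypothetical helper: returns its first argument.
sub first_arg { return $_[0] }

# With ||, the list operator swallows the whole expression:
# this parses as first_arg(0 || 'fallback').
my $with_pipes = first_arg 0 || 'fallback';

# With or, the call is complete before the low-precedence test:
# this parses as (my $with_or = first_arg(0)) or ($fell_through = 1).
my $fell_through = 0;
my $with_or = first_arg 0 or $fell_through = 1;

print "with ||: $with_pipes\n";
print "with or: $with_or (fell through: $fell_through)\n";
```

In the first call the subroutine never sees the C<0>; in the second it does,
and the right-hand side of C<or> runs because the assigned value is false.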
	52
	53	Subroutine declarations can also be loaded up with the C<require> statement
	54	or both loaded and imported into your namespace with a C<use> statement.
	55	See L<perlmod> for details on this.
	56
	57	A statement sequence may contain declarations of lexically-scoped
	58	variables, but apart from declaring a variable name, the declaration acts
	59	like an ordinary statement, and is elaborated within the sequence of
	60	statements as if it were an ordinary statement. That means it actually
	61	has both compile-time and run-time effects.
	62
	63	=head2 Simple statements
	64
	65	The only kind of simple statement is an expression evaluated for its
	66	side effects. Every simple statement must be terminated with a
	67	semicolon, unless it is the final statement in a block, in which case
	68	the semicolon is optional. (A semicolon is still encouraged there if the
	69	block takes up more than one line, because you may eventually add another line.)
	70	Note that there are some operators like C<eval {}> and C<do {}> that look
	71	like compound statements, but aren't (they're just TERMs in an expression),
	72	and thus need an explicit termination if used as the last item in a statement.
	73
	74	Any simple statement may optionally be followed by a I<SINGLE> modifier,
	75	just before the terminating semicolon (or block ending). The possible
	76	modifiers are:
	77
	78	if EXPR
	79	unless EXPR
	80	while EXPR
	81	until EXPR
	82	foreach EXPR
	83
	84	The C<if> and C<unless> modifiers have the expected semantics,
	85	presuming you're a speaker of English. The C<foreach> modifier is an
	86	iterator: For each value in EXPR, it aliases C<$_> to the value and
	87	executes the statement. The C<while> and C<until> modifiers have the
	88	usual "C<while> loop" semantics (conditional evaluated first), except
	89	when applied to a C<do>-BLOCK (or to the deprecated C<do>-SUBROUTINE
	90	statement), in which case the block executes once before the
	91	conditional is evaluated. This is so that you can write loops like:
	92
	93	    do {
	94	        $line = <STDIN>;
	95	        ...
	96	    } until $line eq ".\n";
	97
	98	See L<perlfunc/do>. Note also that the loop control statements described
	99	later will I<NOT> work in this construct, because modifiers don't take
	100	loop labels. Sorry. You can always put another block inside of it
	101	(for C<next>) or around it (for C<last>) to do that sort of thing.
	102	For C<next>, just double the braces:
	103
	104	    do {{
	105	        next if $x == $y;
	106	        # do something here
	107	    }} until $x++ > $z;
	108
	109	For C<last>, you have to be more elaborate:
	110
	111	    LOOP: {
	112	        do {
	113	            last if $x == $y**2;
	114	            # do something here
	115	        } while $x++ <= $z;
	116	    }
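
As a rough illustration of the modifiers described in this section, the
following sketch exercises each one on made-up data. The variable names are
assumptions chosen for the example.

```perl
use strict;
use warnings;

my @log;
push @log, 'if'      if 1;              # runs: the condition is true
push @log, 'unless'  unless 0;          # runs: the condition is false
push @log, "item $_" foreach 1 .. 3;    # $_ is aliased to each value in turn

my $n = 0;
$n++ while $n < 5;                      # condition tested before each increment

my $m = 10;
do { $m++ } until $m > 0;               # body runs once although $m > 0 already holds

print join(', ', @log), "\n";
print "n=$n m=$m\n";
```

The last statement shows the special case for C<do>-BLOCK: the block
executes before the condition is looked at, so C<$m> is incremented once.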
	117
	118	=head2 Compound statements
	119
	120	In Perl, a sequence of statements that defines a scope is called a block.
	121	Sometimes a block is delimited by the file containing it (in the case
	122	of a required file, or the program as a whole), and sometimes a block
	123	is delimited by the extent of a string (in the case of an eval).
	124
	125	But generally, a block is delimited by curly brackets, also known as braces.
	126	We will call this syntactic construct a BLOCK.
	127
	128	The following compound statements may be used to control flow:
	129
	130	if (EXPR) BLOCK
	131	if (EXPR) BLOCK else BLOCK
	132	if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
	133	LABEL while (EXPR) BLOCK
	134	LABEL while (EXPR) BLOCK continue BLOCK
	135	LABEL for (EXPR; EXPR; EXPR) BLOCK
	136	LABEL foreach VAR (LIST) BLOCK
	137	LABEL foreach VAR (LIST) BLOCK continue BLOCK
	138	LABEL BLOCK continue BLOCK
	139
	140	Note that, unlike C and Pascal, these are defined in terms of BLOCKs,
	141	not statements. This means that the curly brackets are I<required>--no
	142	dangling statements allowed. If you want to write conditionals without
	143	curly brackets there are several other ways to do it. The following
	144	all do the same thing:
	145
	146	if (!open(FOO)) { die "Can't open $FOO: $!"; }
	147	die "Can't open $FOO: $!" unless open(FOO);
	148	open(FOO) or die "Can't open $FOO: $!"; # FOO or bust!
	149	open(FOO) ? 'hi mom' : die "Can't open $FOO: $!";
	150	# a bit exotic, that last one
	151
	152	The C<if> statement is straightforward. Because BLOCKs are always
	153	bounded by curly brackets, there is never any ambiguity about which
	154	C<if> an C<else> goes with. If you use C<unless> in place of C<if>,
	155	the sense of the test is reversed.
	156
	157	The C<while> statement executes the block as long as the expression is
	158	true (does not evaluate to the null string (C<"">) or C<0> or C<"0">). The LABEL is
	159	optional, and if present, consists of an identifier followed by a colon.
	160	The LABEL identifies the loop for the loop control statements C<next>,
	161	C<last>, and C<redo>. If the LABEL is omitted, the loop control statement
	162	refers to the innermost enclosing loop. This may include dynamically
	163	looking back your call-stack at run time to find the LABEL. Such
	164	desperate behavior triggers a warning if you use the B<-w> flag.
	165
	166	If there is a C<continue> BLOCK, it is always executed just before the
	167	conditional is about to be evaluated again, just like the third part of a
	168	C<for> loop in C. Thus it can be used to increment a loop variable, even
	169	when the loop has been continued via the C<next> statement (which is
	170	similar to the C C<continue> statement).
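
A minimal sketch of that behavior, using an invented counter: the
C<continue> block still increments C<$i> on iterations that are cut short
by C<next>, so the loop terminates.

```perl
use strict;
use warnings;

my $i = 0;
my @even;
while ($i < 6) {
    next if $i % 2;      # skip odd numbers; control passes to the continue block
    push @even, $i;
} continue {
    $i++;                # runs after every iteration, including those ended by next
}
print "@even\n";
```

Had the increment been placed at the end of the main block instead, the
first odd value would have caused an endless loop.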
	171
	172	=head2 Loop Control
	173
	174	The C<next> command is like the C<continue> statement in C; it starts
	175	the next iteration of the loop:
	176
	177	LINE: while (<STDIN>) {
	178	next LINE if /^#/; # discard comments
	179	...
	180	}
	181
	182	The C<last> command is like the C<break> statement in C (as used in
	183	loops); it immediately exits the loop in question. The
	184	C<continue> block, if any, is not executed:
	185
	186	LINE: while (<STDIN>) {
	187	last LINE if /^$/; # exit when done with header
	188	...
	189	}
	190
	191	The C<redo> command restarts the loop block without evaluating the
	192	conditional again. The C<continue> block, if any, is I<not> executed.
	193	This command is normally used by programs that want to lie to themselves
	194	about what was just input.
	195
	196	For example, when processing a file like F</etc/termcap>, your input
	197	lines might end in backslashes to indicate continuation. In that case you
	198	want to skip ahead and get the next record.
	199
	200	    while (<>) {
	201	        chomp;
	202	        if (s/\\$//) {
	203	            $_ .= <>;
	204	            redo unless eof();
	205	        }
	206	        # now process $_
	207	    }
	208
	209	which is Perl short-hand for the more explicitly written version:
	210
	211	    LINE: while (defined($line = <ARGV>)) {
	212	        chomp($line);
	213	        if ($line =~ s/\\$//) {
	214	            $line .= <ARGV>;
	215	            redo LINE unless eof(); # not eof(ARGV)!
	216	        }
	217	        # now process $line
	218	    }
	219
	220	Note that if there were a C<continue> block on the above code, it would get
	221	executed even on discarded lines. This is often used to reset line counters
	222	or C<?pat?> one-time matches.
	223
	224	    # inspired by :1,$g/fred/s//WILMA/
	225	    while (<>) {
	226	        ?(fred)?    &&  s//WILMA $1 WILMA/;
	227	        ?(barney)?  &&  s//BETTY $1 BETTY/;
	228	        ?(homer)?   &&  s//MARGE $1 MARGE/;
	229	    } continue {
	230	        print "$ARGV $.: $_";
	231	        close ARGV  if eof();           # reset $.
	232	        reset       if eof();           # reset ?pat?
	233	    }
	234
	235	If the word C<while> is replaced by the word C<until>, the sense of the
	236	test is reversed, but the conditional is still tested before the first
	237	iteration.
	238
	239	The loop control statements don't work in an C<if> or C<unless>, since
	240	they aren't loops. You can double the braces to make them such, though.
	241
	242	    if (/pattern/) {{
	243	        next if /fred/;
	244	        next if /barney/;
	245	        # do something here
	246	    }}
	247
	248	The form C<while/if BLOCK BLOCK>, available in Perl 4, is no longer
	249	available. Replace any occurrence of C<if BLOCK> by C<if (do BLOCK)>.
	250
	251	=head2 For Loops
	252
	253	Perl's C-style C<for> loop works exactly like the corresponding C<while> loop;
	254	that means that this:
	255
	256	for ($i = 1; $i < 10; $i++) {
	257	...
	258	}
	259
	260	is the same as this:
	261
	262	$i = 1;
	263	while ($i < 10) {
	264	...
	265	} continue {
	266	$i++;
	267	}
	268
	269	(There is one minor difference: The first form implies a lexical scope
	270	for variables declared with C<my> in the initialization expression.)
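
That scoping difference can be sketched as follows. The outer C<$i> is an
assumption added for the example, to show that the loop's own C<$i> is a
separate variable.

```perl
use strict;
use warnings;

my $i   = 'outer';
my $sum = 0;
for (my $i = 1; $i < 4; $i++) {   # this $i is a new lexical, private to the loop
    $sum += $i;
}
print "sum=$sum i=$i\n";
```

After the loop, C<$i> still holds the value it had before the loop started.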
	271
	272	Besides the normal array index looping, C<for> can lend itself
	273	to many other interesting applications. Here's one that avoids the
	274	problem you get into if you explicitly test for end-of-file on
	275	an interactive file descriptor causing your program to appear to
	276	hang.
	277
	278	$on_a_tty = -t STDIN && -t STDOUT;
	279	sub prompt { print "yes? " if $on_a_tty }
	280	for ( prompt(); <STDIN>; prompt() ) {
	281	# do something
	282	}
	283
	284	=head2 Foreach Loops
	285
	286	The C<foreach> loop iterates over a normal list value and sets the
	287	variable VAR to be each element of the list in turn. If the variable
	288	is preceded with the keyword C<my>, then it is lexically scoped, and
	289	is therefore visible only within the loop. Otherwise, the variable is
	290	implicitly local to the loop and regains its former value upon exiting
	291	the loop. If the variable was previously declared with C<my>, it uses
	292	that variable instead of the global one, but it's still localized to
	293	the loop.
	294
	295	The C<foreach> keyword is actually a synonym for the C<for> keyword, so
	296	you can use C<foreach> for readability or C<for> for brevity. (Or because
	297	the Bourne shell is more familiar to you than I<csh>, so writing C<for>
	298	comes more naturally.) If VAR is omitted, C<$_> is set to each value.
	299	If any element of LIST is an lvalue, you can modify it by modifying VAR
	300	inside the loop. That's because the C<foreach> loop index variable is
	301	an implicit alias for each item in the list that you're looping over.
	302
	303	If any part of LIST is an array, C<foreach> will get very confused if
	304	you add or remove elements within the loop body, for example with
	305	C<splice>. So don't do that.
	306
	307	C<foreach> probably won't do what you expect if VAR is a tied or other
	308	special variable. Don't do that either.
	309
	310	Examples:
	311
	312	for (@ary) { s/foo/bar/ }
	313
	314	foreach my $elem (@elements) {
	315	$elem *= 2;
	316	}
	317
	318	for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') {
	319	print $count, "\n"; sleep(1);
	320	}
	321
	322	for (1..15) { print "Merry Christmas\n"; }
	323
	324	foreach $item (split(/:[\\\n:]*/, $ENV{TERMCAP})) {
	325	print "Item: $item\n";
	326	}
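
The aliasing and localization rules described above can be checked with a
short sketch. The array contents and the package variable C<$item> are
made up for the illustration.

```perl
use strict;
use warnings;

my @prices = (1, 2, 3);
$_ *= 2 for @prices;             # $_ aliases each element, so the array changes

our $item = 'before';            # a package (global) variable
my @visited;
for $item (qw(a b)) {            # $item is implicitly localized for the loop
    push @visited, $item;
}
print "@prices | @visited | $item\n";
```

Once the loop is finished, C<$item> has regained the value it held before.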
	327
	328	Here's how a C programmer might code up a particular algorithm in Perl:
	329
	330	    for (my $i = 0; $i < @ary1; $i++) {
	331	        for (my $j = 0; $j < @ary2; $j++) {
	332	            if ($ary1[$i] > $ary2[$j]) {
	333	                last; # can't go to outer :-(
	334	            }
	335	            $ary1[$i] += $ary2[$j];
	336	        }
	337	        # this is where that last takes me
	338	    }
	339
	340	Whereas here's how a Perl programmer more comfortable with the idiom might
	341	do it:
	342
	343	    OUTER: foreach my $wid (@ary1) {
	344	        INNER: foreach my $jet (@ary2) {
	345	            next OUTER if $wid > $jet;
	346	            $wid += $jet;
	347	        }
	348	    }
	349
	350	See how much easier this is? It's cleaner, safer, and faster. It's
	351	cleaner because it's less noisy. It's safer because if code gets added
	352	between the inner and outer loops later on, the new code won't be
	353	accidentally executed. The C<next> explicitly iterates the other loop
	354	rather than merely terminating the inner one. And it's faster because
	355	Perl executes a C<foreach> statement more rapidly than it would the
	356	equivalent C<for> loop.
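
To see that the two styles compute the same thing, here is a sketch that
runs both on the same sample data. The data values and the array names
C<@c_style> and C<@perl_style> are assumptions for the example.

```perl
use strict;
use warnings;

my @ary2 = (2, 3, 20);

# C-style version, following the first listing.
my @c_style = (1, 10);
for (my $i = 0; $i < @c_style; $i++) {
    for (my $j = 0; $j < @ary2; $j++) {
        last if $c_style[$i] > $ary2[$j];
        $c_style[$i] += $ary2[$j];
    }
}

# Labeled foreach version, following the second listing.
my @perl_style = (1, 10);
OUTER: foreach my $wid (@perl_style) {
    foreach my $jet (@ary2) {
        next OUTER if $wid > $jet;
        $wid += $jet;            # $wid aliases the element of @perl_style
    }
}

print "@c_style\n@perl_style\n";
```

Both arrays end up with the same contents, because C<$wid> is an alias for
the array element rather than a copy of it.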
	357
	358	=head2 Basic BLOCKs and Switch Statements
	359
	360	A BLOCK by itself (labeled or not) is semantically equivalent to a
	361	loop that executes once. Thus you can use any of the loop control
	362	statements in it to leave or restart the block. (Note that this is
	363	I<NOT> true in C<eval{}>, C<sub{}>, or contrary to popular belief
	364	C<do{}> blocks, which do I<NOT> count as loops.) The C<continue>
	365	block is optional.
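
A small sketch of a bare block acting as a loop that runs once. The label
C<RETRY> and the variables are invented for the example.

```perl
use strict;
use warnings;

my $tries = 0;
RETRY: {
    $tries++;
    redo RETRY if $tries < 3;    # restart the block without leaving it
}

my @steps;
{
    push @steps, 'start';
    last;                        # leave the bare block early
    push @steps, 'never reached';
}
print "tries=$tries steps=@steps\n";
```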
	366
	367	The BLOCK construct is particularly nice for doing case
	368	structures.
	369
	370	SWITCH: {
	371	if (/^abc/) { $abc = 1; last SWITCH; }
	372	if (/^def/) { $def = 1; last SWITCH; }
	373	if (/^xyz/) { $xyz = 1; last SWITCH; }
	374	$nothing = 1;
	375	}
	376
	377	There is no official C<switch> statement in Perl, because there are
	378	already several ways to write the equivalent. In addition to the
	379	above, you could write
	380
	381	SWITCH: {
	382	$abc = 1, last SWITCH if /^abc/;
	383	$def = 1, last SWITCH if /^def/;
	384	$xyz = 1, last SWITCH if /^xyz/;
	385	$nothing = 1;
	386	}
	387
	388	(That's actually not as strange as it looks once you realize that you can
	389	use loop control "operators" within an expression. That's just the normal
	390	C comma operator.)
	391
	392	or
	393
	394	SWITCH: {
	395	/^abc/ && do { $abc = 1; last SWITCH; };
	396	/^def/ && do { $def = 1; last SWITCH; };
	397	/^xyz/ && do { $xyz = 1; last SWITCH; };
	398	$nothing = 1;
	399	}
	400
	401	or formatted so it stands out more as a "proper" C<switch> statement:
	402
	403	SWITCH: {
	404	/^abc/ && do {
	405	$abc = 1;
	406	last SWITCH;
	407	};
	408
	409	/^def/ && do {
	410	$def = 1;
	411	last SWITCH;
	412	};
	413
	414	/^xyz/ && do {
	415	$xyz = 1;
	416	last SWITCH;
	417	};
	418	$nothing = 1;
	419	}
	420
	421	or
	422
	423	SWITCH: {
	424	/^abc/ and $abc = 1, last SWITCH;
	425	/^def/ and $def = 1, last SWITCH;
	426	/^xyz/ and $xyz = 1, last SWITCH;
	427	$nothing = 1;
	428	}
	429
	430	or even, horrors,
	431
	432	if (/^abc/)
	433	{ $abc = 1 }
	434	elsif (/^def/)
	435	{ $def = 1 }
	436	elsif (/^xyz/)
	437	{ $xyz = 1 }
	438	else
	439	{ $nothing = 1 }
	440
	441	A common idiom for a C<switch> statement is to use C<foreach>'s aliasing to make
	442	a temporary assignment to C<$_> for convenient matching:
	443
	444	SWITCH: for ($where) {
	445	/In Card Names/ && do { push @flags, '-e'; last; };
	446	/Anywhere/ && do { push @flags, '-h'; last; };
	447	/In Rulings/ && do { last; };
	448	die "unknown value for form variable where: `$where'";
	449	}
	450
	451	Another interesting approach to a switch statement is to arrange
	452	for a C<do> block to return the proper value:
	453
	454	    $amode = do {
	455	        if     ($flag & O_RDONLY) { "r" }       # XXX: isn't this 0?
	456	        elsif  ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" }
	457	        elsif  ($flag & O_RDWR)   {
	458	            if ($flag & O_CREAT)  { "w+" }
	459	            else                  { ($flag & O_APPEND) ? "a+" : "r+" }
	460	        }
	461	    };
	462
	463	Or
	464
	465	print do {
	466	($flags & O_WRONLY) ? "write-only" :
	467	($flags & O_RDWR) ? "read-write" :
	468	"read-only";
	469	};
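
The same idea in a form that runs on its own, as a sketch with a made-up
C<$size> value. The final C<else> branch is there so that the C<do> block
always yields a value.

```perl
use strict;
use warnings;

my $size = 42;                   # hypothetical input value
my $label = do {
    if    ($size < 10)  { 'small'  }
    elsif ($size < 100) { 'medium' }
    else                { 'large'  }
};
print "$label\n";
```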
	470
	471	Or if you are certain that all the C<&&> clauses are true, you can use
	472	something like this, which "switches" on the value of the
	473	C<HTTP_USER_AGENT> environment variable.
	474
	475	    #!/usr/bin/perl
	476	    # pick out jargon file page based on browser
	477	    $dir = 'http://www.wins.uva.nl/~mes/jargon';
	478	    for ($ENV{HTTP_USER_AGENT}) {
	479	        $page  =    /Mac/            && 'm/Macintrash.html'
	480	                 || /Win(dows )?NT/  && 'e/evilandrude.html'
	481	                 || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html'
	482	                 || /Linux/          && 'l/Linux.html'
	483	                 || /HP-UX/          && 'h/HP-SUX.html'
	484	                 || /SunOS/          && 's/ScumOS.html'
	485	                 ||                     'a/AppendixB.html';
	486	    }
	487	    print "Location: $dir/$page\015\012\015\012";
	488
	489	That kind of switch statement only works when you know the C<&&> clauses
	490	will be true. If you don't, the previous C<?:> example should be used.
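
The pitfall can be sketched with a clause whose associated value is false.
The test string and the variable names are assumptions for the example.

```perl
use strict;
use warnings;

my ($chained, $ternary);
for ('abc') {
    # The match succeeds, but its associated value 0 is false,
    # so the chain falls through to the default.
    $chained = /^abc/ && 0 || 'default';

    # The conditional operator selects on the match alone.
    $ternary = /^abc/ ? 0 : 'default';
}
print "chained=$chained ternary=$ternary\n";
```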
	491
	492	You might also consider writing a hash of subroutine references
	493	instead of synthesizing a C<switch> statement.
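
One possible shape for such a hash, as a sketch only: the keys mirror the
prefixes tested in the C<SWITCH> examples earlier in this section, while
C<%dispatch> and the sample inputs are names invented here.

```perl
use strict;
use warnings;

my ($abc, $def, $xyz, $nothing) = (0, 0, 0, 0);

# Each key maps to the code that should run for that case.
my %dispatch = (
    abc => sub { $abc = 1 },
    def => sub { $def = 1 },
    xyz => sub { $xyz = 1 },
);

for my $input ('definitely', 'other') {
    my ($prefix) = $input =~ /^(abc|def|xyz)/;
    my $handler = defined $prefix ? $dispatch{$prefix} : sub { $nothing = 1 };
    $handler->();
}
print "abc=$abc def=$def xyz=$xyz nothing=$nothing\n";
```

Adding a case then means adding one entry to the hash rather than another
branch in the control flow.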
	494
	495	=head2 Goto
	496
	497	Although not for the faint of heart, Perl does support a C<goto>
	498	statement. There are three forms: C<goto>-LABEL, C<goto>-EXPR, and
	499	C<goto>-&NAME. A loop's LABEL is not actually a valid target for
	500	a C<goto>; it's just the name of the loop.
	501
	502	The C<goto>-LABEL form finds the statement labeled with LABEL and resumes
	503	execution there. It may not be used to go into any construct that
	504	requires initialization, such as a subroutine or a C<foreach> loop. It
	505	also can't be used to go into a construct that is optimized away. It
	506	can be used to go almost anywhere else within the dynamic scope,
	507	including out of subroutines, but it's usually better to use some other
	508	construct such as C<last> or C<die>. The author of Perl has never felt the
	509	need to use this form of C<goto> (in Perl, that is--C is another matter).
	510
	511	The C<goto>-EXPR form expects a label name, whose scope will be resolved
	512	dynamically. This allows for computed C<goto>s per FORTRAN, but isn't
	513	necessarily recommended if you're optimizing for maintainability:
	514
	515	goto ("FOO", "BAR", "GLARCH")[$i];
	516
	517	The C<goto>-&NAME form is highly magical, and substitutes a call to the
	518	named subroutine for the currently running subroutine. This is used by
	519	C<AUTOLOAD()> subroutines that wish to load another subroutine and then
	520	pretend that the other subroutine had been called in the first place
	521	(except that any modifications to C<@_> in the current subroutine are
	522	propagated to the other subroutine). After the C<goto>, not even C<caller()>
	523	will be able to tell that this routine was called first.
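
A minimal sketch of that C<AUTOLOAD> pattern follows. The package name
C<Lazy>, the subroutine name C<greet>, and the generated subroutine body
are all assumptions made up for the illustration.

```perl
use strict;
use warnings;

package Lazy;                    # hypothetical package name

our $AUTOLOAD;
sub AUTOLOAD {
    my $name = $AUTOLOAD;        # fully qualified name of the missing sub
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';

    # Build the missing subroutine on demand and install it.
    my $code = sub { return "$name got: @_" };
    {
        no strict 'refs';
        *{$AUTOLOAD} = $code;
    }
    goto &$code;                 # replaces this call frame; @_ is passed along
}

package main;

my $first  = Lazy::greet('hello');   # goes through AUTOLOAD
my $second = Lazy::greet('again');   # calls the installed sub directly
print "$first\n$second\n";
```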
	524
	525	In almost all cases like this, it's a far, far better idea to use the
	526	structured control flow mechanisms of C<next>, C<last>, or C<redo> instead of
	527	resorting to a C<goto>. For certain applications, the catch and throw pair of
	528	C<eval{}> and C<die()> for exception processing can also be a prudent approach.
	529
	530	=head2 PODs: Embedded Documentation
	531
	532	Perl has a mechanism for intermixing documentation with source code.
	533	While it's expecting the beginning of a new statement, if the compiler
	534	encounters a line that begins with an equal sign and a word, like this
	535
	536	=head1 Here There Be Pods!
	537
	538	Then that text and all remaining text up through and including a line
	539	beginning with C<=cut> will be ignored. The format of the intervening
	540	text is described in L<perlpod>.
	541
	542	This allows you to intermix your source code
	543	and your documentation text freely, as in
	544
	545	=item snazzle($)
	546
	547	The snazzle() function will behave in the most spectacular
	548	form that you can possibly imagine, not even excepting
	549	cybernetic pyrotechnics.
	550
	551	=cut back to the compiler, nuff of this pod stuff!
	552
	553	sub snazzle($) {
	554	my $thingie = shift;
	555	.........
	556	}
	557
	558	Note that pod translators should look at only paragraphs beginning
	559	with a pod directive (it makes parsing easier), whereas the compiler
	560	actually knows to look for pod escapes even in the middle of a
	561	paragraph. This means that the following secret stuff will be
	562	ignored by both the compiler and the translators.
	563
	564	$a=3;
	565	=secret stuff
	566	warn "Neither POD nor CODE!?"
	567	=cut back
	568	print "got $a\n";
	569
	570	You probably shouldn't rely upon the C<warn()> being podded out forever.
	571	Not all pod translators are well-behaved in this regard, and perhaps
	572	the compiler will become pickier.
	573
	574	One may also use pod directives to quickly comment out a section
	575	of code.
	576
	577	=head2 Plain Old Comments (Not!)
	578
	579	Much like the C preprocessor, Perl can process line directives. Using
	580	this, one can control Perl's idea of filenames and line numbers in
	581	error or warning messages (especially for strings that are processed
	582	with C<eval()>). The syntax for this mechanism is the same as for most
	583	C preprocessors: it matches the regular expression
	584	C</^#\s*line\s+(\d+)\s*(?:\s"([^"]*)")?/> with C<$1> being the line
	585	number for the next line, and C<$2> being the optional filename
	586	(specified within quotes).
	587
	588	Here are some examples that you should be able to type into your command
	589	shell:
	590
	591	% perl
	592	# line 200 "bzzzt"
	593	# the `#' on the previous line must be the first char on line
	594	die 'foo';
	595	__END__
	596	foo at bzzzt line 201.
	597
	598	% perl
	599	# line 200 "bzzzt"
	600	eval qq[\n#line 2001 ""\ndie 'foo']; print $@;
	601	__END__
	602	foo at - line 2001.
	603
	604	% perl
	605	eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@;
	606	__END__
	607	foo at foo bar line 200.
	608
	609	% perl
	610	# line 345 "goop"
	611	eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'";
	612	print $@;
	613	__END__
	614	foo at goop line 345.
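
The same mechanism can be tried from inside a script instead of the command
shell. This is a sketch; the file name C<virtual_file> and the line number
are arbitrary values chosen for the example.

```perl
use strict;
use warnings;

# The directive inside the string renumbers the line that follows it.
my $code = qq{\n# line 500 "virtual_file"\ndie 'oops';\n};
eval $code;
print $@;
```

The error message reports the file name and line number given in the
directive, not the position of the C<eval> in the enclosing script.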
	615
	616	=cut