Commit | Line | Data |
---|---|---|
a0d0e21e LW |
1 | =head1 NAME |
2 | ||
3 | perltrap - Perl traps for the unwary | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
7 | The biggest trap of all is forgetting to use the B<-w> switch; | |
8 | see L<perlrun>. Making your entire program runnable under | |
9 | ||
10 | use strict; | |
11 | ||
12 | can help make your program more bullet-proof, but sometimes | |
13 | it's too annoying for quick throw-away programs. | |
14 | ||
15 | =head2 Awk Traps | |
16 | ||
17 | Accustomed B<awk> users should take special note of the following: | |
18 | ||
19 | =over 4 | |
20 | ||
21 | =item * | |
22 | ||
23 | The English module, loaded via | |
24 | ||
25 | use English; | |
26 | ||
27 | allows you to refer to special variables (like $RS) as | |
28 | though they were in B<awk>; see L<perlvar> for details. | |
29 | ||
30 | =item * | |
31 | ||
32 | Semicolons are required after all simple statements in Perl (except | |
33 | at the end of a block). Newline is not a statement delimiter. | |
34 | ||
35 | =item * | |
36 | ||
37 | Curly brackets are required on C<if>s and C<while>s. | |
38 | ||
39 | =item * | |
40 | ||
41 | Variables begin with "$", "@" or "%" in Perl. | |
42 | ||
43 | =item * | |
44 | ||
45 | Arrays index from 0. Likewise string positions in substr() and | |
46 | index(). | |
47 | ||
48 | =item * | |
49 | ||
50 | You have to decide whether your array has numeric or string indices. | |
51 | ||
52 | =item * | |
53 | ||
54 | Associative array values do not spring into existence upon mere | |
55 | reference. | |
56 | ||
57 | =item * | |
58 | ||
59 | You have to decide whether you want to use string or numeric | |
60 | comparisons. | |
61 | ||
62 | =item * | |
63 | ||
64 | Reading an input line does not split it for you. You get to split it | |
65 | yourself into an array. And the split() operator takes different | |
66 | arguments than B<awk>'s. | |
67 | ||
68 | =item * | |
69 | ||
70 | The current input line is normally in $_, not $0. It generally does | |
71 | not have the newline stripped. ($0 is the name of the program | |
72 | executed.) See L<perlvar>. | |
73 | ||
74 | =item * | |
75 | ||
76 | $<I<digit>> does not refer to fields--it refers to substrings matched by | |
77 | the last match pattern. | |
78 | ||
79 | =item * | |
80 | ||
81 | The print() statement does not add field and record separators unless | |
82 | you set C<$,> and C<$\>. You can set $OFS and $ORS if you're using | |
83 | the English module. | |
84 | ||
85 | =item * | |
86 | ||
87 | You must open your files before you print to them. | |
88 | ||
89 | =item * | |
90 | ||
91 | The range operator is "..", not comma. The comma operator works as in | |
92 | C. | |
93 | ||
94 | =item * | |
95 | ||
96 | The match operator is "=~", not "~". ("~" is the one's complement | |
97 | operator, as in C.) | |
98 | ||
99 | =item * | |
100 | ||
101 | The exponentiation operator is "**", not "^". "^" is the XOR | |
102 | operator, as in C. (You know, one could get the feeling that B<awk> is | |
103 | basically incompatible with C.) | |
104 | ||
105 | =item * | |
106 | ||
107 | The concatenation operator is ".", not the null string. (Using the | |
108 | null string would render C</pat/ /pat/> unparsable, since the third slash | |
109 | would be interpreted as a division operator--the tokener is in fact | |
110 | slightly context sensitive for operators like "/", "?", and ">". | |
111 | And in fact, "." itself can be the beginning of a number.) | |
112 | ||
113 | =item * | |
114 | ||
115 | The C<next>, C<exit>, and C<continue> keywords work differently. | |
116 | ||
117 | =item * | |
118 | ||
119 | ||
120 | The following variables work differently: | |
121 | ||
122 |     Awk         Perl | |
123 |     ARGC        $#ARGV or scalar @ARGV | |
124 |     ARGV[0]     $0 | |
125 |     FILENAME    $ARGV | |
126 |     FNR         $. - something | |
127 |     FS          (whatever you like) | |
128 |     NF          $#Fld, or some such | |
129 |     NR          $. | |
130 |     OFMT        $# | |
131 |     OFS         $, | |
132 |     ORS         $\ | |
133 |     RLENGTH     length($&) | |
134 |     RS          $/ | |
135 |     RSTART      length($`) | |
136 |     SUBSEP      $; | |
137 | ||
138 | =item * | |
139 | ||
140 | You cannot set $RS to a pattern, only a string. | |
141 | ||
142 | =item * | |
143 | ||
144 | When in doubt, run the B<awk> construct through B<a2p> and see what it | |
145 | gives you. | |
146 | ||
147 | =back | |
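A minimal sketch of several of the B<awk> points above, assuming whitespace-separated input records. The helper names C<fields_of> and C<count_words> are hypothetical and exist only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split one record into fields roughly the way
# awk does with its default FS (runs of whitespace).
sub fields_of {
    my ($line) = @_;
    chomp $line;                # the trailing newline is not stripped for you
    return split ' ', $line;    # a single-space pattern mimics awk's splitting
}

# Hypothetical helper: count occurrences of each word in a hash.
sub count_words {
    my (@words) = @_;
    my %count;
    $count{$_}++ for @words;
    return \%count;
}

my @fld   = fields_of("alpha  beta alpha\n");
my $count = count_words(@fld);

print "NF is ", scalar(@fld), "\n";             # awk's NF
print "first field is $fld[0]\n";               # awk's $1; arrays index from 0
print "numbers match\n"  if "10" == 10.0;       # numeric comparison
print "strings differ\n" if "10" ne "10.0";     # string comparison
print "no gamma yet\n"   unless exists $count->{gamma};  # lookup creates nothing
```

The C<exists> test is used here because merely asking about a key leaves the hash unchanged.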
148 | ||
149 | =head2 C Traps | |
150 | ||
151 | Cerebral C programmers should take note of the following: | |
152 | ||
153 | =over 4 | |
154 | ||
155 | =item * | |
156 | ||
157 | Curly brackets are required on C<if>s and C<while>s. | |
158 | ||
159 | =item * | |
160 | ||
161 | You must use C<elsif> rather than C<else if>. | |
162 | ||
163 | =item * | |
164 | ||
165 | The C<break> and C<continue> keywords from C become in | |
166 | Perl C<last> and C<next>, respectively. | |
167 | Unlike in C, these do I<NOT> work within a C<do { } while> construct. | |
168 | ||
169 | =item * | |
170 | ||
171 | There's no switch statement. (But it's easy to build one on the fly.) | |
172 | ||
173 | =item * | |
174 | ||
175 | Variables begin with "$", "@" or "%" in Perl. | |
176 | ||
177 | =item * | |
178 | ||
179 | printf() does not implement the "*" format for interpolating | |
180 | field widths, but it's trivial to use interpolation of double-quoted | |
181 | strings to achieve the same effect. | |
182 | ||
183 | =item * | |
184 | ||
185 | Comments begin with "#", not "/*". | |
186 | ||
187 | =item * | |
188 | ||
189 | You can't take the address of anything, although a similar operator | |
190 | in Perl 5 is the backslash, which creates a reference. | |
191 | ||
192 | =item * | |
193 | ||
194 | C<ARGV> must be capitalized. | |
195 | ||
196 | =item * | |
197 | ||
198 | System calls such as link(), unlink(), rename(), etc. return nonzero for | |
199 | success, not 0. | |
200 | ||
201 | =item * | |
202 | ||
203 | Signal handlers deal with signal names, not numbers. Use C<kill -l> | |
204 | to find their names on your system. | |
205 | ||
206 | =back | |
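The following sketch illustrates a few of the C points above: C<elsif>, C<next> and C<last> in place of C<continue> and C<break>, references made with a backslash, and the true-on-success return convention. The subroutine names and the file path are hypothetical, chosen only for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: sum the positive numbers up to the first zero.
sub sum_until_zero {
    my (@numbers) = @_;
    my $sum = 0;
    for my $n (@numbers) {
        next if $n < 0;     # C's "continue"
        last if $n == 0;    # C's "break"
        $sum += $n;
    }
    return $sum;
}

# Hypothetical helper showing elsif and the mandatory braces.
sub sign_name {
    my ($n) = @_;
    if    ($n < 0) { return 'negative' }
    elsif ($n > 0) { return 'positive' }
    else           { return 'zero' }
}

# A backslash creates a reference; it is not a raw machine address.
sub double_in_place {
    my ($ref) = @_;
    $$ref *= 2;
    return;
}

my $value = 21;
double_in_place(\$value);
print "value is now $value\n";

# unlink() returns the number of files removed, so success is true.
# The path is assumed not to exist.
my $removed = unlink 'no-such-dir-for-this-sketch/no-such-file';
print "nothing removed\n" unless $removed;
```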
207 | ||
208 | =head2 Sed Traps | |
209 | ||
210 | Seasoned B<sed> programmers should take note of the following: | |
211 | ||
212 | =over 4 | |
213 | ||
214 | =item * | |
215 | ||
216 | Backreferences in substitutions use "$" rather than "\". | |
217 | ||
218 | =item * | |
219 | ||
220 | The pattern matching metacharacters "(", ")", and "|" do not have backslashes | |
221 | in front. | |
222 | ||
223 | =item * | |
224 | ||
225 | The range operator is C<...>, rather than comma. | |
226 | ||
227 | =back | |
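As a rough illustration of the first two B<sed> points, the sketch below uses C<$1> and C<$2> in the replacement and leaves the grouping and alternation metacharacters unescaped. The subroutine names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "key=value" into "value=key".
# A comparable sed command would be: s/\([a-z]*\)=\([a-z]*\)/\2=\1/
sub swap_pair {
    my ($text) = @_;
    (my $copy = $text) =~ s/([a-z]+)=([a-z]+)/$2=$1/;
    return $copy;
}

# Grouping and alternation take no backslashes in Perl patterns.
sub is_yes_or_no {
    my ($answer) = @_;
    return $answer =~ /^(yes|no)$/ ? 1 : 0;
}

print swap_pair("name=value"), "\n";
print "recognized\n" if is_yes_or_no("yes");
```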
228 | ||
229 | =head2 Shell Traps | |
230 | ||
231 | Sharp shell programmers should take note of the following: | |
232 | ||
233 | =over 4 | |
234 | ||
235 | =item * | |
236 | ||
237 | The backtick operator does variable interpretation without regard to | |
238 | the presence of single quotes in the command. | |
239 | ||
240 | =item * | |
241 | ||
242 | The backtick operator does no translation of the return value, unlike B<csh>. | |
243 | ||
244 | =item * | |
245 | ||
246 | Shells (especially B<csh>) do several levels of substitution on each | |
247 | command line. Perl does substitution only in certain constructs | |
248 | such as double quotes, backticks, angle brackets, and search patterns. | |
249 | ||
250 | =item * | |
251 | ||
252 | Shells interpret scripts a little bit at a time. Perl compiles the | |
253 | entire program before executing it (except for C<BEGIN> blocks, which | |
254 | execute at compile time). | |
255 | ||
256 | =item * | |
257 | ||
258 | The arguments are available via @ARGV, not $1, $2, etc. | |
259 | ||
260 | =item * | |
261 | ||
262 | The environment is not automatically made available as separate scalar | |
263 | variables. | |
264 | ||
265 | =back | |
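A small sketch of the last two shell points and of where interpolation happens, assuming nothing about the caller's environment. The helpers C<env_or> and C<first_arg> are hypothetical names introduced for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: look up an environment variable with a fallback.
# The environment lives in the %ENV hash, not in separate scalars.
sub env_or {
    my ($name, $default) = @_;
    return exists $ENV{$name} ? $ENV{$name} : $default;
}

# Hypothetical helper: the shell's "$1" is the first element of the list.
sub first_arg {
    my (@args) = @_;
    return $args[0];
}

my $home = env_or('HOME', '(unset)');
my $arg1 = first_arg(@ARGV);
print "HOME is $home\n";
print "first argument is ", (defined($arg1) ? $arg1 : '(none)'), "\n";

# Interpolation happens in double quotes but not in single quotes.
my $user = 'someone';
print "double: $user\n";
print 'single: $user', "\n";
```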
266 | ||
267 | =head2 Perl Traps | |
268 | ||
269 | Practicing Perl Programmers should take note of the following: | |
270 | ||
271 | =over 4 | |
272 | ||
273 | =item * | |
274 | ||
275 | Remember that many operations behave differently in a list | |
276 | context than they do in a scalar one. See L<perldata> for details. | |
277 | ||
278 | =item * | |
279 | ||
280 | Avoid barewords if you can, especially all lower-case ones. | |
281 | You can't tell just by looking at it whether a bareword is | |
282 | a function or a string. By using quotes on strings and | |
283 | parens on function calls, you won't ever get them confused. | |
284 | ||
285 | =item * | |
286 | ||
287 | You cannot discern from mere inspection which built-ins | |
288 | are unary operators (like chop() and chdir()) | |
289 | and which are list operators (like print() and unlink()). | |
290 | (User-defined subroutines can B<only> be list operators, never | |
291 | unary ones.) See L<perlop>. | |
292 | ||
293 | =item * | |
294 | ||
295 | People have a hard time remembering that some functions | |
296 | default to $_, or @ARGV, or whatever, but that others which | |
297 | you might expect to do not. | |
298 | ||
299 | =item * | |
300 | ||
301 | Remember not to use "C<=>" when you need "C<=~>"; | |
302 | these two constructs are quite different: | |
303 | ||
304 | $x = /foo/; | |
305 | $x =~ /foo/; | |
306 | ||
307 | =item * | |
308 | ||
309 | The C<do {}> construct isn't a real loop that you can use | |
310 | loop control on. | |
311 | ||
312 | =item * | |
313 | ||
314 | Use my() for local variables whenever you can get away with | |
315 | it (but see L<perlform> for where you can't). | |
316 | Using local() actually gives a local value to a global | |
317 | variable, which leaves you open to unforeseen side-effects | |
318 | of dynamic scoping. | |
319 | ||
320 | =back | |
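The sketch below tries to make three of these points concrete: list versus scalar context, C<=> versus C<=~>, and my() versus local(). All subroutine names and the C<$level> variable are hypothetical, introduced only for illustration:

```perl
use strict;
use warnings;

our $level = 'global';

sub show_level { return $level }

# local() gives the global a temporary value that callees also see.
sub with_local {
    local $level = 'localized';
    return show_level();
}

# my() creates a new lexical variable that callees cannot see.
sub with_my {
    my $level = 'lexical';
    return show_level();
}

# The same array yields different results depending on context.
sub context_demo {
    my @items   = qw(a b c);
    my $count   = @items;     # scalar context: number of elements
    my ($first) = @items;     # list context: first element
    return ($count, $first);
}

# "=" stores the result of matching $_; "=~" matches the left operand.
sub match_demo {
    my ($x) = @_;
    local $_ = 'no match here';
    my $assigned = /foo/ ? 1 : 0;          # tests $_, not $x
    my $bound    = $x =~ /foo/ ? 1 : 0;    # tests $x
    return ($assigned, $bound);
}

print "with_local sees: ", with_local(), "\n";
print "with_my sees:    ", with_my(), "\n";
```

Here C<with_my> leaves the global untouched, which is usually what is wanted; C<with_local> changes what every subroutine called from it observes.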
321 | ||
322 | =head2 Perl4 Traps | |
323 | ||
324 | Penitent Perl 4 Programmers should take note of the following | |
325 | incompatible changes that occurred between release 4 and release 5: | |
326 | ||
327 | =over 4 | |
328 | ||
329 | =item * | |
330 | ||
331 | C<@> now always interpolates an array in double-quotish strings. Some programs | |
332 | may now need to use backslash to protect any C<@> that shouldn't interpolate. | |
333 | ||
334 | =item * | |
335 | Barewords that used to look like strings to Perl will now look like subroutine | |
336 | calls if a subroutine by that name is defined before the compiler sees them. | |
337 | For example: | |
338 | ||
339 | sub SeeYa { die "Hasta la vista, baby!" } | |
340 | $SIG{QUIT} = SeeYa; | |
341 | ||
342 | In Perl 4, that set the signal handler; in Perl 5, it actually calls the | |
343 | function! You may use the B<-w> switch to find such places. | |
344 | ||
345 | =item * | |
346 | ||
347 | Symbols starting with C<_> are no longer forced into package C<main>, except | |
348 | for $_ itself (and @_, etc.). | |
349 | ||
350 | =item * | |
351 | ||
352 | C<s'$lhs'$rhs'> now does no interpolation on either side. It used to | |
353 | interpolate C<$lhs> but not C<$rhs>. | |
354 | ||
355 | =item * | |
356 | ||
357 | The second and third arguments of splice() are now evaluated in scalar | |
358 | context (as the book says) rather than list context. | |
359 | ||
360 | =item * | |
361 | ||
362 | These are now semantic errors because of precedence: | |
363 | ||
364 | shift @list + 20; | |
365 | $n = keys %map + 20; | |
366 | ||
367 | If those were allowed to work, then this could not: | |
368 | ||
369 | sleep $dormancy + 20; | |
370 | ||
371 | =item * | |
372 | ||
373 | C<open FOO || die> is now incorrect. You need parens around the filehandle. | |
374 | While temporarily supported, using such a construct will | |
375 | generate a non-fatal (but non-suppressible) warning. | |
376 | ||
377 | =item * | |
378 | ||
379 | The elements of argument lists for formats are now evaluated in list | |
380 | context. This means you can interpolate list values now. | |
381 | ||
382 | =item * | |
383 | ||
384 | You can't do a C<goto> into a block that is optimized away. Darn. | |
385 | ||
386 | =item * | |
387 | ||
388 | It is no longer syntactically legal to use whitespace as the name | |
389 | of a variable, or as a delimiter for any kind of quote construct. | |
390 | Double darn. | |
391 | ||
392 | =item * | |
393 | ||
394 | The caller() function now returns a false value in a scalar context if there | |
395 | is no caller. This lets library files determine if they're being required. | |
396 | ||
397 | =item * | |
398 | ||
399 | C<m//g> now attaches its state to the searched string rather than the | |
400 | regular expression. | |
401 | ||
402 | =item * | |
403 | ||
404 | C<reverse> is no longer allowed as the name of a sort subroutine. | |
405 | ||
406 | =item * | |
407 | ||
408 | B<taintperl> is no longer a separate executable. There is now a B<-T> | |
409 | switch to turn on tainting when it isn't turned on automatically. | |
410 | ||
411 | =item * | |
412 | ||
413 | Double-quoted strings may no longer end with an unescaped C<$> or C<@>. | |
414 | ||
415 | =item * | |
416 | ||
417 | The archaic C<while/if> BLOCK BLOCK syntax is no longer supported. | |
418 | ||
419 | ||
420 | =item * | |
421 | ||
422 | Negative array subscripts now count from the end of the array. | |
423 | ||
424 | =item * | |
425 | ||
426 | The comma operator in a scalar context is now guaranteed to give a | |
427 | scalar context to its arguments. | |
428 | ||
429 | =item * | |
430 | ||
431 | The C<**> operator now binds more tightly than unary minus. | |
432 | It was documented to work this way before, but didn't. | |
433 | ||
434 | =item * | |
435 | ||
436 | Setting C<$#array> lower now discards array elements. | |
437 | ||
438 | =item * | |
439 | ||
440 | delete() is not guaranteed to return the old value for tie()d arrays, | |
441 | since this capability may be onerous for some modules to implement. | |
442 | ||
443 | =item * | |
444 | ||
445 | Some error messages will be different. | |
446 | ||
447 | =item * | |
448 | ||
449 | Some bugs may have been inadvertently removed. | |
450 | ||
451 | =back |
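A few of the Perl 5 behaviors listed above, sketched as small subroutines. The subroutine names and the sample data are hypothetical; the behaviors shown are those the list describes (escaped C<@> in double quotes, negative subscripts, C<$#array> truncation, C<**> binding tighter than unary minus, and C<m//g> state kept with the searched string):

```perl
use strict;
use warnings;

# A backslash keeps "@host" from being interpolated as an array.
sub address_line {
    my ($user, $host) = @_;
    return "$user\@$host";
}

# Negative subscripts count from the end of the array.
sub last_element {
    my (@list) = @_;
    return $list[-1];
}

# Setting $#list lower discards the elements beyond the new last index.
sub truncate_to {
    my ($keep, @list) = @_;
    $#list = $keep - 1;
    return @list;
}

# ** binds more tightly than unary minus, so this is -($n ** 2).
sub negated_square {
    my ($n) = @_;
    return -$n ** 2;
}

# m//g keeps its position with the string; pos() reports it.
sub match_positions {
    my ($string) = @_;
    my @positions;
    while ($string =~ /a/g) {
        push @positions, pos($string);
    }
    return @positions;
}

print address_line('larry', 'example.com'), "\n";
print join(',', truncate_to(2, qw(a b c d))), "\n";
```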