Commit | Line | Data |
---|---|---|
a0d0e21e LW |
1 | =head1 NAME |
2 | ||
3 | perlstyle - Perl style guide | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
a0d0e21e LW |
7 | Each programmer will, of course, have his or her own preferences |
8 | with regard to formatting, but there are some general guidelines that will | |
54310121 | 9 | make your programs easier to read, understand, and maintain. |
a0d0e21e | 10 | |
803b7faa DB |
11 | The most important thing is to use L<strict> and L<warnings> in all your |
12 | code or know the reason why not to. You may turn them off explicitly for | |
13 | particular portions of code via C<no warnings> or C<no strict>, and this | |
14 | can be limited to the specific warnings or strict features you wish to | |
15 | disable. The B<-w> flag and C<$^W> variable should not be used for this | |
16 | purpose since they can affect code you use but did not write, such as | |
17 | modules from core or CPAN. | |
cb1a09d0 | 18 | |
a0d0e21e | 19 | Regarding aesthetics of code layout, about the only thing Larry |
d98d5fff | 20 | cares strongly about is that the closing curly bracket of |
4a6725af | 21 | a multi-line BLOCK should line up with the keyword that started the construct. |
a0d0e21e LW |
22 | Beyond that, he has other preferences that aren't so strong: |
23 | ||
24 | =over 4 | |
25 | ||
26 | =item * | |
27 | ||
28 | 4-column indent. | |
29 | ||
30 | =item * | |
31 | ||
32 | Opening curly on same line as keyword, if possible, otherwise line up. | |
33 | ||
34 | =item * | |
35 | ||
4a6725af | 36 | Space before the opening curly of a multi-line BLOCK. |
a0d0e21e LW |
37 | |
38 | =item * | |
39 | ||
40 | One-line BLOCK may be put on one line, including curlies. | |
41 | ||
42 | =item * | |
43 | ||
44 | No space before the semicolon. | |
45 | ||
46 | =item * | |
47 | ||
48 | Semicolon omitted in "short" one-line BLOCK. | |
49 | ||
50 | =item * | |
51 | ||
52 | Space around most operators. | |
53 | ||
54 | =item * | |
55 | ||
56 | Space around a "complex" subscript (inside brackets). | |
57 | ||
58 | =item * | |
59 | ||
60 | Blank lines between chunks that do different things. | |
61 | ||
62 | =item * | |
63 | ||
64 | Uncuddled elses. | |
65 | ||
66 | =item * | |
67 | ||
5f05dabc | 68 | No space between function name and its opening parenthesis. |
a0d0e21e LW |
69 | |
70 | =item * | |
71 | ||
72 | Space after each comma. | |
73 | ||
74 | =item * | |
75 | ||
b9ff9ac1 | 76 | Long lines broken after an operator (except C<and> and C<or>). |
a0d0e21e LW |
77 | |
78 | =item * | |
79 | ||
5f05dabc | 80 | Space after last parenthesis matching on current line. |
a0d0e21e LW |
81 | |
82 | =item * | |
83 | ||
84 | Line up corresponding items vertically. | |
85 | ||
86 | =item * | |
87 | ||
88 | Omit redundant punctuation as long as clarity doesn't suffer. | |
89 | ||
90 | =back | |
91 | ||
184e9718 | 92 | Larry has his reasons for each of these things, but he doesn't claim that |
a0d0e21e LW |
93 | everyone else's mind works the same as his does. |
94 | ||
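As a rough illustration of the layout preferences listed above, here is a minimal sketch. The subroutine C<classify()> and the hash C<%label_for> are hypothetical names, invented only to show the formatting.

```perl
use strict;
use warnings;

# Hypothetical helper: label a number as negative, zero, or positive.
sub classify {
    my ($n) = @_;

    # A short one-line BLOCK stays on one line; its semicolon is omitted.
    if ($n == 0) { return 'zero' }

    # Multi-line BLOCK: 4-column indent, space before the opening curly,
    # closing curly lined up with the keyword, and an uncuddled else.
    my $label;
    if ($n < 0) {
        $label = 'negative';
    }
    else {
        $label = 'positive';
    }

    return $label;
}

# No space between the function name and its parenthesis, a space after
# each comma, and corresponding items lined up vertically.
my %label_for = (
    small => classify(-1),
    none  => classify(0),
    large => classify(42),
);

print "$_: $label_for{$_}\n" for sort keys %label_for;
```

None of this changes what the code does; it only changes how quickly a reader can see the structure.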
95 | Here are some other more substantive style issues to think about: | |
96 | ||
97 | =over 4 | |
98 | ||
99 | =item * | |
100 | ||
101 | Just because you I<CAN> do something a particular way doesn't mean that | |
102 | you I<SHOULD> do it that way. Perl is designed to give you several | |
103 | ways to do anything, so consider picking the most readable one. For | |
104 | instance | |
105 | ||
3f1e98f5 | 106 | open(my $fh, '<', $foo) || die "Can't open $foo: $!"; |
a0d0e21e LW |
107 | |
108 | is better than | |
109 | ||
3f1e98f5 | 110 | die "Can't open $foo: $!" unless open(my $fh, '<', $foo); |
a0d0e21e LW |
111 | |
112 | because the second way hides the main point of the statement in a | |
113 | modifier. On the other hand | |
114 | ||
115 | print "Starting analysis\n" if $verbose; | |
116 | ||
117 | is better than | |
118 | ||
119 | $verbose && print "Starting analysis\n"; | |
120 | ||
5f05dabc | 121 | because the main point isn't whether the user typed B<-v> or not. |
a0d0e21e LW |
122 | |
123 | Similarly, just because an operator lets you assume default arguments | |
124 | doesn't mean that you have to make use of the defaults. The defaults | |
125 | are there for lazy systems programmers writing one-shot programs. If | |
126 | you want your program to be readable, consider supplying the argument. | |
127 | ||
128 | Along the same lines, just because you I<CAN> omit parentheses in many | |
129 | places doesn't mean that you ought to: | |
130 | ||
131 | return print reverse sort num values %array; | |
132 | return print(reverse(sort num (values(%array)))); | |
133 | ||
134 | When in doubt, parenthesize. At the very least it will let some poor | |
135 | schmuck bounce on the % key in B<vi>. | |
136 | ||
137 | Even if you aren't in doubt, consider the mental welfare of the person | |
138 | who has to maintain the code after you, and who will probably put | |
5f05dabc | 139 | parentheses in the wrong place. |
a0d0e21e LW |
140 | |
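To make the points about explicit arguments and parentheses concrete, here is a minimal sketch. The hash C<%count>, the array C<@input>, and the comparison subroutine C<by_number()> are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: how many times each word was seen.
my %count = (apple => 3, pear => 10, fig => 1);

# Numeric comparison for sort(); $a and $b are set by sort() itself.
sub by_number { $a <=> $b }

# Legal but dense: the reader has to work out how the operators nest.
my @terse = reverse sort by_number values %count;

# Same result, with the nesting made explicit.
my @clear = reverse(sort by_number (values(%count)));

# chomp() and length() would both default to $_; naming the variable
# shows what is being trimmed and measured, and leaves @input untouched.
my @input = ("one\n", "three\n");
my @lengths;
for my $line (@input) {
    chomp(my $text = $line);
    push @lengths, length($text);
}

print "@clear\n";
print "@lengths\n";
```

Both spellings of the sort give the same list; the second one simply costs the next reader less effort.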
141 | =item * | |
142 | ||
143 | Don't go through silly contortions to exit a loop at the top or the | |
144 | bottom, when Perl provides the C<last> operator so you can exit in | |
145 | the middle. Just "outdent" it a little to make it more visible: | |
146 | ||
147 | LINE: | |
148 |     for (;;) { | |
149 |         statements; | |
150 |       last LINE if $foo; | |
151 |         next LINE if /^#/; | |
152 |         statements; | |
153 |     } | |
154 | ||
155 | =item * | |
156 | ||
157 | Don't be afraid to use loop labels--they're there to enhance | |
54310121 | 158 | readability as well as to allow multilevel loop breaks. See the |
a0d0e21e LW |
159 | previous example. |
160 | ||
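A multilevel break might look like the following sketch. The data in C<@rows> and the label names C<ROW> and C<CELL> are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical input: rows of numbers, searched for the first negative one.
my @rows = ([3, 1, 4], [1, -5, 9], [2, 6, -5]);

my @found;
ROW:
for my $r (0 .. $#rows) {
    CELL:
    for my $c (0 .. $#{ $rows[$r] }) {
        next CELL if $rows[$r][$c] >= 0;    # keep scanning this row
        @found = ($r, $c);
        last ROW;                           # leave both loops at once
    }
}

print @found ? "first negative at row $found[0], column $found[1]\n"
             : "no negative value\n";
```

Without the C<ROW> label, C<last> would leave only the inner loop and the search would carry on into the next row.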
161 | =item * | |
162 | ||
b9ff9ac1 | 163 | Avoid using C<grep()> (or C<map()>) or `backticks` in a void context, that is, |
54310121 | 164 | when you just throw away their return values. Those functions all |
b9ff9ac1 RGS |
165 | have return values, so use them. Otherwise use a C<foreach()> loop or |
166 | the C<system()> function instead. | |
c07a80fd | 167 | |
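The distinction can be sketched as follows. The list C<@names> is hypothetical, and the child command is just the running perl (C<$^X>) executing an empty program, chosen so that the sketch does not depend on any particular external tool.

```perl
use strict;
use warnings;

my @names = qw(alice bob carol);

# Discouraged: map() builds a list that is immediately thrown away.
#   map { print "hello, $_\n" } @names;

# Preferred when only the side effect matters: a foreach loop.
foreach my $name (@names) {
    print "hello, $name\n";
}

# Use map() and grep() when you actually want the list they return.
my @shouted = map  { uc($_) }         @names;
my @short   = grep { length($_) < 5 } @names;

# Likewise, use backticks only to capture output. When you just want the
# command to run, call system() and check its return value.
my $perl = $^X;    # path of the running perl
system($perl, '-e', '1') == 0
    or die "can't run $perl: exit status $?";
```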
168 | =item * | |
169 | ||
a0d0e21e LW |
170 | For portability, when using features that may not be implemented on |
171 | every machine, test the construct in an eval to see if it fails. If | |
172 | you know the version or patchlevel in which a particular feature was | |
184e9718 | 173 | implemented, you can test C<$]> (C<$PERL_VERSION> in C<English>) to see if it |
a0d0e21e LW |
174 | will be there. The C<Config> module will also let you interrogate values |
175 | determined by the B<Configure> program when Perl was installed. | |
176 | ||
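The three checks could be sketched like this. The variable names are hypothetical, and the version threshold 5.010 is only an example, picked because the C<//> operator arrived in Perl 5.10.

```perl
use strict;
use warnings;
use Config;

# Probe a feature inside eval: symlink() dies on platforms that lack it.
# Called with empty strings it does nothing useful where it does exist.
my $has_symlink = eval { symlink('', ''); 1 } ? 1 : 0;

# Compare $] numerically against the release that introduced a feature.
my $has_defined_or = $] >= 5.010 ? 1 : 0;

# %Config holds values recorded by Configure when this perl was built.
my $osname = $Config{osname};

print "symlink available: $has_symlink\n";
print "perl 5.10 or later: $has_defined_or\n";
print "built for: $osname\n";
```

The eval probe is the most direct of the three, since it asks the running perl rather than relying on what you remember about versions.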
177 | =item * | |
178 | ||
179 | Choose mnemonic identifiers. If you can't remember what mnemonic means, | |
180 | you've got a problem. | |
181 | ||
54310121 | 182 | =item * |
cb1a09d0 | 183 | |
b9ff9ac1 RGS |
184 | While short identifiers like C<$gotit> are probably ok, use underscores to |
185 | separate words in longer identifiers. It is generally easier to read | |
186 | C<$var_names_like_this> than C<$VarNamesLikeThis>, especially for | |
187 | non-native speakers of English. It's also a simple rule that works | |
188 | consistently with C<VAR_NAMES_LIKE_THIS>. | |
cb1a09d0 AD |
189 | |
190 | Package names are sometimes an exception to this rule. Perl informally | |
191 | reserves lowercase module names for "pragma" modules like C<integer> and | |
192 | C<strict>. Other modules should begin with a capital letter and use mixed | |
193 | case, but probably without underscores due to limitations in primitive | |
5f05dabc | 194 | file systems' representations of module names as files that must fit into a |
54310121 | 195 | few sparse bytes. |
cb1a09d0 AD |
196 | |
197 | =item * | |
198 | ||
54310121 | 199 | You may find it helpful to use letter case to indicate the scope |
200 | or nature of a variable. For example: | |
cb1a09d0 | 201 | |
54310121 | 202 | $ALL_CAPS_HERE   constants only (beware clashes with perl vars!) |
203 | $Some_Caps_Here  package-wide global/static | |
204 | $no_caps_here    function scope my() or local() variables | |
cb1a09d0 | 205 | |
54310121 | 206 | Function and method names seem to work best as all lowercase. |
b9ff9ac1 | 207 | E.g., C<$obj-E<gt>as_string()>. |
cb1a09d0 AD |
208 | |
209 | You can use a leading underscore to indicate that a variable or | |
210 | function should not be used outside the package that defined it. | |
211 | ||
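Put together, these naming conventions might look like the sketch below. The package C<TemperatureLog> and everything in it are hypothetical, written only to show the letter case of each kind of name.

```perl
use strict;
use warnings;

package TemperatureLog;    # hypothetical package: mixed case, no underscores

# Constant: all caps.
our $ABSOLUTE_ZERO_C = -273.15;

# Package-wide variable: some caps.
our $Readings_Seen = 0;

# Leading underscore: not meant to be called from outside this package.
sub _is_plausible {
    my ($celsius) = @_;
    return $celsius >= $ABSOLUTE_ZERO_C;
}

# Public function: all lowercase, words separated by underscores.
sub record_reading {
    my ($celsius) = @_;    # function-scope variable: no caps
    return 0 unless _is_plausible($celsius);
    $Readings_Seen++;
    return 1;
}

package main;

print TemperatureLog::record_reading(21.5), "\n";
```

The leading underscore is a convention only; Perl does not stop outside code from calling C<_is_plausible()>.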
a0d0e21e LW |
212 | =item * |
213 | ||
77c8f263 KW |
214 | If you have a really hairy regular expression, use the C</x> or C</xx> |
215 | modifiers and put in some whitespace to make it look a little less like | |
216 | line noise. | |
a0d0e21e LW |
217 | Don't use slash as a delimiter when your regexp has slashes or backslashes. |
218 | ||
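As a sketch of both suggestions, the pattern below uses C</x> for whitespace and comments, and braces as the delimiter because the text being matched is full of slashes. The task, the subroutine C<date_from_path()>, and the sample path are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical task: pull the year, month, and day out of a date embedded
# in a Unix path such as /var/log/2024/03/15/app.log.
my $date_in_path = qr{
    / (\d{4})    # year, four digits
    / (\d{2})    # month
    / (\d{2})    # day
    /            # the date is followed by another path separator
}x;

sub date_from_path {
    my ($path) = @_;
    return $path =~ $date_in_path ? "$1-$2-$3" : undef;
}

print date_from_path('/var/log/2024/03/15/app.log'), "\n";
```

Written with slash delimiters and no C</x>, the same pattern would need every slash escaped and would have no room for the comments.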
219 | =item * | |
220 | ||
b9ff9ac1 | 221 | Use the C<and> and C<or> operators to avoid having to parenthesize |
5f05dabc | 222 | list operators so much, and to reduce the incidence of punctuation |
a0d0e21e | 223 | operators like C<&&> and C<||>. Call your subroutines as if they were |
5f05dabc | 224 | functions or list operators to avoid excessive ampersands and parentheses. |
a0d0e21e LW |
225 | |
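The difference in precedence can be sketched as follows. The subroutine C<first_line()> is hypothetical, and the temporary file exists only so that the example has something to read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the first line of a file, without the newline.
sub first_line {
    my ($path) = @_;

    # "or" binds more loosely than the comma, so the arguments to open
    # need no parentheses.
    open my $fh, '<', $path or die "can't open $path: $!";

    # With || the parentheses are required. Without them the || would
    # apply to $path alone, and a failed open would go unnoticed:
    #   open(my $fh, '<', $path) || die "can't open $path: $!";

    my $line = <$fh>;
    close $fh or die "can't close $path: $!";

    return unless defined $line;
    chomp $line;
    return $line;
}

# Set up a small file to read from.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "alpha\nbeta\n";
close $out or die "can't close $path: $!";

# Called like a built-in function: no ampersand needed.
print first_line($path), "\n";
```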
226 | =item * | |
227 | ||
b9ff9ac1 | 228 | Use here documents instead of repeated C<print()> statements. |
a0d0e21e LW |
229 | |
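For instance, a usage message reads more naturally as one here document than as a run of C<print()> calls. This is a minimal sketch; the subroutine C<usage_text()>, the program name, and the option shown are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical usage message for a script; the option names are invented.
sub usage_text {
    my ($program) = @_;

    # One here document replaces several print() calls. The double-quoted
    # terminator means $program is interpolated.
    return <<"END_USAGE";
usage: $program [-v] file...
    -v    print progress messages
    file  one or more input files
END_USAGE
}

print usage_text('frobnicate');
```

The text appears in the source exactly as it will appear on the screen, which makes it easy to edit later.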
230 | =item * | |
231 | ||
232 | Line up corresponding things vertically, especially if it'd be too long | |
54310121 | 233 | to fit on one line anyway. |
a0d0e21e | 234 | |
54310121 | 235 | $IDX = $ST_MTIME; |
236 | $IDX = $ST_ATIME       if $opt_u; | |
237 | $IDX = $ST_CTIME       if $opt_c; | |
238 | $IDX = $ST_SIZE        if $opt_s; | |
a0d0e21e LW |
239 | |
240 | mkdir $tmpdir, 0700 or die "can't mkdir $tmpdir: $!"; | |
241 | chdir($tmpdir)      or die "can't chdir $tmpdir: $!"; | |
242 | mkdir 'tmp',   0777 or die "can't mkdir $tmpdir/tmp: $!"; | |
243 | ||
244 | =item * | |
245 | ||
cb1a09d0 | 246 | Always check the return codes of system calls. Good error messages should |
b9ff9ac1 | 247 | go to C<STDERR>, include which program caused the problem, what the failed |
7b8d334a | 248 | system call and arguments were, and (VERY IMPORTANT) should contain the |
cb1a09d0 AD |
249 | standard system error message for what went wrong. Here's a simple but |
250 | sufficient example: | |
251 | ||
3f1e98f5 | 252 | opendir(my $dh, $dir) or die "can't opendir $dir: $!"; |
cb1a09d0 AD |
253 | |
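A slightly fuller version of the same idea, with the program name included in the message, might look like this sketch. The subroutine C<failure_message()> and the directory path are hypothetical, and the path is assumed not to exist.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: build a message naming the program, the failed
# call, its argument, and the system error text.
sub failure_message {
    my ($call, $arg, $errno_text) = @_;
    my $program = basename($0);
    return "$program: can't $call $arg: $errno_text\n";
}

my $dir = '/no/such/directory/for/this/sketch';    # assumed not to exist

if (opendir(my $dh, $dir)) {
    closedir($dh);
}
else {
    # Capture $! right away, before another call can overwrite it.
    my $why = "$!";
    print STDERR failure_message('opendir', $dir, $why);
}
```

A plain C<die> already writes to C<STDERR>; the helper is only there to show each part of a good message separately.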
254 | =item * | |
255 | ||
2c268ad5 | 256 | Line up your transliterations when it makes sense: |
a0d0e21e LW |
257 | |
258 | tr [abc] | |
259 |    [xyz]; | |
260 | ||
261 | =item * | |
262 | ||
263 | Think about reusability. Why waste brainpower on a one-shot when you | |
264 | might want to do something like it again? Consider generalizing your | |
265 | code. Consider writing a module or object class. Consider making your | |
803b7faa | 266 | code run cleanly with C<use strict> and C<use warnings> in |
0c506aae AT |
267 | effect. Consider giving away your code. Consider changing your whole |
268 | world view. Consider... oh, never mind. | |
a0d0e21e LW |
269 | |
270 | =item * | |
271 | ||
b9ff9ac1 RGS |
272 | Try to document your code and use Pod formatting in a consistent way. Here |
273 | are commonly expected conventions: | |
274 | ||
275 | =over 4 | |
276 | ||
277 | =item * | |
278 | ||
279 | use C<CE<lt>E<gt>> for function, variable and module names (and more | |
280 | generally anything that can be considered part of code, like filehandles | |
281 | or specific values). Note that function names are considered more readable | |
282 | with parentheses after their name, that is C<function()>. | |
283 | ||
284 | =item * | |
285 | ||
286 | use C<BE<lt>E<gt>> for command names like B<cat> or B<grep>. | |
287 | ||
288 | =item * | |
289 | ||
290 | use C<FE<lt>E<gt>> or C<CE<lt>E<gt>> for file names. C<FE<lt>E<gt>> should | |
291 | be the only Pod code for file names, but as most Pod formatters render it | |
292 | as italic, Unix and Windows paths with their slashes and backslashes may | |
293 | be less readable, and better rendered with C<CE<lt>E<gt>>. | |
294 | ||
295 | =back | |
296 | ||
297 | =item * | |
298 | ||
a0d0e21e LW |
299 | Be consistent. |
300 | ||
301 | =item * | |
302 | ||
303 | Be nice. | |
304 | ||
305 | =back |