| 1 | =head1 NAME |
| 2 | |
| 3 | perlstyle - Perl style guide |
| 4 | |
| 5 | =head1 DESCRIPTION |
| 6 | |
| 7 | Each programmer will, of course, have his or her own preferences |
| 8 | with regard to formatting, but there are some general guidelines that will |
| 9 | make your programs easier to read, understand, and maintain. |
| 10 | |
| 11 | The most important thing is to use L<strict> and L<warnings> in all your |
| 12 | code or know the reason why not to. You may turn them off explicitly for |
| 13 | particular portions of code via C<no warnings> or C<no strict>, and this |
| 14 | can be limited to the specific warnings or strict features you wish to |
| 15 | disable. The B<-w> flag and C<$^W> variable should not be used for this |
| 16 | purpose since they can affect code you use but did not write, such as |
| 17 | modules from core or CPAN. |
| 18 | |
| 19 | Regarding aesthetics of code layout, about the only thing Larry |
| 20 | cares strongly about is that the closing curly bracket of |
| 21 | a multi-line BLOCK should line up with the keyword that started the construct. |
| 22 | Beyond that, he has other preferences that aren't so strong: |
| 23 | |
| 24 | =over 4 |
| 25 | |
| 26 | =item * |
| 27 | |
| 28 | 4-column indent. |
| 29 | |
| 30 | =item * |
| 31 | |
| 32 | Opening curly on same line as keyword, if possible, otherwise line up. |
| 33 | |
| 34 | =item * |
| 35 | |
| 36 | Space before the opening curly of a multi-line BLOCK. |
| 37 | |
| 38 | =item * |
| 39 | |
| 40 | One-line BLOCK may be put on one line, including curlies. |
| 41 | |
| 42 | =item * |
| 43 | |
| 44 | No space before the semicolon. |
| 45 | |
| 46 | =item * |
| 47 | |
| 48 | Semicolon omitted in "short" one-line BLOCK. |
| 49 | |
| 50 | =item * |
| 51 | |
| 52 | Space around most operators. |
| 53 | |
| 54 | =item * |
| 55 | |
| 56 | Space around a "complex" subscript (inside brackets). |
| 57 | |
| 58 | =item * |
| 59 | |
| 60 | Blank lines between chunks that do different things. |
| 61 | |
| 62 | =item * |
| 63 | |
| 64 | Uncuddled elses. |
| 65 | |
| 66 | =item * |
| 67 | |
| 68 | No space between function name and its opening parenthesis. |
| 69 | |
| 70 | =item * |
| 71 | |
| 72 | Space after each comma. |
| 73 | |
| 74 | =item * |
| 75 | |
| 76 | Long lines broken after an operator (except C<and> and C<or>). |
| 77 | |
| 78 | =item * |
| 79 | |
| 80 | Space after last parenthesis matching on current line. |
| 81 | |
| 82 | =item * |
| 83 | |
| 84 | Line up corresponding items vertically. |
| 85 | |
| 86 | =item * |
| 87 | |
| 88 | Omit redundant punctuation as long as clarity doesn't suffer. |
| 89 | |
| 90 | =back |
| 91 | |
| 92 | Larry has his reasons for each of these things, but he doesn't claim that |
| 93 | everyone else's mind works the same as his does. |
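
As an illustration only, the layout preferences above might look like this
when applied together. The subroutine C<count_words()>, the hash C<%count>,
and the sample data are hypothetical, made up for this sketch; the formatting
is the point, not the program.

```perl
use strict;
use warnings;

# Hypothetical helper: tally how often each word occurs in a list of lines.
sub count_words {
    my (@lines) = @_;
    my %count;

    for my $line (@lines) {
        next if $line =~ /^#/;              # skip comment lines

        for my $word (split ' ', $line) {   # space after each comma
            $count{ lc $word }++;           # space around a "complex" subscript
        }
    }

    return \%count;
}

my $count = count_words("the cat", "# ignored", "The dog");

# Uncuddled else: the else starts its own line instead of following "}".
if ($count->{the} == 2) {
    print "saw 'the' twice\n";
}
else {
    print "unexpected count: $count->{the}\n";
}

# One-line BLOCK on a single line, with the semicolon omitted inside it.
my @sorted = sort { $a cmp $b } keys %$count;
print join(', ', @sorted), "\n";
```

Each closing curly lines up with the keyword that opened its construct, which
is the one rule the text above singles out as mattering most.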
| 94 | |
| 95 | Here are some other more substantive style issues to think about: |
| 96 | |
| 97 | =over 4 |
| 98 | |
| 99 | =item * |
| 100 | |
| 101 | Just because you I<CAN> do something a particular way doesn't mean that |
| 102 | you I<SHOULD> do it that way. Perl is designed to give you several |
| 103 | ways to do anything, so consider picking the most readable one. For |
| 104 | instance |
| 105 | |
| 106 | open(my $fh, '<', $foo) || die "Can't open $foo: $!"; |
| 107 | |
| 108 | is better than |
| 109 | |
| 110 | die "Can't open $foo: $!" unless open(my $fh, '<', $foo); |
| 111 | |
| 112 | because the second way hides the main point of the statement in a |
| 113 | modifier. On the other hand |
| 114 | |
| 115 | print "Starting analysis\n" if $verbose; |
| 116 | |
| 117 | is better than |
| 118 | |
| 119 | $verbose && print "Starting analysis\n"; |
| 120 | |
| 121 | because the main point isn't whether the user typed B<-v> or not. |
| 122 | |
| 123 | Similarly, just because an operator lets you assume default arguments |
| 124 | doesn't mean that you have to make use of the defaults. The defaults |
| 125 | are there for lazy systems programmers writing one-shot programs. If |
| 126 | you want your program to be readable, consider supplying the argument. |
| 127 | |
| 128 | Along the same lines, just because you I<CAN> omit parentheses in many |
| 129 | places doesn't mean that you ought to: |
| 130 | |
| 131 | return print reverse sort num values %array; |
| 132 | return print(reverse(sort num (values(%array)))); |
| 133 | |
| 134 | When in doubt, parenthesize. At the very least it will let some poor |
| 135 | schmuck bounce on the % key in B<vi>. |
| 136 | |
| 137 | Even if you aren't in doubt, consider the mental welfare of the person |
| 138 | who has to maintain the code after you, and who will probably put |
| 139 | parentheses in the wrong place. |
| 140 | |
| 141 | =item * |
| 142 | |
| 143 | Don't go through silly contortions to exit a loop at the top or the |
| 144 | bottom, when Perl provides the C<last> operator so you can exit in |
| 145 | the middle. Just "outdent" it a little to make it more visible: |
| 146 | |
| 147 | LINE: |
| 148 |     for (;;) { |
| 149 |         statements; |
| 150 |       last LINE if $foo; |
| 151 |         next LINE if /^#/; |
| 152 |         statements; |
| 153 |     } |
| 154 | |
| 155 | =item * |
| 156 | |
| 157 | Don't be afraid to use loop labels--they're there to enhance |
| 158 | readability as well as to allow multilevel loop breaks. See the |
| 159 | previous example. |
| 160 | |
| 161 | =item * |
| 162 | |
| 163 | Avoid using C<grep()> (or C<map()>) or `backticks` in a void context, that is, |
| 164 | when you just throw away their return values. Those functions all |
| 165 | have return values, so use them. Otherwise use a C<foreach()> loop or |
| 166 | the C<system()> function instead. |
| 167 | |
| 168 | =item * |
| 169 | |
| 170 | For portability, when using features that may not be implemented on |
| 171 | every machine, test the construct in an eval to see if it fails. If |
| 172 | you know in which version or patchlevel a particular feature was |
| 173 | implemented, you can test C<$]> (C<$PERL_VERSION> in C<English>) to see if it |
| 174 | will be there. The C<Config> module will also let you interrogate values |
| 175 | determined by the B<Configure> program when Perl was installed. |
| 176 | |
| 177 | =item * |
| 178 | |
| 179 | Choose mnemonic identifiers. If you can't remember what mnemonic means, |
| 180 | you've got a problem. |
| 181 | |
| 182 | =item * |
| 183 | |
| 184 | While short identifiers like C<$gotit> are probably ok, use underscores to |
| 185 | separate words in longer identifiers. It is generally easier to read |
| 186 | C<$var_names_like_this> than C<$VarNamesLikeThis>, especially for |
| 187 | non-native speakers of English. It's also a simple rule that works |
| 188 | consistently with C<VAR_NAMES_LIKE_THIS>. |
| 189 | |
| 190 | Package names are sometimes an exception to this rule. Perl informally |
| 191 | reserves lowercase module names for "pragma" modules like C<integer> and |
| 192 | C<strict>. Other modules should begin with a capital letter and use mixed |
| 193 | case, but probably without underscores due to limitations in primitive |
| 194 | file systems' representations of module names as files that must fit into a |
| 195 | few sparse bytes. |
| 196 | |
| 197 | =item * |
| 198 | |
| 199 | You may find it helpful to use letter case to indicate the scope |
| 200 | or nature of a variable. For example: |
| 201 | |
| 202 | $ALL_CAPS_HERE   constants only (beware clashes with perl vars!) |
| 203 | $Some_Caps_Here  package-wide global/static |
| 204 | $no_caps_here    function scope my() or local() variables |
| 205 | |
| 206 | Function and method names seem to work best as all lowercase. |
| 207 | E.g., C<$obj-E<gt>as_string()>. |
| 208 | |
| 209 | You can use a leading underscore to indicate that a variable or |
| 210 | function should not be used outside the package that defined it. |
| 211 | |
| 212 | =item * |
| 213 | |
| 214 | If you have a really hairy regular expression, use the C</x> or C</xx> |
| 215 | modifiers and put in some whitespace to make it look a little less like |
| 216 | line noise. |
| 217 | Don't use slash as a delimiter when your regexp has slashes or backslashes. |
| 218 | |
| 219 | =item * |
| 220 | |
| 221 | Use the low-precedence C<and> and C<or> operators to avoid having to parenthesize |
| 222 | list operators so much, and to reduce the incidence of punctuation |
| 223 | operators like C<&&> and C<||>. Call your subroutines as if they were |
| 224 | functions or list operators to avoid excessive ampersands and parentheses. |
| 225 | |
| 226 | =item * |
| 227 | |
| 228 | Use here documents instead of repeated C<print()> statements. |
| 229 | |
| 230 | =item * |
| 231 | |
| 232 | Line up corresponding things vertically, especially if it'd be too long |
| 233 | to fit on one line anyway. |
| 234 | |
| 235 | $IDX = $ST_MTIME; |
| 236 | $IDX = $ST_ATIME       if $opt_u; |
| 237 | $IDX = $ST_CTIME       if $opt_c; |
| 238 | $IDX = $ST_SIZE        if $opt_s; |
| 239 | |
| 240 | mkdir $tmpdir, 0700 or die "can't mkdir $tmpdir: $!"; |
| 241 | chdir($tmpdir)      or die "can't chdir $tmpdir: $!"; |
| 242 | mkdir 'tmp',   0777 or die "can't mkdir $tmpdir/tmp: $!"; |
| 243 | |
| 244 | =item * |
| 245 | |
| 246 | Always check the return codes of system calls. Good error messages should |
| 247 | go to C<STDERR>, include which program caused the problem, what the failed |
| 248 | system call and arguments were, and (VERY IMPORTANT) should contain the |
| 249 | standard system error message for what went wrong. Here's a simple but |
| 250 | sufficient example: |
| 251 | |
| 252 | opendir(my $dh, $dir) or die "can't opendir $dir: $!"; |
| 253 | |
| 254 | =item * |
| 255 | |
| 256 | Line up your transliterations when it makes sense: |
| 257 | |
| 258 | tr [abc] |
| 259 |    [xyz]; |
| 260 | |
| 261 | =item * |
| 262 | |
| 263 | Think about reusability. Why waste brainpower on a one-shot when you |
| 264 | might want to do something like it again? Consider generalizing your |
| 265 | code. Consider writing a module or object class. Consider making your |
| 266 | code run cleanly with C<use strict> and C<use warnings> in |
| 267 | effect. Consider giving away your code. Consider changing your whole |
| 268 | world view. Consider... oh, never mind. |
| 269 | |
| 270 | =item * |
| 271 | |
| 272 | Try to document your code and use Pod formatting in a consistent way. Here |
| 273 | are commonly expected conventions: |
| 274 | |
| 275 | =over 4 |
| 276 | |
| 277 | =item * |
| 278 | |
| 279 | Use C<CE<lt>E<gt>> for function, variable and module names (and more |
| 280 | generally anything that can be considered part of code, like filehandles |
| 281 | or specific values). Note that function names are considered more readable |
| 282 | with parentheses after their name, that is, C<function()>. |
| 283 | |
| 284 | =item * |
| 285 | |
| 286 | Use C<BE<lt>E<gt>> for command names like B<cat> or B<grep>. |
| 287 | |
| 288 | =item * |
| 289 | |
| 290 | Use C<FE<lt>E<gt>> or C<CE<lt>E<gt>> for file names. C<FE<lt>E<gt>> should |
| 291 | be the only Pod code for file names, but as most Pod formatters render it |
| 292 | as italic, Unix and Windows paths with their slashes and backslashes may |
| 293 | be less readable, and better rendered with C<CE<lt>E<gt>>. |
| 294 | |
| 295 | =back |
| 296 | |
| 297 | =item * |
| 298 | |
| 299 | Be consistent. |
| 300 | |
| 301 | =item * |
| 302 | |
| 303 | Be nice. |
| 304 | |
| 305 | =back |
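
Several of the guidelines above can be combined in one short sketch: a
labelled loop exited from the middle with C<last>, a regular expression
spread out with C</x>, C<foreach> rather than C<map()> in void context, a
here document in place of repeated C<print()> calls, and a checked system
call whose error message includes C<$!>. The input format, the variable
names, and the scratch directory are assumptions invented for the example,
not part of any real program.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical input: "name = value" settings, a comment, and an end marker.
my @lines = (
    "# settings",
    "colour = blue",
    "size   = 42",
    "__END__",
    "ignored = yes",
);

my %setting;

LINE:
for my $line (@lines) {
    last LINE if $line eq '__END__';    # exit from the middle of the loop
    next LINE if $line =~ /^#/;         # skip comments

    # /x lets the pattern carry whitespace, so each piece is visible.
    if ($line =~ / ^ (\w+) \s* = \s* (\S+) $ /x) {
        $setting{$1} = $2;
    }
}

# foreach, not map() in void context: we want the side effect, not a list.
my $report = '';
foreach my $name (sort keys %setting) {
    $report .= "$name=$setting{$name}\n";
}
my $total = keys %setting;

# One here document instead of several print() statements.
print <<"END_REPORT";
Settings read: $total
$report
END_REPORT

# Always check system calls; name the program, the call, and include $!.
my $tmpdir = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
opendir(my $dh, $tmpdir) or die "$0: can't opendir $tmpdir: $!";
my @entries = grep { !/^\.\.?$/ } readdir($dh);   # grep used for its result
closedir($dh);
```

The loop label is not strictly needed for a single loop, but it documents
what is being left and keeps working if an inner loop is added later.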