=head1 NAME

perlsec - Perl security

=head1 DESCRIPTION

Perl is designed to make it easy to program securely even when running
with extra privileges, like setuid or setgid programs. Unlike most
command line shells, which are based on multiple substitution passes on
each line of the script, Perl uses a more conventional evaluation scheme
with fewer hidden snags. Additionally, because the language has more
builtin functionality, it can rely less upon external (and possibly
untrustworthy) programs to accomplish its purposes.

Perl automatically enables a set of special security checks, called I<taint
mode>, when it detects its program running with differing real and effective
user or group IDs. The setuid bit in Unix permissions is mode 04000, the
setgid bit mode 02000; either or both may be set. You can also enable taint
mode explicitly by using the B<-T> command line flag. This flag is
I<strongly> suggested for server programs and any program run on behalf of
someone else, such as a CGI script. Once taint mode is on, it's on for
the remainder of your script.
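As a minimal sketch of how a program might confirm which mode it is running in, the following reads the C<${^TAINT}> variable described in L<perlvar>. The helper name C<taint_mode_status> is made up for this illustration. Run it once as C<perl script.pl> and once as C<perl -T script.pl> to see the difference.

```perl
use strict;
use warnings;

# Describe the taint state of the running interpreter.
# ${^TAINT} is read-only: 1 under -T, -1 under -t (warnings only),
# and 0 when taint checks are off.
sub taint_mode_status {
    return ${^TAINT} ==  1 ? 'enforced'
         : ${^TAINT} == -1 ? 'warn-only'
         :                   'off';
}

print "Taint mode is ", taint_mode_status(), "\n";
```

A program that must never run without taint checks could die early when the status is not C<enforced>.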

While in this mode, Perl takes special precautions called I<taint
checks> to prevent both obvious and subtle traps. Some of these checks
are reasonably simple, such as verifying that path directories aren't
writable by others; careful programmers have always used checks like
these. Other checks, however, are best supported by the language itself,
and it is these checks especially that contribute to making a set-id Perl
program more secure than the corresponding C program.

You may not use data derived from outside your program to affect
something else outside your program--at least, not by accident. All
command line arguments, environment variables, locale information (see
L<perllocale>), results of certain system calls (C<readdir()>,
C<readlink()>, the variable of C<shmread()>, the messages returned by
C<msgrcv()>, the password, gcos and shell fields returned by the
C<getpwxxx()> calls), and all file input are marked as "tainted".
Tainted data may not be used directly or indirectly in any command
that invokes a sub-shell, nor in any command that modifies files,
directories, or processes, B<with the following exceptions>:

=over 4

=item *

Arguments to C<print> and C<syswrite> are B<not> checked for taintedness.

=item *

Symbolic methods

    $obj->$method(@args);

and symbolic sub references

    &{$foo}(@args);
    $foo->(@args);

are not checked for taintedness. This requires extra care
unless you want external data to affect your control flow. Unless
you carefully limit what these symbolic values are, people are able
to call functions B<outside> your Perl code, such as POSIX::system,
in which case they are able to run arbitrary external code.

=back
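One possible way to limit what a symbolic value can reach is to look the requested name up in a fixed table instead of calling it directly. The sketch below assumes two hypothetical actions, C<list> and C<status>; the table C<%ACTION> and the C<dispatch> helper are illustrative names, not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical handlers; only names listed here can ever be invoked.
my %ACTION = (
    list   => sub { return "listing @_" },
    status => sub { return "status ok" },
);

# Look the requested name up in the allowlist instead of calling it
# symbolically, so external data cannot name an arbitrary function.
sub dispatch {
    my ($name, @args) = @_;
    my $handler = $ACTION{$name}
        or die "Unknown action requested\n";
    return $handler->(@args);
}

print dispatch('list', 'a', 'b'), "\n";
```

A request for a name such as C<POSIX::system> is not in the table, so it dies instead of being called.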

For efficiency reasons, Perl takes a conservative view of
whether data is tainted. If an expression contains tainted data,
any subexpression may be considered tainted, even if the value
of the subexpression is not itself affected by the tainted data.

Because taintedness is associated with each scalar value, some
elements of an array or hash can be tainted and others not.
The keys of a hash are never tainted.

For example:

    $arg = shift;               # $arg is tainted
    $hid = $arg . 'bar';        # $hid is also tainted
    $line = <>;                 # Tainted
    $line = <STDIN>;            # Also tainted
    open FOO, "/home/me/bar" or die $!;
    $line = <FOO>;              # Still tainted
    $path = $ENV{'PATH'};       # Tainted, but see below
    $data = 'abc';              # Not tainted

    system "echo $arg";         # Insecure
    system "/bin/echo", $arg;   # Considered insecure
                                # (Perl doesn't know about /bin/echo)
    system "echo $hid";         # Insecure
    system "echo $data";        # Insecure until PATH set

    $path = $ENV{'PATH'};       # $path now tainted

    $ENV{'PATH'} = '/bin:/usr/bin';
    delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};

    $path = $ENV{'PATH'};       # $path now NOT tainted
    system "echo $data";        # Is secure now!

    open(FOO, "< $arg");        # OK - read-only file
    open(FOO, "> $arg");        # Not OK - trying to write

    open(FOO,"echo $arg|");     # Not OK
    open(FOO,"-|")
        or exec 'echo', $arg;   # Also not OK

    $shout = `echo $arg`;       # Insecure, $shout now tainted

    unlink $data, $arg;         # Insecure
    umask $arg;                 # Insecure

    exec "echo $arg";           # Insecure
    exec "echo", $arg;          # Insecure
    exec "sh", '-c', $arg;      # Very insecure!

    @files = <*.c>;             # insecure (uses readdir() or similar)
    @files = glob('*.c');       # insecure (uses readdir() or similar)

    # In Perl releases older than 5.6.0 the <*.c> and glob('*.c') would
    # have used an external program to do the filename expansion; but in
    # either case the result is tainted since the list of filenames comes
    # from outside of the program.

    $bad = ($arg, 23);          # $bad will be tainted
    $arg, `true`;               # Insecure (although it isn't really)

If you try to do something insecure, you will get a fatal error saying
something like "Insecure dependency" or "Insecure $ENV{PATH}".

The exception to the principle of "one tainted value taints the whole
expression" is with the ternary conditional operator C<?:>. Since code
with a ternary conditional

    $result = $tainted_value ? "Untainted" : "Also untainted";

is effectively

    if ( $tainted_value ) {
        $result = "Untainted";
    } else {
        $result = "Also untainted";
    }

it doesn't make sense for C<$result> to be tainted.

=head2 Laundering and Detecting Tainted Data

To test whether a variable contains tainted data, and whether using it
would thus trigger an "Insecure dependency" message, you can use the
C<tainted()> function of the Scalar::Util module, available in your
nearby CPAN mirror, and included in Perl starting from the release 5.8.0.
Or you may be able to use the following C<is_tainted()> function.

    sub is_tainted {
        return ! eval { eval("#" . substr(join("", @_), 0, 0)); 1 };
    }

This function makes use of the fact that the presence of tainted data
anywhere within an expression renders the entire expression tainted. It
would be inefficient for every operator to test every argument for
taintedness. Instead, the slightly more efficient and conservative
approach is used that if any tainted value has been accessed within the
same expression, the whole expression is considered tainted.
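For comparison, here is a small sketch that uses C<tainted()> from Scalar::Util rather than the hand-written function. The C<taint_report> helper and its labels are invented for this example. Without B<-T> nothing is tainted, so both values are reported as clean; under B<-T> the C<PATH> value should be reported as tainted.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Label each named value by whether Perl currently considers it tainted.
# Hash values keep their taintedness; only hash keys are never tainted.
sub taint_report {
    my (%value_for) = @_;
    return map { ($_ => (tainted($value_for{$_}) ? 'tainted' : 'clean')) }
           sort keys %value_for;
}

my %report = taint_report(
    path    => $ENV{PATH},   # comes from the environment
    literal => 'abc',        # a constant in the program, never tainted
);
print "$_: $report{$_}\n" for sort keys %report;
```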

But testing for taintedness gets you only so far. Sometimes you just have
to clear your data's taintedness. Values may be untainted by using them
as keys in a hash; otherwise the only way to bypass the tainting
mechanism is by referencing subpatterns from a regular expression match.
Perl presumes that if you reference a substring using $1, $2, etc., you
knew what you were doing when you wrote the pattern. That means using
a bit of thought--don't just blindly untaint anything, or you defeat the
entire mechanism. It's better to verify that the variable has only good
characters (for certain values of "good") rather than checking whether it
has any bad characters. That's because it's far too easy to miss bad
characters that you never thought of.

Here's a test to make sure that the data contains nothing but "word"
characters (alphabetics, numerics, and underscores), a hyphen, an at sign,
or a dot.

    if ($data =~ /^([-\@\w.]+)$/) {
        $data = $1;                     # $data now untainted
    } else {
        die "Bad data in '$data'";      # log this somewhere
    }

This is fairly secure because C</\w+/> doesn't normally match shell
metacharacters, nor are dot, dash, or at going to mean something special
to the shell. Use of C</.+/> would have been insecure in theory because
it lets everything through, but Perl doesn't check for that. The lesson
is that when untainting, you must be exceedingly careful with your patterns.
Laundering data using a regular expression is the I<only> mechanism for
untainting dirty data, unless you use the strategy detailed below to fork
a child of lesser privilege.
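The test above can be wrapped in a reusable helper. The sketch below is one possible shape, with the hypothetical name C<untaint_token>. It anchors the pattern with C<\A> and C<\z> instead of C<^> and C<$>, an assumption made here so that a value with a trailing newline is rejected rather than accepted.

```perl
use strict;
use warnings;

# Return an untainted copy of $data if it consists solely of word
# characters, hyphens, at signs and dots; return undef otherwise.
# \A and \z anchor the whole string, so a trailing newline is rejected.
sub untaint_token {
    my ($data) = @_;
    return undef unless defined $data;
    return $data =~ /\A([-\@\w.]+)\z/ ? $1 : undef;
}

for my $candidate ('report-2024.txt', 'rm -rf /; echo') {
    my $clean = untaint_token($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected input\n";
}
```

Returning C<undef> instead of dying leaves the decision about logging or aborting to the caller.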

The example does not untaint C<$data> if C<use locale> is in effect,
because the characters matched by C<\w> are determined by the locale.
Perl considers that locale definitions are untrustworthy because they
contain data from outside the program. If you are writing a
locale-aware program, and want to launder data with a regular expression
containing C<\w>, put C<no locale> ahead of the expression in the same
block. See L<perllocale/SECURITY> for further discussion and examples.
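A rough sketch of that arrangement follows: the program as a whole uses C<use locale>, and the laundering step switches it off inside its own block. The function name C<launder_word> is made up for this example.

```perl
use strict;
use warnings;
use locale;     # the program as a whole is locale-aware

# Launder a value inside a block where locale rules are switched off,
# so that \w is not defined by locale data from outside the program.
sub launder_word {
    my ($data) = @_;
    no locale;  # lexically scoped: applies to the rest of this sub only
    return $data =~ /\A([-\@\w.]+)\z/ ? $1 : undef;
}

my $word = launder_word('alice@example.org');
print defined $word ? "laundered: $word\n" : "rejected\n";
```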

=head2 Switches On the "#!" Line

When you make a script executable, in order to make it usable as a
command, the system will pass switches to perl from the script's #!
line. Perl checks that any command line switches given to a setuid
(or setgid) script actually match the ones set on the #! line. Some
Unix and Unix-like environments impose a one-switch limit on the #!
line, so you may need to use something like C<-wU> instead of C<-w -U>
under such systems. (This issue should arise only in Unix or
Unix-like environments that support #! and setuid or setgid scripts.)

=head2 Taint mode and @INC

When the taint mode (C<-T>) is in effect, the "." directory is removed
from C<@INC>, and the environment variables C<PERL5LIB> and C<PERLLIB>
are ignored by Perl. You can still adjust C<@INC> from outside the
program by using the C<-I> command line option as explained in
L<perlrun>. The two environment variables are ignored because
they are obscured, and a user running a program could be unaware that
they are set, whereas the C<-I> option is clearly visible and
therefore permitted.

Another way to modify C<@INC> without modifying the program is to use
the C<lib> pragma, e.g.:

    perl -Mlib=/foo program

The benefit of using C<-Mlib=/foo> over C<-I/foo> is that the former
will automagically remove any duplicated directories, while the latter
will not.

Note that if a tainted string is added to C<@INC>, the following
problem will be reported:

    Insecure dependency in require while running with -T switch

=head2 Cleaning Up Your Path

For "Insecure C<$ENV{PATH}>" messages, you need to set C<$ENV{'PATH'}> to
a known value, and each directory in the path must be absolute and
non-writable by anyone other than its owner and group. You may be
surprised to get this message even if the pathname to your executable is
fully qualified. This is I<not> generated because you didn't supply a full
path to the program; instead, it's generated because you never set your PATH
environment variable, or you didn't set it to something that was safe.
Because Perl can't guarantee that the executable in question isn't itself
going to turn around and execute some other program that is dependent on
your PATH, it makes sure you set the PATH.

The PATH isn't the only environment variable which can cause problems.
Because some shells may use the variables IFS, CDPATH, ENV, and
BASH_ENV, Perl checks that those are either empty or untainted when
starting subprocesses. You may wish to add something like this to your
set-id and taint-checking scripts.

    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer

It's also possible to get into trouble with other operations that don't
care whether they use tainted values. Make judicious use of the file
tests in dealing with any user-supplied filenames. When possible, do
opens and such B<after> properly dropping any special user (or group!)
privileges. Perl doesn't prevent you from opening tainted filenames for
reading, so be careful what you print out. The tainting mechanism is
intended to prevent stupid mistakes, not to remove the need for thought.
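As an illustration of using the file tests on a user-supplied filename, the sketch below refuses symbolic links and anything that is not a plain file before opening it read-only. The helper name C<open_plain_file> is hypothetical, and a scratch file from File::Temp stands in for real user input. The checks and the open are separate steps, so a race window remains; this reduces mistakes rather than removing the need for thought.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Open a user-supplied filename for reading only if it names a plain
# file that is not a symbolic link.  The three-argument form of open()
# keeps characters such as "|", "<" and ">" in the name from being
# interpreted as part of the open mode.
sub open_plain_file {
    my ($name) = @_;
    return undef if -l $name;        # refuse symbolic links
    return undef unless -f $name;    # refuse directories, devices, missing files
    open(my $fh, '<', $name) or return undef;
    return $fh;
}

# Demonstration with a scratch file standing in for user input.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "hello\n";
close $tmp_fh;

my $fh = open_plain_file($tmp_name) or die "refused $tmp_name\n";
print scalar <$fh>;
close $fh;
```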

Perl does not call the shell to expand wild cards when you pass C<system>
and C<exec> explicit parameter lists instead of strings with possible shell
wildcards in them. Unfortunately, the C<open>, C<glob>, and
backtick functions provide no such alternate calling convention, so more
subterfuge will be required.
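To make the list form of C<system> concrete, here is a minimal sketch. The wrapper name C<run_without_shell> is invented for the example, and C<$^X> (the perl that is currently running) stands in for an arbitrary external program so that the example does not depend on any particular command being installed.

```perl
use strict;
use warnings;

# Run a program with an explicit argument list.  The block form
# "system { $program } LIST" always bypasses the shell, so wildcards
# and other metacharacters in the arguments are passed through literally.
sub run_without_shell {
    my ($program, @args) = @_;
    my $status = system { $program } $program, @args;
    return $status == 0;
}

my $hostile = '*.c; echo pwned';
run_without_shell($^X, '-e', 'print "arg: $ARGV[0]\n"', $hostile)
    or warn "command failed\n";
```

The child receives the string C<*.c; echo pwned> as a single argument; nothing expands the wildcard or runs the second command.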

Perl provides a reasonably safe way to open a file or pipe from a setuid
or setgid program: just create a child process with reduced privilege that
does the dirty work for you. First, fork a child using the special
C<open> syntax that connects the parent and child by a pipe. Now the
child resets its ID set and any other per-process attributes, like
environment variables, umasks, current working directories, back to the
originals or known safe values. Then the child process, which no longer
has any special permissions, does the C<open> or other system call.
Finally, the child passes the data it managed to access back to the
parent. Because the file or pipe was opened in the child while running
under less privilege than the parent, it's not apt to be tricked into
doing something it shouldn't.

Here's a way to do backticks reasonably safely. Notice how the C<exec> is
not called with a string that the shell could expand. This is by far the
best way to call something that might be subjected to shell escapes: just
never call the shell at all.

    use English '-no_match_vars';
    die "Can't fork: $!" unless defined($pid = open(KID, "-|"));
    if ($pid) {           # parent
        while (<KID>) {
            # do something
        }
        close KID;
    } else {
        my @temp     = ($EUID, $EGID);
        my $orig_uid = $UID;
        my $orig_gid = $GID;
        $EUID = $UID;
        $EGID = $GID;
        # Drop privileges
        $UID  = $orig_uid;
        $GID  = $orig_gid;
        # Make sure privs are really gone
        ($EUID, $EGID) = @temp;
        # $GID and $EGID may hold space-separated lists, hence "eq"
        die "Can't drop privileges"
            unless $UID == $EUID  && $GID eq $EGID;
        $ENV{PATH} = "/bin:/usr/bin"; # Minimal PATH.
        # Consider sanitizing the environment even more.
        exec 'myprog', 'arg1', 'arg2'
            or die "can't exec myprog: $!";
    }

A similar strategy would work for wildcard expansion via C<glob>, although
you can use C<readdir> instead.
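The C<readdir> alternative might look roughly like the following. The helper C<list_c_files> and the scratch directory are assumptions made for the demonstration. The names returned still come from outside the program, so under taint mode they stay tainted until they are validated.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# List the entries of $dir whose names end in ".c", without involving
# a shell or glob().
sub list_c_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't opendir $dir: $!";
    my @names = sort grep { /\.c\z/ } readdir $dh;
    closedir $dh;
    return @names;
}

# Demonstration in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(main.c util.c notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Can't create $name: $!";
    close $fh;
}
print join(' ', list_c_files($dir)), "\n";
```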

Taint checking is most useful when, although you trust yourself not to have
written a program to give away the farm, you don't necessarily trust those
who end up using it not to try to trick it into doing something bad. This
is the kind of security checking that's useful for set-id programs and
programs launched on someone else's behalf, like CGI programs.

This is quite different, however, from not even trusting the writer of the
code not to try to do something evil. That's the kind of trust needed
when someone hands you a program you've never seen before and says, "Here,
run this." For that kind of safety, check out the Safe module,
included standard in the Perl distribution. This module allows the
programmer to set up special compartments in which all system operations
are trapped and namespace access is carefully controlled.
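A very small sketch of a Safe compartment is shown below. It relies on the module's default operator mask, which is assumed here to permit simple arithmetic and to trap C<system>; the wrapper name C<run_in_compartment> is made up for the example. See the Safe documentation before relying on any particular mask.

```perl
use strict;
use warnings;
use Safe;

# Evaluate a string of untrusted code in a fresh compartment using the
# module's default operator mask.  Returns the result of the code, or
# undef if the code was rejected or died (the reason is left in $@).
sub run_in_compartment {
    my ($code) = @_;
    my $compartment = Safe->new;
    return $compartment->reval($code);
}

my $sum = run_in_compartment('2 + 3');
print "2 + 3 = $sum\n";

my $blocked = run_in_compartment('system("echo not allowed")');
print "system() was refused\n" if !defined $blocked;
```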

=head2 Security Bugs

Beyond the obvious problems that stem from giving special privileges to
systems as flexible as scripts, on many versions of Unix, set-id scripts
are inherently insecure right from the start. The problem is a race
condition in the kernel. Between the time the kernel opens the file to
see which interpreter to run and when the (now-set-id) interpreter turns
around and reopens the file to interpret it, the file in question may have
changed, especially if you have symbolic links on your system.

Fortunately, sometimes this kernel "feature" can be disabled.
Unfortunately, there are two ways to disable it. The system can simply
outlaw scripts with any set-id bit set, which doesn't help much.
Alternatively, it can simply ignore the set-id bits on scripts. If the
latter is true, Perl can emulate the setuid and setgid mechanism when it
notices the otherwise useless setuid/gid bits on Perl scripts. It does
this via a special executable called F<suidperl> that is automatically
invoked for you if it's needed.

However, if the kernel set-id script feature isn't disabled, Perl will
complain loudly that your set-id script is insecure. You'll need to
either disable the kernel set-id script feature, or put a C wrapper around
the script. A C wrapper is just a compiled program that does nothing
except call your Perl program. Compiled programs are not subject to the
kernel bug that plagues set-id scripts. Here's a simple wrapper, written
in C:

    #include <unistd.h>
    #define REAL_PATH "/path/to/script"
    int main(int argc, char **argv)
    {
        execv(REAL_PATH, argv);
        return 1;    /* reached only if execv() fails */
    }

Compile this wrapper into a binary executable and then make I<it> rather
than your script setuid or setgid.

In recent years, vendors have begun to supply systems free of this
inherent security bug. On such systems, when the kernel passes the name
of the set-id script to open to the interpreter, rather than using a
pathname subject to meddling, it instead passes I</dev/fd/3>. This is a
special file already opened on the script, so that there can be no race
condition for evil scripts to exploit. On these systems, Perl should be
compiled with C<-DSETUID_SCRIPTS_ARE_SECURE_NOW>. The F<Configure>
program that builds Perl tries to figure this out for itself, so you
should never have to specify this yourself. Most modern releases of
SysVr4 and BSD 4.4 use this approach to avoid the kernel race condition.

Prior to release 5.6.1 of Perl, bugs in the code of F<suidperl> could
introduce a security hole.

=head2 Protecting Your Programs

There are a number of ways to hide the source to your Perl programs,
with varying levels of "security".

First of all, however, you I<can't> take away read permission, because
the source code has to be readable in order to be compiled and
interpreted. (That doesn't mean that a CGI script's source is
readable by people on the web, though.) So you have to leave the
permissions at the socially friendly 0755 level. This lets only
people on your local system see your source.

Some people mistakenly regard this as a security problem. If your program
does insecure things, and relies on people not knowing how to exploit those
insecurities, it is not secure. It is often possible for someone to
determine the insecure things and exploit them without viewing the
source. Security through obscurity, the name for hiding your bugs
instead of fixing them, is little security indeed.

You can try using encryption via source filters (Filter::* from CPAN,
or Filter::Util::Call and Filter::Simple since Perl 5.8).
But crackers might be able to decrypt it. You can try using the byte
code compiler and interpreter described below, but crackers might be
able to de-compile it. You can try using the native-code compiler
described below, but crackers might be able to disassemble it. These
pose varying degrees of difficulty to people wanting to get at your
code, but none can definitively conceal it (this is true of every
language, not just Perl).

If you're concerned about people profiting from your code, then the
bottom line is that nothing but a restrictive licence will give you
legal security. License your software and pepper it with threatening
statements like "This is unpublished proprietary software of XYZ Corp.
Your access to it does not give you permission to use it blah blah
blah." You should see a lawyer to be sure your licence's wording will
stand up in court.

=head2 Unicode

Unicode is a new and complex technology and one may easily overlook
certain security pitfalls. See L<perluniintro> for an overview and
L<perlunicode> for details, and L<perlunicode/"Security Implications
of Unicode"> for security implications in particular.

=head2 Algorithmic Complexity Attacks

Certain internal algorithms used in the implementation of Perl can
be attacked by choosing the input carefully to consume large amounts
of either time or space or both. This can lead to so-called
I<Denial of Service> (DoS) attacks.

=over 4

=item *

Hash Function - the algorithm used to "order" hash elements has been
changed several times during the development of Perl, mainly to be
reasonably fast. In Perl 5.8.1 the security aspect was also taken
into account.

In Perls before 5.8.1 one could rather easily generate data that as
hash keys would cause Perl to consume large amounts of time because
the internal structure of hashes would badly degenerate. In Perl 5.8.1
the hash function is randomly perturbed by a pseudorandom seed which
makes generating such naughty hash keys harder.
See L<perlrun/PERL_HASH_SEED> for more information.

The random perturbation is done by default, but if one wants for some
reason to emulate the old behaviour one can set the environment variable
PERL_HASH_SEED to zero (or any other integer). One possible reason
for wanting to emulate the old behaviour is that in the new behaviour
consecutive runs of Perl will order hash keys differently, which may
confuse some applications (like Data::Dumper: the outputs of two
different runs are no longer identical).

B<Perl has never guaranteed any ordering of the hash keys>, and the
ordering has already changed several times during the lifetime of
Perl 5. Also, the ordering of hash keys has always been, and
continues to be, affected by the insertion order.

Also note that while the order of the hash elements might be
randomised, this "pseudoordering" should B<not> be used for
applications like shuffling a list randomly (use List::Util::shuffle()
for that, see L<List::Util>, a standard core module since Perl 5.8.0;
or the CPAN module Algorithm::Numerical::Shuffle), or for generating
permutations (use e.g. the CPAN modules Algorithm::Permute or
Algorithm::FastPermute), or for any cryptographic applications.

=item *

Regular expressions - Perl's regular expression engine is a so-called
NFA (Non-deterministic Finite Automaton), which among other things means
that it can rather easily consume large amounts of both time and space if
the regular expression may match in several ways. Careful crafting of the
regular expressions can help but quite often there really isn't much
one can do (the book "Mastering Regular Expressions" is required
reading, see L<perlfaq2>). Running out of space manifests itself by
Perl running out of memory.

=item *

Sorting - the quicksort algorithm used in Perls before 5.8.0 to
implement the sort() function is very easy to trick into misbehaving
so that it consumes a lot of time. Nothing more is required than
re-sorting a list that is already sorted. Starting from Perl 5.8.0 a
different sorting algorithm, mergesort, is used. Mergesort is insensitive
to its input data, so it cannot be similarly fooled.

=back
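For the hash-ordering point in the list above, the following sketch shows one way to avoid depending on key order: sort the keys when output must be repeatable, and call C<List::Util::shuffle()> when a random order is actually wanted. The hash C<%port_for> and its contents are invented for the illustration; C<$Data::Dumper::Sortkeys> is the Data::Dumper setting that makes its output sort hash keys.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);
use Data::Dumper;

my %port_for = (http => 80, https => 443, ssh => 22);

# Sort the keys whenever output must be identical from run to run.
my @stable = sort keys %port_for;
print "stable order: @stable\n";

# Data::Dumper can be asked to sort hash keys in its own output.
$Data::Dumper::Sortkeys = 1;
print Dumper(\%port_for);

# For a random order, shuffle explicitly rather than relying on the
# order in which the hash happens to return its keys.
my @random = shuffle @stable;
print "shuffled: @random\n";
```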

See L<http://www.cs.rice.edu/~scrosby/hash/> for more information,
and any computer science textbook on algorithmic complexity.

=head1 SEE ALSO

L<perlrun> for its description of cleaning up environment variables.