| 1 | =head1 NAME |
| 2 | |
| 3 | perlsec - Perl security |
| 4 | |
| 5 | =head1 DESCRIPTION |
| 6 | |
| 7 | Perl is designed to make it easy to program securely even when running |
| 8 | with extra privileges, like setuid or setgid programs. Unlike most |
| 9 | command line shells, which are based on multiple substitution passes on |
| 10 | each line of the script, Perl uses a more conventional evaluation scheme |
| 11 | with fewer hidden snags. Additionally, because the language has more |
| 12 | builtin functionality, it can rely less upon external (and possibly |
| 13 | untrustworthy) programs to accomplish its purposes. |
| 14 | |
| 15 | =head1 SECURITY VULNERABILITY CONTACT INFORMATION |
| 16 | |
| 17 | If you believe you have found a security vulnerability in Perl, please email |
| 18 | perl5-security-report@perl.org with details. This points to a closed |
| 19 | subscription, unarchived mailing list. Please only use this address for |
| 20 | security issues in the Perl core, not for modules independently distributed on |
| 21 | CPAN. |
| 22 | |
| 23 | =head1 SECURITY MECHANISMS AND CONCERNS |
| 24 | |
| 25 | =head2 Taint mode |
| 26 | |
| 27 | Perl automatically enables a set of special security checks, called I<taint |
| 28 | mode>, when it detects its program running with differing real and effective |
| 29 | user or group IDs. The setuid bit in Unix permissions is mode 04000, the |
| 30 | setgid bit mode 02000; either or both may be set. You can also enable taint |
| 31 | mode explicitly by using the B<-T> command line flag. This flag is |
| 32 | I<strongly> suggested for server programs and any program run on behalf of |
| 33 | someone else, such as a CGI script. Once taint mode is on, it's on for |
| 34 | the remainder of your script. |
| 35 | |
| 36 | While in this mode, Perl takes special precautions called I<taint |
| 37 | checks> to prevent both obvious and subtle traps. Some of these checks |
| 38 | are reasonably simple, such as verifying that path directories aren't |
| 39 | writable by others; careful programmers have always used checks like |
| 40 | these. Other checks, however, are best supported by the language itself, |
| 41 | and it is these checks especially that contribute to making a set-id Perl |
| 42 | program more secure than the corresponding C program. |
| 43 | |
| 44 | You may not use data derived from outside your program to affect |
| 45 | something else outside your program--at least, not by accident. All |
| 46 | command line arguments, environment variables, locale information (see |
| 47 | L<perllocale>), results of certain system calls (C<readdir()>, |
| 48 | C<readlink()>, the variable of C<shmread()>, the messages returned by |
| 49 | C<msgrcv()>, the password, gcos and shell fields returned by the |
| 50 | C<getpwxxx()> calls), and all file input are marked as "tainted". |
| 51 | Tainted data may not be used directly or indirectly in any command |
| 52 | that invokes a sub-shell, nor in any command that modifies files, |
| 53 | directories, or processes, B<with the following exceptions>: |
| 54 | |
| 55 | =over 4 |
| 56 | |
| 57 | =item * |
| 58 | |
| 59 | Arguments to C<print> and C<syswrite> are B<not> checked for taintedness. |
| 60 | |
| 61 | =item * |
| 62 | |
| 63 | Symbolic methods |
| 64 | |
| 65 | $obj->$method(@args); |
| 66 | |
| 67 | and symbolic sub references |
| 68 | |
| 69 | &{$foo}(@args); |
| 70 | $foo->(@args); |
| 71 | |
| 72 | are not checked for taintedness. This requires extra care |
| 73 | unless you want external data to affect your control flow. Unless |
| 74 | you carefully limit what these symbolic values are, people are able |
| 75 | to call functions B<outside> your Perl code, such as POSIX::system, |
| 76 | in which case they are able to run arbitrary external code. |
| 77 | |
| 78 | =item * |
| 79 | |
| 80 | Hash keys are B<never> tainted. |
| 81 | |
| 82 | =back |
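
One way to limit which symbolic values can be called, as the second exception above recommends, is an explicit allow-list. The following is a minimal sketch, not part of Perl itself: the C<%ACTIONS> table, the C<run_action> helper, and the action names are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical allow-list: the only subs that external input may select.
my %ACTIONS = (
    status => sub { return "ok" },
    echo   => sub { return join(" ", @_) },
);

# Look the requested name up in the allow-list instead of calling
# &{$name}(...) or $obj->$name(...) with an externally supplied name.
sub run_action {
    my ($name, @args) = @_;
    my $code = defined $name ? $ACTIONS{$name} : undef;
    die "Unknown action\n" unless $code;
    return $code->(@args);
}

print run_action('echo', 'hello', 'world'), "\n";
```

Because only names present in the table are ever called, a request for something like C<POSIX::system> is rejected instead of being dispatched.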
| 83 | |
| 84 | For efficiency reasons, Perl takes a conservative view of |
| 85 | whether data is tainted. If an expression contains tainted data, |
| 86 | any subexpression may be considered tainted, even if the value |
| 87 | of the subexpression is not itself affected by the tainted data. |
| 88 | |
| 89 | Because taintedness is associated with each scalar value, some |
| 90 | elements of an array or hash can be tainted and others not. |
| 91 | The keys of a hash are B<never> tainted. |
| 92 | |
| 93 | For example: |
| 94 | |
| 95 | $arg = shift; # $arg is tainted |
| 96 | $hid = $arg . 'bar'; # $hid is also tainted |
| 97 | $line = <>; # Tainted |
| 98 | $line = <STDIN>; # Also tainted |
| 99 | open FOO, "/home/me/bar" or die $!; |
| 100 | $line = <FOO>; # Still tainted |
| 101 | $path = $ENV{'PATH'}; # Tainted, but see below |
| 102 | $data = 'abc'; # Not tainted |
| 103 | |
| 104 | system "echo $arg"; # Insecure |
| 105 | system "/bin/echo", $arg; # Considered insecure |
| 106 | # (Perl doesn't know about /bin/echo) |
| 107 | system "echo $hid"; # Insecure |
| 108 | system "echo $data"; # Insecure until PATH set |
| 109 | |
| 110 | $path = $ENV{'PATH'}; # $path now tainted |
| 111 | |
| 112 | $ENV{'PATH'} = '/bin:/usr/bin'; |
| 113 | delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'}; |
| 114 | |
| 115 | $path = $ENV{'PATH'}; # $path now NOT tainted |
| 116 | system "echo $data"; # Is secure now! |
| 117 | |
| 118 | open(FOO, "< $arg"); # OK - read-only file |
| 119 | open(FOO, "> $arg"); # Not OK - trying to write |
| 120 | |
| 121 | open(FOO,"echo $arg|"); # Not OK |
| 122 | open(FOO,"-|") |
| 123 | or exec 'echo', $arg; # Also not OK |
| 124 | |
| 125 | $shout = `echo $arg`; # Insecure, $shout now tainted |
| 126 | |
| 127 | unlink $data, $arg; # Insecure |
| 128 | umask $arg; # Insecure |
| 129 | |
| 130 | exec "echo $arg"; # Insecure |
| 131 | exec "echo", $arg; # Insecure |
| 132 | exec "sh", '-c', $arg; # Very insecure! |
| 133 | |
| 134 | @files = <*.c>; # insecure (uses readdir() or similar) |
| 135 | @files = glob('*.c'); # insecure (uses readdir() or similar) |
| 136 | |
| 137 | # In either case, the results of glob are tainted, since the list of |
| 138 | # filenames comes from outside of the program. |
| 139 | |
| 140 | $bad = ($arg, 23); # $bad will be tainted |
| 141 | $arg, `true`; # Insecure (although it isn't really) |
| 142 | |
| 143 | If you try to do something insecure, you will get a fatal error saying |
| 144 | something like "Insecure dependency" or "Insecure $ENV{PATH}". |
| 145 | |
| 146 | The exception to the principle of "one tainted value taints the whole |
| 147 | expression" is with the ternary conditional operator C<?:>. Since code |
| 148 | with a ternary conditional |
| 149 | |
| 150 | $result = $tainted_value ? "Untainted" : "Also untainted"; |
| 151 | |
| 152 | is effectively |
| 153 | |
| 154 | if ( $tainted_value ) { |
| 155 |     $result = "Untainted"; |
| 156 | } else { |
| 157 |     $result = "Also untainted"; |
| 158 | } |
| 159 | |
| 160 | it doesn't make sense for C<$result> to be tainted. |
| 161 | |
| 162 | =head2 Laundering and Detecting Tainted Data |
| 163 | |
| 164 | To test whether a variable contains tainted data, whose use would |
| 165 | thus trigger an "Insecure dependency" message, you can use the |
| 166 | C<tainted()> function of the Scalar::Util module, available from |
| 167 | CPAN and included in Perl starting with release 5.8.0. |
| 168 | Or you may be able to use the following C<is_tainted()> function. |
| 169 | |
| 170 | sub is_tainted { |
| 171 |     local $@;   # Don't pollute caller's value. |
| 172 |     return ! eval { eval("#" . substr(join("", @_), 0, 0)); 1 }; |
| 173 | } |
| 174 | |
| 175 | This function makes use of the fact that the presence of tainted data |
| 176 | anywhere within an expression renders the entire expression tainted. It |
| 177 | would be inefficient for every operator to test every argument for |
| 178 | taintedness. Instead, the slightly more efficient and conservative |
| 179 | approach is used that if any tainted value has been accessed within the |
| 180 | same expression, the whole expression is considered tainted. |
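
As a rough illustration of the C<Scalar::Util> route, the sketch below labels each of its arguments. The C<taint_report> helper is hypothetical. Outside taint mode C<tainted()> always returns false, so what the script prints depends on whether it was started with B<-T>.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: label each argument as "tainted" or "clean".
sub taint_report {
    return map { tainted($_) ? "tainted" : "clean" } @_;
}

my $literal = 'abc';            # a constant in the program: never tainted
my $path    = $ENV{PATH} // ''; # environment data: tainted under -T

print join(", ", taint_report($literal, $path)), "\n";
```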
| 181 | |
| 182 | But testing for taintedness gets you only so far. Sometimes you just |
| 183 | have to clear your data's taintedness. Values may be untainted by using them |
| 184 | as keys in a hash; otherwise the only way to bypass the tainting |
| 185 | mechanism is by referencing subpatterns from a regular expression match. |
| 186 | Perl presumes that if you reference a substring using $1, $2, etc., |
| 187 | you knew what you were doing when you wrote the pattern. That means using |
| 188 | a bit of thought--don't just blindly untaint anything, or you defeat the |
| 189 | entire mechanism. It's better to verify that the variable has only good |
| 190 | characters (for certain values of "good") rather than checking whether it |
| 191 | has any bad characters. That's because it's far too easy to miss bad |
| 192 | characters that you never thought of. |
| 193 | |
| 194 | Here's a test to make sure that the data contains nothing but "word" |
| 195 | characters (alphabetics, numerics, and underscores), a hyphen, an at sign, |
| 196 | or a dot. |
| 197 | |
| 198 | if ($data =~ /^([-\@\w.]+)$/) { |
| 199 |     $data = $1;                     # $data now untainted |
| 200 | } else { |
| 201 |     die "Bad data in '$data'";      # log this somewhere |
| 202 | } |
| 203 | |
| 204 | This is fairly secure because C</\w+/> doesn't normally match shell |
| 205 | metacharacters, nor are dot, dash, or at going to mean something special |
| 206 | to the shell. Use of C</.+/> would have been insecure in theory because |
| 207 | it lets everything through, but Perl doesn't check for that. The lesson |
| 208 | is that when untainting, you must be exceedingly careful with your patterns. |
| 209 | Laundering data using a regular expression is the I<only> mechanism for |
| 210 | untainting dirty data, unless you use the strategy detailed below to fork |
| 211 | a child of lesser privilege. |
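
The same allow-list test can be wrapped in a reusable helper. This is only a sketch: the name C<launder_token> is hypothetical, and it assumes C<use locale> is not in effect. It anchors the pattern with C<\A> and C<\z> rather than C<^> and C<$>, because C<$> also matches just before a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: return a laundered copy of $value if it consists
# only of word characters, hyphens, at signs, and dots; otherwise undef.
sub launder_token {
    my ($value) = @_;
    return undef unless defined $value;
    # The capture group matters: under -T it is $1, not $value,
    # that comes back untainted.
    return $value =~ /\A([-\@\w.]+)\z/ ? $1 : undef;
}

my $name = launder_token('report-2.txt');
print defined $name ? "accepted: $name\n" : "rejected\n";
```

Returning C<undef> instead of dying leaves the decision about logging and error handling to the caller.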
| 212 | |
| 213 | The example does not untaint C<$data> if C<use locale> is in effect, |
| 214 | because the characters matched by C<\w> are determined by the locale. |
| 215 | Perl considers that locale definitions are untrustworthy because they |
| 216 | contain data from outside the program. If you are writing a |
| 217 | locale-aware program, and want to launder data with a regular expression |
| 218 | containing C<\w>, put C<no locale> ahead of the expression in the same |
| 219 | block. See L<perllocale/SECURITY> for further discussion and examples. |
| 220 | |
| 221 | =head2 Switches On the "#!" Line |
| 222 | |
| 223 | When you make a script executable, in order to make it usable as a |
| 224 | command, the system will pass switches to perl from the script's #! |
| 225 | line. Perl checks that any command line switches given to a setuid |
| 226 | (or setgid) script actually match the ones set on the #! line. Some |
| 227 | Unix and Unix-like environments impose a one-switch limit on the #! |
| 228 | line, so you may need to use something like C<-wU> instead of C<-w -U> |
| 229 | under such systems. (This issue should arise only in Unix or |
| 230 | Unix-like environments that support #! and setuid or setgid scripts.) |
| 231 | |
| 232 | =head2 Taint mode and @INC |
| 233 | |
| 234 | When the taint mode (C<-T>) is in effect, the "." directory is removed |
| 235 | from C<@INC>, and the environment variables C<PERL5LIB> and C<PERLLIB> |
| 236 | are ignored by Perl. You can still adjust C<@INC> from outside the |
| 237 | program by using the C<-I> command line option as explained in |
| 238 | L<perlrun>. The two environment variables are ignored because |
| 239 | they are obscured, and a user running a program could be unaware that |
| 240 | they are set, whereas the C<-I> option is clearly visible and |
| 241 | therefore permitted. |
| 242 | |
| 243 | Another way to modify C<@INC> without modifying the program is to use |
| 244 | the C<lib> pragma, e.g.: |
| 245 | |
| 246 | perl -Mlib=/foo program |
| 247 | |
| 248 | The benefit of using C<-Mlib=/foo> over C<-I/foo> is that the former |
| 249 | will automagically remove any duplicated directories, while the latter |
| 250 | will not. |
| 251 | |
| 252 | Note that if a tainted string is added to C<@INC>, the following |
| 253 | problem will be reported: |
| 254 | |
| 255 | Insecure dependency in require while running with -T switch |
| 256 | |
| 257 | =head2 Cleaning Up Your Path |
| 258 | |
| 259 | For "Insecure C<$ENV{PATH}>" messages, you need to set C<$ENV{'PATH'}> to |
| 260 | a known value, and each directory in the path must be absolute and |
| 261 | not writable by anyone other than its owner and group. You may be surprised to |
| 262 | get this message even if the pathname to your executable is fully |
| 263 | qualified. This is I<not> generated because you didn't supply a full path |
| 264 | to the program; instead, it's generated because you never set your PATH |
| 265 | environment variable, or you didn't set it to something that was safe. |
| 266 | Because Perl can't guarantee that the executable in question isn't itself |
| 267 | going to turn around and execute some other program that is dependent on |
| 268 | your PATH, it makes sure you set the PATH. |
| 269 | |
| 270 | The PATH isn't the only environment variable which can cause problems. |
| 271 | Because some shells may use the variables IFS, CDPATH, ENV, and |
| 272 | BASH_ENV, Perl checks that those are either empty or untainted when |
| 273 | starting subprocesses. You may wish to add something like this to your |
| 274 | set-id and taint-checking scripts. |
| 275 | |
| 276 | delete @ENV{qw(IFS CDPATH ENV BASH_ENV)}; # Make %ENV safer |
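
The PATH reset and the C<delete> above can be combined in one place. The following is a sketch under assumptions: C<scrubbed_env> is a hypothetical helper, and C</bin:/usr/bin> is only an example of a known-safe PATH for your system.

```perl
use strict;
use warnings;

# Hypothetical helper: given environment pairs, return a copy with the
# shell-sensitive variables removed and PATH set to a fixed value.
sub scrubbed_env {
    my (%env) = @_;
    delete @env{qw(IFS CDPATH ENV BASH_ENV)};
    $env{PATH} = '/bin:/usr/bin';   # assumed safe; adjust for your system
    return %env;
}

my %clean = scrubbed_env(%ENV);
print "PATH would be $clean{PATH}\n";
```

Assigning the result back with C<%ENV = scrubbed_env(%ENV);> applies it to the running program and to any subprocesses it starts.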
| 277 | |
| 278 | It's also possible to get into trouble with other operations that don't |
| 279 | care whether they use tainted values. Make judicious use of the file |
| 280 | tests in dealing with any user-supplied filenames. When possible, do |
| 281 | opens and such B<after> properly dropping any special user (or group!) |
| 282 | privileges. Perl doesn't prevent you from opening tainted filenames for reading, |
| 283 | so be careful what you print out. The tainting mechanism is intended to |
| 284 | prevent stupid mistakes, not to remove the need for thought. |
| 285 | |
| 286 | Perl does not call the shell to expand wild cards when you pass C<system> |
| 287 | and C<exec> explicit parameter lists instead of strings with possible shell |
| 288 | wildcards in them. Unfortunately, C<glob> and the backtick operator |
| 289 | provide no such alternate calling convention, and the list form of pipe |
| 290 | C<open> is not available in every perl, so more subterfuge may be required. |
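
Where the list form of pipe C<open> is available (see L<perlfunc/open>), a command's output can be captured without a shell. The sketch below assumes such a perl. C<capture_lines> is a hypothetical helper, and it does not drop any privileges.

```perl
use strict;
use warnings;

# Hypothetical helper: run $program with @args and return its output lines.
# Pass at least one argument so that the list form is used and no shell
# is involved.
sub capture_lines {
    my ($program, @args) = @_;
    open(my $kid, '-|', $program, @args)
        or die "Can't run $program: $!";
    my @lines = <$kid>;
    close($kid) or warn "$program did not exit cleanly: $! $?\n";
    return @lines;
}

# The shell never sees this argument, so ';' is not special here.
# $^X is the path of the running perl, used only as a convenient program.
print capture_lines($^X, '-e', 'print "$ARGV[0]\n"', 'a; echo b');
```

Under taint mode the arguments are still subject to taint checks, so data from outside the program has to be laundered before it is passed in.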
| 291 | |
| 292 | Perl provides a reasonably safe way to open a file or pipe from a setuid |
| 293 | or setgid program: just create a child process with reduced privilege that |
| 294 | does the dirty work for you. First, fork a child using the special |
| 295 | C<open> syntax that connects the parent and child by a pipe. Now the |
| 296 | child resets its ID set and any other per-process attributes, like |
| 297 | environment variables, umasks, current working directories, back to the |
| 298 | originals or known safe values. Then the child process, which no longer |
| 299 | has any special permissions, does the C<open> or other system call. |
| 300 | Finally, the child passes the data it managed to access back to the |
| 301 | parent. Because the file or pipe was opened in the child while running |
| 302 | under less privilege than the parent, it's not apt to be tricked into |
| 303 | doing something it shouldn't. |
| 304 | |
| 305 | Here's a way to do backticks reasonably safely. Notice how the C<exec> is |
| 306 | not called with a string that the shell could expand. This is by far the |
| 307 | best way to call something that might be subjected to shell escapes: just |
| 308 | never call the shell at all. |
| 309 | |
| 310 | use English; |
| 311 | die "Can't fork: $!" unless defined($pid = open(KID, "-|")); |
| 312 | if ($pid) {           # parent |
| 313 |     while (<KID>) { |
| 314 |         # do something |
| 315 |     } |
| 316 |     close KID; |
| 317 | } else { |
| 318 |     my @temp     = ($EUID, $EGID); |
| 319 |     my $orig_uid = $UID; |
| 320 |     my $orig_gid = $GID; |
| 321 |     $EUID = $UID; |
| 322 |     $EGID = $GID; |
| 323 |     # Drop privileges |
| 324 |     $UID  = $orig_uid; |
| 325 |     $GID  = $orig_gid; |
| 326 |     # Make sure privs are really gone |
| 327 |     ($EUID, $EGID) = @temp; |
| 328 |     die "Can't drop privileges" |
| 329 |         unless $UID == $EUID  && $GID eq $EGID; |
| 330 |     $ENV{PATH} = "/bin:/usr/bin"; # Minimal PATH. |
| 331 |     # Consider sanitizing the environment even more. |
| 332 |     exec 'myprog', 'arg1', 'arg2' |
| 333 |         or die "can't exec myprog: $!"; |
| 334 | } |
| 335 | |
| 336 | A similar strategy would work for wildcard expansion via C<glob>, although |
| 337 | you can use C<readdir> instead. |
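
A C<readdir>-based replacement for a simple suffix wildcard such as C<*.c> might look like the sketch below. C<files_with_suffix> is a hypothetical helper. The names it returns still come from outside the program, so they are tainted under B<-T>.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted names of plain files in $dir
# whose names end in $suffix, without calling glob or a shell.
sub files_with_suffix {
    my ($dir, $suffix) = @_;
    opendir(my $dh, $dir) or die "Can't opendir $dir: $!";
    my @names = sort
                grep { /\Q$suffix\E\z/ && -f "$dir/$_" }
                readdir($dh);
    closedir($dh);
    return @names;
}

print "$_\n" for files_with_suffix('.', '.c');
```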
| 338 | |
| 339 | Taint checking is most useful when, although you trust yourself not to have |
| 340 | written a program that gives away the farm, you don't necessarily trust those |
| 341 | who end up using it not to try to trick it into doing something bad. This |
| 342 | is the kind of security checking that's useful for set-id programs and |
| 343 | programs launched on someone else's behalf, like CGI programs. |
| 344 | |
| 345 | This is quite different, however, from not even trusting the writer of the |
| 346 | code not to try to do something evil. That's the kind of trust needed |
| 347 | when someone hands you a program you've never seen before and says, "Here, |
| 348 | run this." For that kind of safety, you might want to check out the Safe |
| 349 | module, included standard in the Perl distribution. This module allows the |
| 350 | programmer to set up special compartments in which all system operations |
| 351 | are trapped and namespace access is carefully controlled. Safe should |
| 352 | not be considered bullet-proof, though: it will not prevent the foreign |
| 353 | code from setting up infinite loops, allocating gigabytes of memory, or even |
| 354 | abusing perl bugs to make the host interpreter crash or behave in |
| 355 | unpredictable ways. In any case it's better avoided completely if you're |
| 356 | really concerned about security. |
| 357 | |
| 358 | =head2 Security Bugs |
| 359 | |
| 360 | Beyond the obvious problems that stem from giving special privileges to |
| 361 | systems as flexible as scripts, on many versions of Unix, set-id scripts |
| 362 | are inherently insecure right from the start. The problem is a race |
| 363 | condition in the kernel. Between the time the kernel opens the file to |
| 364 | see which interpreter to run and when the (now-set-id) interpreter turns |
| 365 | around and reopens the file to interpret it, the file in question may have |
| 366 | changed, especially if you have symbolic links on your system. |
| 367 | |
| 368 | Fortunately, sometimes this kernel "feature" can be disabled. |
| 369 | Unfortunately, there are two ways to disable it. The system can simply |
| 370 | outlaw scripts with any set-id bit set, which doesn't help much. |
| 371 | Alternatively, it can simply ignore the set-id bits on scripts. |
| 372 | |
| 373 | However, if the kernel set-id script feature isn't disabled, Perl will |
| 374 | complain loudly that your set-id script is insecure. You'll need to |
| 375 | either disable the kernel set-id script feature, or put a C wrapper around |
| 376 | the script. A C wrapper is just a compiled program that does nothing |
| 377 | except call your Perl program. Compiled programs are not subject to the |
| 378 | kernel bug that plagues set-id scripts. Here's a simple wrapper, written |
| 379 | in C: |
| 380 | |
| 381 | #include <unistd.h> |
| 382 | #define REAL_PATH "/path/to/script" |
| 383 | int main(int ac, char **av) { |
| 384 |     execv(REAL_PATH, av); |
| 385 |     return 1;   /* reached only if execv() fails */ |
| 386 | } |
| 387 | |
| 388 | Compile this wrapper into a binary executable and then make I<it> rather |
| 389 | than your script setuid or setgid. |
| 390 | |
| 391 | In recent years, vendors have begun to supply systems free of this |
| 392 | inherent security bug. On such systems, when the kernel passes the name |
| 393 | of the set-id script to open to the interpreter, rather than using a |
| 394 | pathname subject to meddling, it instead passes I</dev/fd/3>. This is a |
| 395 | special file already opened on the script, so that there can be no race |
| 396 | condition for evil scripts to exploit. On these systems, Perl should be |
| 397 | compiled with C<-DSETUID_SCRIPTS_ARE_SECURE_NOW>. The F<Configure> |
| 398 | program that builds Perl tries to figure this out for itself, so you |
| 399 | should never have to specify this yourself. Most modern releases of |
| 400 | SysVr4 and BSD 4.4 use this approach to avoid the kernel race condition. |
| 401 | |
| 402 | =head2 Protecting Your Programs |
| 403 | |
| 404 | There are a number of ways to hide the source to your Perl programs, |
| 405 | with varying levels of "security". |
| 406 | |
| 407 | First of all, however, you I<can't> take away read permission, because |
| 408 | the source code has to be readable in order to be compiled and |
| 409 | interpreted. (That doesn't mean that a CGI script's source is |
| 410 | readable by people on the web, though.) So you have to leave the |
| 411 | permissions at the socially friendly 0755 level. This lets |
| 412 | only people on your local system see your source. |
| 413 | |
| 414 | Some people mistakenly regard this as a security problem. If your program does |
| 415 | insecure things, and relies on people not knowing how to exploit those |
| 416 | insecurities, it is not secure. It is often possible for someone to |
| 417 | determine the insecure things and exploit them without viewing the |
| 418 | source. Security through obscurity, the name for hiding your bugs |
| 419 | instead of fixing them, is little security indeed. |
| 420 | |
| 421 | You can try using encryption via source filters (Filter::* from CPAN, |
| 422 | or Filter::Util::Call and Filter::Simple since Perl 5.8). |
| 423 | But crackers might be able to decrypt it. You can try using a byte |
| 424 | code compiler and interpreter, but crackers might be |
| 425 | able to de-compile it. You can try using a native-code compiler, |
| 426 | but crackers might be able to disassemble it. These |
| 427 | pose varying degrees of difficulty to people wanting to get at your |
| 428 | code, but none can definitively conceal it (this is true of every |
| 429 | language, not just Perl). |
| 430 | |
| 431 | If you're concerned about people profiting from your code, then the |
| 432 | bottom line is that nothing but a restrictive license will give you |
| 433 | legal security. License your software and pepper it with threatening |
| 434 | statements like "This is unpublished proprietary software of XYZ Corp. |
| 435 | Your access to it does not give you permission to use it blah blah |
| 436 | blah." You should see a lawyer to be sure your license's wording will |
| 437 | stand up in court. |
| 438 | |
| 439 | =head2 Unicode |
| 440 | |
| 441 | Unicode is a new and complex technology and one may easily overlook |
| 442 | certain security pitfalls. See L<perluniintro> for an overview and |
| 443 | L<perlunicode> for details, and L<perlunicode/"Security Implications |
| 444 | of Unicode"> for security implications in particular. |
| 445 | |
| 446 | =head2 Algorithmic Complexity Attacks |
| 447 | |
| 448 | Certain internal algorithms used in the implementation of Perl can |
| 449 | be attacked by choosing the input carefully to consume large amounts |
| 450 | of either time or space or both. This can lead to so-called |
| 451 | I<Denial of Service> (DoS) attacks. |
| 452 | |
| 453 | =over 4 |
| 454 | |
| 455 | =item * |
| 456 | |
| 457 | Hash Algorithm - Hash algorithms like the one used in Perl are well |
| 458 | known to be vulnerable to collision attacks on their hash function. |
| 459 | Such attacks involve constructing a set of keys which collide into |
| 460 | the same bucket producing inefficient behavior. Such attacks often |
| 461 | depend on discovering the seed of the hash function used to map the |
| 462 | keys to buckets. That seed is then used to brute-force a key set which |
| 463 | can be used to mount a denial of service attack. In Perl 5.8.1 changes |
| 464 | were introduced to harden Perl to such attacks, and then later in |
| 465 | Perl 5.18.0 these features were enhanced and additional protections |
| 466 | added. |
| 467 | |
| 468 | At the time of this writing, Perl 5.18.0 is considered to be |
| 469 | well-hardened against algorithmic complexity attacks on its hash |
| 470 | implementation. This is largely owed to the following measures, |
| 471 | which mitigate attacks: |
| 472 | |
| 473 | =over 4 |
| 474 | |
| 475 | =item Hash Seed Randomization |
| 476 | |
| 477 | In order to make it impossible to know what seed to generate an attack |
| 478 | key set for, this seed is randomly initialized at process start. This |
| 479 | may be overridden by using the PERL_HASH_SEED environment variable, see |
| 480 | L<perlrun/PERL_HASH_SEED>. This environment variable controls how |
| 481 | items are actually stored, not how they are presented via |
| 482 | C<keys>, C<values> and C<each>. |
| 483 | |
| 484 | =item Hash Traversal Randomization |
| 485 | |
| 486 | Independent of which seed is used in the hash function, C<keys>, |
| 487 | C<values>, and C<each> return items in a per-hash randomized order. |
| 488 | Modifying a hash by insertion will change the iteration order of that hash. |
| 489 | This behavior can be overridden by using C<hash_traversal_mask()> from |
| 490 | L<Hash::Util> or by using the PERL_PERTURB_KEYS environment variable, |
| 491 | see L<perlrun/PERL_PERTURB_KEYS>. Note that this feature controls the |
| 492 | "visible" order of the keys, and not the actual order they are stored in. |
| 493 | |
| 494 | =item Bucket Order Perturbance |
| 495 | |
| 496 | When items collide into a given hash bucket, the order in which they are stored in |
| 497 | the chain is no longer predictable in Perl 5.18. This is intended |
| 498 | to make it harder to observe a collision. This behavior can be overridden by using |
| 499 | the PERL_PERTURB_KEYS environment variable, see L<perlrun/PERL_PERTURB_KEYS>. |
| 500 | |
| 501 | =item New Default Hash Function |
| 502 | |
| 503 | The default hash function has been modified with the intention of making |
| 504 | it harder to infer the hash seed. |
| 505 | |
| 506 | =item Alternative Hash Functions |
| 507 | |
| 508 | The source code includes multiple hash algorithms to choose from. While we |
| 509 | believe that the default perl hash is robust to attack, we have included the |
| 510 | hash function Siphash as a fall-back option. At the time of release of |
| 511 | Perl 5.18.0 Siphash is believed to be of cryptographic strength. This is |
| 512 | not the default as it is much slower than the default hash. |
| 513 | |
| 514 | =back |
| 515 | |
| 516 | Without compiling a special Perl, there is no way to get exactly the same |
| 517 | behavior as any version prior to Perl 5.18.0. The closest one can get |
| 518 | is by setting PERL_PERTURB_KEYS to 0 and setting the PERL_HASH_SEED |
| 519 | to a known value. We do not advise those settings for production use |
| 520 | due to the above security considerations. |
| 521 | |
| 522 | B<Perl has never guaranteed any ordering of the hash keys>, and |
| 523 | the ordering has already changed several times during the lifetime of |
| 524 | Perl 5. Also, the ordering of hash keys has always been, and continues |
| 525 | to be, affected by the insertion order and the history of changes made |
| 526 | to the hash over its lifetime. |
| 527 | |
| 528 | Also note that while the order of the hash elements might be |
| 529 | randomized, this "pseudo-ordering" should B<not> be used for |
| 530 | applications like shuffling a list randomly (use C<List::Util::shuffle()> |
| 531 | for that, see L<List::Util>, a standard core module since Perl 5.8.0; |
| 532 | or the CPAN module C<Algorithm::Numerical::Shuffle>), or for generating |
| 533 | permutations (use e.g. the CPAN modules C<Algorithm::Permute> or |
| 534 | C<Algorithm::FastPermute>), or for any cryptographic applications. |
| 535 | |
| 536 | =item * |
| 537 | |
| 538 | Regular expressions - Perl's regular expression engine is a so-called NFA |
| 539 | (Non-deterministic Finite Automaton), which among other things means that |
| 540 | it can rather easily consume large amounts of both time and space if the |
| 541 | regular expression may match in several ways. Careful crafting of the |
| 542 | regular expressions can help but quite often there really isn't much |
| 543 | one can do (the book "Mastering Regular Expressions" is required |
| 544 | reading, see L<perlfaq2>). Running out of space manifests itself by |
| 545 | Perl running out of memory. |
| 546 | |
| 547 | =item * |
| 548 | |
| 549 | Sorting - the quicksort algorithm used in Perls before 5.8.0 to |
| 550 | implement the sort() function is very easy to trick into misbehaving |
| 551 | so that it consumes a lot of time. Starting from Perl 5.8.0 a different |
| 552 | sorting algorithm, mergesort, is used by default. Mergesort cannot |
| 553 | misbehave on any input. |
| 554 | |
| 555 | =back |
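
Since hash order is neither guaranteed nor a substitute for a real shuffle, code that needs a particular order has to ask for it explicitly. The sketch below illustrates both cases. The C<%port_for> data and the C<stable_pairs> helper are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my %port_for = (http => 80, https => 443, ssh => 22);

# Deterministic output: impose an order with sort rather than relying
# on whatever order keys() happens to return for this hash.
sub stable_pairs {
    my ($hash) = @_;
    return map { "$_=$hash->{$_}" } sort keys %$hash;
}

print join(",", stable_pairs(\%port_for)), "\n";

# Random order: request it from List::Util::shuffle instead of
# treating hash traversal order as a source of randomness.
my @random_order = shuffle(keys %port_for);
print scalar(@random_order), " keys shuffled\n";
```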
| 556 | |
| 557 | See L<http://www.cs.rice.edu/~scrosby/hash/> for more information, |
| 558 | and any computer science textbook on algorithmic complexity. |
| 559 | |
| 560 | =head1 SEE ALSO |
| 561 | |
| 562 | L<perlrun> for its description of cleaning up environment variables. |