more uni doc tweakage

[perl5.git] / pod / perlsec.pod
diff --git a/pod/perlsec.pod b/pod/perlsec.pod

index 2e1fda3..5a09e32 100644
--- a/pod/perlsec.pod
+++ b/pod/perlsec.pod
@@ -65,12 +65,14 @@ in which case they are able to run arbitrary external code.
  
  =back
  
-The value of an expression containing tainted data will itself be
-tainted, even if it is logically impossible for the tainted data to
-affect the value.
+For efficiency reasons, Perl takes a conservative view of
+whether data is tainted.  If an expression contains tainted data,
+any subexpression may be considered tainted, even if the value
+of the subexpression is not itself affected by the tainted data.
  
  Because taintedness is associated with each scalar value, some
-elements of an array can be tainted and others not.
+elements of an array or hash can be tainted and others not.
+The keys of a hash are never tainted.
  
  For example:
  
@@ -133,7 +135,7 @@ To test whether a variable contains tainted data, and whose use would
  thus trigger an "Insecure dependency" message, you can use the
  tainted() function of the Scalar::Util module, available in your
  nearby CPAN mirror, and included in Perl starting from the release 5.8.0.
-Or you may be able to use the following I<is_tainted()> function.
+Or you may be able to use the following C<is_tainted()> function.
  
      sub is_tainted {
          return ! eval { eval("#" . substr(join("", @_), 0, 0)); 1 };
@@ -147,7 +149,8 @@ approach is used that if any tainted value has been accessed within the
  same expression, the whole expression is considered tainted.
  
  But testing for taintedness gets you only so far.  Sometimes you have just
-to clear your data's taintedness.  The only way to bypass the tainting
+to clear your data's taintedness.  Values may be untainted by using them
+as keys in a hash; otherwise the only way to bypass the tainting
  mechanism is by referencing subpatterns from a regular expression match.
  Perl presumes that if you reference a substring using $1, $2, etc., that
  you knew what you were doing when you wrote the pattern.  That means using
@@ -164,7 +167,7 @@ or a dot.
      if ($data =~ /^([-\@\w.]+)$/) {
         $data = $1;                     # $data now untainted
      } else {
-       die "Bad data in $data";        # log this somewhere
+       die "Bad data in '$data'";      # log this somewhere
      }
  
  This is fairly secure because C</\w+/> doesn't normally match shell
@@ -195,6 +198,26 @@ line, so you may need to use something like C<-wU> instead of C<-w -U>
  under such systems.  (This issue should arise only in Unix or
  Unix-like environments that support #! and setuid or setgid scripts.)
  
+=head2 Taint mode and @INC
+
+When taint mode (C<-T>) is in effect, the "." directory is removed
+from C<@INC>, and the environment variables C<PERL5LIB> and C<PERLLIB>
+are ignored by Perl.  You can still adjust C<@INC> from outside the
+program by using the C<-I> command line option as explained in
+L<perlrun>.  The two environment variables are ignored because
+they are obscure, and a user running a program could be unaware that
+they are set, whereas the C<-I> option is clearly visible and
+therefore permitted.
+
+Another way to modify C<@INC> without modifying the program is to use
+the C<lib> pragma, e.g.:
+
+  perl -Mlib=/foo program
+
+The benefit of using C<-Mlib=/foo> over C<-I/foo> is that the former
+will automatically remove any duplicated directories, while the latter
+will not.
+
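As an illustration outside the patch itself, the effect described in the new section can be observed with a small script. This is a minimal sketch: it relies only on the C<${^TAINT}> variable and on C<@INC>, and the script name used below (C<show_inc.pl>) is hypothetical.

```perl
use strict;
use warnings;

# ${^TAINT} is nonzero when taint checks are enabled (-T or -t).
my $taint_on = ${^TAINT} ? 1 : 0;

# Look for the current directory among the string entries of @INC
# (entries may also be code references, which are skipped here).
my $has_dot = grep { !ref($_) && $_ eq '.' } @INC;

printf "taint mode: %s\n", $taint_on ? "on" : "off";
printf "'.' in \@INC: %s\n", $has_dot ? "yes" : "no";
print "  $_\n" for grep { !ref($_) } @INC;
```

Running it as C<perl -T -I/foo show_inc.pl> should list C</foo>, whereas setting C<PERL5LIB=/foo> and running C<perl -T show_inc.pl> should not. Note that Perls from 5.26 onwards no longer put "." in C<@INC> by default, so on those versions the second line prints "no" with or without C<-T>.
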
  =head2 Cleaning Up Your Path
  
  For "Insecure C<$ENV{PATH}>" messages, you need to set C<$ENV{'PATH'}> to a
@@ -386,6 +409,75 @@ certain security pitfalls.  See L<perluniintro> for an overview and
  L<perlunicode> for details, and L<perlunicode/"Security Implications
  of Unicode"> for security implications in particular.
  
+=head2 Algorithmic Complexity Attacks
+
+Certain internal algorithms used in the implementation of Perl can
+be attacked by choosing the input carefully to consume large amounts
+of either time or space or both.  This can lead to so-called
+I<Denial of Service> (DoS) attacks.
+
+=over 4
+
+=item *
+
+Hash Function - the algorithm used to "order" hash elements has been
+changed several times during the development of Perl, mainly to keep
+it reasonably fast.  In Perl 5.8.1 the security aspect was also taken
+into account.
+
+In Perls before 5.8.1 one could rather easily generate data that, used
+as hash keys, would cause Perl to consume large amounts of time because
+the internal structure of hashes would badly degenerate.  In Perl 5.8.1
+the hash function is randomly perturbed by a pseudorandom seed, which
+makes generating such naughty hash keys harder.
+See L<perlrun/PERL_HASH_SEED> for more information.
+
+The random perturbation is done by default, but if one wants for some
+reason to emulate the old behaviour, one can set the environment variable
+PERL_HASH_SEED to zero (or any other integer).  One possible reason
+for wanting to emulate the old behaviour is that in the new behaviour
+consecutive runs of Perl will order hash keys differently, which may
+confuse some applications (like Data::Dumper: the outputs of two
+different runs are no longer identical).
+
+B<Perl has never guaranteed any ordering of the hash keys>, and the
+ordering has already changed several times during the lifetime of
+Perl 5.  Also, the ordering of hash keys has always been, and
+continues to be, affected by the insertion order.
+
+Also note that while the order of the hash elements might be
+randomised, this "pseudoordering" should B<not> be used for
+applications like shuffling a list randomly (use List::Util::shuffle()
+for that, see L<List::Util>, a standard core module since Perl 5.8.0;
+or the CPAN module Algorithm::Numerical::Shuffle), or for generating
+permutations (use e.g. the CPAN modules Algorithm::Permute or
+Algorithm::FastPermute), or for any cryptographic applications.
+
+=item *
+
+Regular expressions - Perl's regular expression engine is a so-called
+NFA (Nondeterministic Finite Automaton), which among other things means
+that it can rather easily consume large amounts of both time and space
+if the regular expression may match in several ways.  Careful crafting
+of the regular expressions can help, but quite often there really isn't
+much one can do (the book "Mastering Regular Expressions" is required
+reading, see L<perlfaq2>).  Running out of space manifests itself by
+Perl running out of memory.
+
+=item *
+
+Sorting - the quicksort algorithm used in Perls before 5.8.0 to
+implement the sort() function is very easy to trick into misbehaving
+so that it consumes a lot of time.  Nothing more is required than
+re-sorting an already sorted list.  Starting from Perl 5.8.0 a different
+sorting algorithm, mergesort, is used.  Mergesort is insensitive to
+its input data, so it cannot be similarly fooled.
+
+=back
+
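The advice in the hash function item above can be sketched as follows. This example is not part of the patch; the hash contents are made up for illustration. It shows one way to get output that does not depend on the internal hash order (sorting the keys), and the explicit use of C<List::Util::shuffle()> when a random order is actually wanted.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical sample data.
my %colour = (
    apple  => 'red',
    banana => 'yellow',
    cherry => 'dark red',
    kiwi   => 'green',
);

# Sorting the keys gives the same order on every run,
# regardless of how the hash is laid out internally.
my @stable = sort keys %colour;
print "stable:   @stable\n";

# For a random order, shuffle explicitly instead of relying on hash order.
my @random = shuffle keys %colour;
print "shuffled: @random\n";
```

For Data::Dumper output specifically, setting C<$Data::Dumper::Sortkeys> to a true value is another way to make dumps from different runs comparable.
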
+See L<http://www.cs.rice.edu/~scrosby/hash/> for more information,
+and any computer science textbook on algorithmic complexity.
+
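To make the regular expression item above more concrete, here is a small sketch (not part of the patch) that contrasts a pattern which "may match in several ways" with an equivalent one that cannot. The input string and variable names are invented for the example, and the input is kept short on purpose. How much extra work the first pattern actually causes depends on the Perl version and its regex optimisations, so no timings are claimed.

```perl
use strict;
use warnings;

# Hypothetical input: a run of "a" followed by a character that forces failure.
my $input = ("a" x 20) . "!";

# Nested quantifiers: the run of "a" can be split between the inner and
# outer "+" in many ways, and a backtracking engine may try them all
# before concluding that the match fails.
my $ambiguous = qr/^(?:a+)+$/;

# The same language, written so that each "a" can match in only one way.
my $direct = qr/^a+$/;

for my $re ($ambiguous, $direct) {
    printf "%-20s %s\n", $re, ($input =~ $re ? "match" : "no match");
}
```

Both patterns accept and reject exactly the same strings; the second simply leaves the engine fewer alternatives to explore when a match fails.
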
  =head1 SEE ALSO
  
  L<perlrun> for its description of cleaning up environment variables.