more uni doc tweakage

diff --git a/pod/perlsec.pod b/pod/perlsec.pod

index 4185e84..5a09e32 100644
--- a/pod/perlsec.pod
+++ b/pod/perlsec.pod
@@ -38,13 +38,41 @@ msgrcv(), the password, gcos and shell fields returned by the
  getpwxxx() calls), and all file input are marked as "tainted".
  Tainted data may not be used directly or indirectly in any command
  that invokes a sub-shell, nor in any command that modifies files,
-directories, or processes. (B<Important exception>: If you pass a list
-of arguments to either C<system> or C<exec>, the elements of that list
-are B<NOT> checked for taintedness.) Any variable set to a value
-derived from tainted data will itself be tainted, even if it is
-logically impossible for the tainted data to alter the variable.
+directories, or processes, B<with the following exceptions>:
+
+=over 4
+
+=item *
+
+Arguments to C<print> and C<syswrite> are B<not> checked for taintedness.
+
+=item *
+
+Symbolic methods
+
+    $obj->$method(@args);
+
+and symbolic sub references
+
+    &{$foo}(@args);
+    $foo->(@args);
+
+are not checked for taintedness.  This calls for extra care
+unless you want external data to affect your control flow.  Unless
+you carefully limit what these symbolic values are, people are able
+to call functions B<outside> your Perl code, such as POSIX::system,
+in which case they are able to run arbitrary external code.
+
+=back
+
+For efficiency reasons, Perl takes a conservative view of
+whether data is tainted.  If an expression contains tainted data,
+any subexpression may be considered tainted, even if the value
+of the subexpression is not itself affected by the tainted data.
+
  Because taintedness is associated with each scalar value, some
-elements of an array can be tainted and others not.
+elements of an array or hash can be tainted and others not.
+The keys of a hash are never tainted.
  
  For example:
  
@@ -58,7 +86,8 @@ For example:
      $data = 'abc';             # Not tainted
  
      system "echo $arg";                # Insecure
-    system "/bin/echo", $arg;  # Secure (doesn't use sh)
+    system "/bin/echo", $arg;  # Considered insecure
+                               # (Perl doesn't know about /bin/echo)
      system "echo $hid";                # Insecure
      system "echo $data";       # Insecure until PATH set
  
@@ -73,9 +102,9 @@ For example:
      open(FOO, "< $arg");       # OK - read-only file
      open(FOO, "> $arg");       # Not OK - trying to write
  
-    open(FOO,"echo $arg|");    # Not OK, but...
+    open(FOO,"echo $arg|");    # Not OK
      open(FOO,"-|")
-       or exec 'echo', $arg;   # OK
+       or exec 'echo', $arg;   # Also not OK
  
      $shout = `echo $arg`;      # Insecure, $shout now tainted
  
@@ -83,29 +112,33 @@ For example:
      umask $arg;                        # Insecure
  
      exec "echo $arg";          # Insecure
-    exec "echo", $arg;         # Secure (doesn't use the shell)
-    exec "sh", '-c', $arg;     # Considered secure, alas!
+    exec "echo", $arg;         # Insecure
+    exec "sh", '-c', $arg;     # Very insecure!
  
      @files = <*.c>;            # insecure (uses readdir() or similar)
      @files = glob('*.c');      # insecure (uses readdir() or similar)
  
+    # In Perl releases older than 5.6.0 the <*.c> and glob('*.c') would
+    # have used an external program to do the filename expansion; but in
+    # either case the result is tainted since the list of filenames comes
+    # from outside of the program.
+
+    $bad = ($arg, 23);         # $bad will be tainted
+    $arg, `true`;              # Insecure (although it isn't really)
+
  If you try to do something insecure, you will get a fatal error saying
-something like "Insecure dependency" or "Insecure $ENV{PATH}".  Note that you
-can still write an insecure B<system> or B<exec>, but only by explicitly
-doing something like the "considered secure" example above.
+something like "Insecure dependency" or "Insecure $ENV{PATH}".
  
  =head2 Laundering and Detecting Tainted Data
  
-To test whether a variable contains tainted data, and whose use would thus
-trigger an "Insecure dependency" message, check your nearby CPAN mirror
-for the F<Taint.pm> module, which should become available around November
-1997.  Or you may be able to use the following I<is_tainted()> function.
+To test whether a variable contains tainted data, whose use would
+thus trigger an "Insecure dependency" message, you can use the
+tainted() function of the Scalar::Util module, available from your
+nearest CPAN mirror and included in Perl starting with release 5.8.0.
+Or you may be able to use the following C<is_tainted()> function.
  
      sub is_tainted {
-       return ! eval {
-           join('',@_), kill 0;
-           1;
-       };
+        return ! eval { eval("#" . substr(join("", @_), 0, 0)); 1 };
      }
  
  This function makes use of the fact that the presence of tainted data
@@ -116,7 +149,8 @@ approach is used that if any tainted value has been accessed within the
  same expression, the whole expression is considered tainted.
  
  But testing for taintedness gets you only so far.  Sometimes you have just
-to clear your data's taintedness.  The only way to bypass the tainting
+to clear your data's taintedness.  Values may be untainted by using them
+as keys in a hash; otherwise the only way to bypass the tainting
  mechanism is by referencing subpatterns from a regular expression match.
  Perl presumes that if you reference a substring using $1, $2, etc., that
  you knew what you were doing when you wrote the pattern.  That means using
@@ -133,7 +167,7 @@ or a dot.
      if ($data =~ /^([-\@\w.]+)$/) {
         $data = $1;                     # $data now untainted
      } else {
-       die "Bad data in $data";        # log this somewhere
+       die "Bad data in '$data'";      # log this somewhere
      }
  
  This is fairly secure because C</\w+/> doesn't normally match shell
@@ -164,6 +198,26 @@ line, so you may need to use something like C<-wU> instead of C<-w -U>
  under such systems.  (This issue should arise only in Unix or
  Unix-like environments that support #! and setuid or setgid scripts.)
  
+=head2 Taint mode and @INC
+
+When the taint mode (C<-T>) is in effect, the "." directory is removed
+from C<@INC>, and the environment variables C<PERL5LIB> and C<PERLLIB>
+are ignored by Perl. You can still adjust C<@INC> from outside the
+program by using the C<-I> command line option as explained in
+L<perlrun>. The two environment variables are ignored because
+they are obscure, and a user running a program could be unaware that
+they are set, whereas the C<-I> option is clearly visible and
+therefore permitted.
+
+Another way to modify C<@INC> without modifying the program is to use
+the C<lib> pragma, e.g.:
+
+  perl -Mlib=/foo program
+
+The benefit of using C<-Mlib=/foo> over C<-I/foo> is that the former
+will automagically remove any duplicated directories, while the latter
+will not.
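
The duplicate removal described above can be observed from inside a program. The following is a minimal sketch, not part of the patch: it assumes the placeholder directory C</foo> from the text (which need not exist) and simulates an entry that is already present in C<@INC> before the C<lib> pragma runs.

```perl
use strict;
use warnings;

# '/foo' is the placeholder directory from the text; it need not exist.
# Simulate an entry that is already present in @INC.
BEGIN { push @INC, '/foo' }

# Equivalent in effect to running: perl -Mlib=/foo program
use lib '/foo';

# Count how often the directory appears after the pragma has run.
my $count = grep { $_ eq '/foo' } @INC;
print "/foo appears $count time(s) in \@INC\n";
```

Passing the same directory through repeated C<-I> options would, by contrast, leave every copy in C<@INC>.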
+
  =head2 Cleaning Up Your Path
  
  For "Insecure C<$ENV{PATH}>" messages, you need to set C<$ENV{'PATH'}> to a
@@ -217,25 +271,31 @@ not called with a string that the shell could expand.  This is by far the
  best way to call something that might be subjected to shell escapes: just
  never call the shell at all.  
  
-    use English;
-    die "Can't fork: $!" unless defined $pid = open(KID, "-|");
-    if ($pid) {                  # parent
-       while (<KID>) {
-           # do something
-       }
-       close KID;
-    } else {
-       my @temp = ($EUID, $EGID);
-       $EUID = $UID;
-       $EGID = $GID;    #      initgroups() also called!
-       # Make sure privs are really gone
-       ($EUID, $EGID) = @temp;
-       die "Can't drop privileges" 
-               unless $UID == $EUID  && $GID eq $EGID; 
-       $ENV{PATH} = "/bin:/usr/bin";
-       exec 'myprog', 'arg1', 'arg2' 
-           or die "can't exec myprog: $!";
-    }
+        use English '-no_match_vars';
+        die "Can't fork: $!" unless defined($pid = open(KID, "-|"));
+        if ($pid) {           # parent
+            while (<KID>) {
+                # do something
+            }
+            close KID;
+        } else {
+            my @temp     = ($EUID, $EGID);
+            my $orig_uid = $UID;
+            my $orig_gid = $GID;
+            $EUID = $UID;
+            $EGID = $GID;
+            # Drop privileges
+            $UID  = $orig_uid;
+            $GID  = $orig_gid;
+            # Make sure privs are really gone
+            ($EUID, $EGID) = @temp;
+            die "Can't drop privileges"
+                unless $UID == $EUID  && $GID eq $EGID;
+            $ENV{PATH} = "/bin:/usr/bin"; # Minimal PATH.
+            # Consider sanitizing the environment even more.
+            exec 'myprog', 'arg1', 'arg2'
+                or die "can't exec myprog: $!";
+        }
  
  A similar strategy would work for wildcard expansion via C<glob>, although
  you can use C<readdir> instead.
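
As a rough illustration of the C<readdir> alternative mentioned above, the sketch below lists files by suffix without involving a shell. The helper name C<list_by_suffix> is hypothetical and not part of perlsec; under taint mode the returned names would still be tainted, since they come from outside the program.

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain files in $dir whose names end
# in $suffix, using opendir/readdir rather than glob or a shell.
sub list_by_suffix {
    my ($dir, $suffix) = @_;
    opendir(my $dh, $dir) or die "Can't opendir $dir: $!";
    # \Q...\E quotes the suffix so that '.' is matched literally.
    my @names = sort grep { /\Q$suffix\E\z/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return @names;
}

# Roughly what <*.c> would return for the current directory.
my @c_files = list_by_suffix('.', '.c');
print "$_\n" for @c_files;
```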
@@ -291,12 +351,6 @@ in C:
  Compile this wrapper into a binary executable and then make I<it> rather
  than your script setuid or setgid.
  
-See the program B<wrapsuid> in the F<eg> directory of your Perl
-distribution for a convenient way to do this automatically for all your
-setuid Perl programs.  It moves setuid scripts into files with the same
-name plus a leading dot, and then compiles a wrapper like the one above
-for each of them.
-
  In recent years, vendors have begun to supply systems free of this
  inherent security bug.  On such systems, when the kernel passes the name
  of the set-id script to open to the interpreter, rather than using a
@@ -308,9 +362,8 @@ program that builds Perl tries to figure this out for itself, so you
  should never have to specify this yourself.  Most modern releases of
  SysVr4 and BSD 4.4 use this approach to avoid the kernel race condition.
  
-Prior to release 5.003 of Perl, a bug in the code of B<suidperl> could
-introduce a security hole in systems compiled with strict POSIX
-compliance.
+Prior to release 5.6.1 of Perl, bugs in the code of B<suidperl> could
+introduce a security hole.
  
  =head2 Protecting Your Programs
  
@@ -331,10 +384,11 @@ determine the insecure things and exploit them without viewing the
  source.  Security through obscurity, the name for hiding your bugs
  instead of fixing them, is little security indeed.
  
-You can try using encryption via source filters (Filter::* from CPAN).
-But crackers might be able to decrypt it.  You can try using the
-byte code compiler and interpreter described below, but crackers might
-be able to de-compile it.  You can try using the native-code compiler
+You can try using encryption via source filters (Filter::* from CPAN,
+or Filter::Util::Call and Filter::Simple since Perl 5.8).
+But crackers might be able to decrypt it.  You can try using the byte
+code compiler and interpreter described below, but crackers might be
+able to de-compile it.  You can try using the native-code compiler
  described below, but crackers might be able to disassemble it.  These
  pose varying degrees of difficulty to people wanting to get at your
  code, but none can definitively conceal it (this is true of every
@@ -348,6 +402,82 @@ Your access to it does not give you permission to use it blah blah
  blah."  You should see a lawyer to be sure your licence's wording will
  stand up in court.
  
+=head2 Unicode
+
+Unicode is a new and complex technology and one may easily overlook
+certain security pitfalls.  See L<perluniintro> for an overview and
+L<perlunicode> for details, and L<perlunicode/"Security Implications
+of Unicode"> for security implications in particular.
+
+=head2 Algorithmic Complexity Attacks
+
+Certain internal algorithms used in the implementation of Perl can
+be attacked by choosing the input carefully to consume large amounts
+of either time or space or both.  This can lead to so-called
+I<Denial of Service> (DoS) attacks.
+
+=over 4
+
+=item *
+
+Hash Function - the algorithm used to "order" hash elements has been
+changed several times during the development of Perl, mainly to be
+reasonably fast.  In Perl 5.8.1 the security aspect was also taken
+into account.
+
+In Perls before 5.8.1 one could rather easily generate data that as
+hash keys would cause Perl to consume large amounts of time because
+the internal structure of hashes would badly degenerate.  In Perl 5.8.1
+the hash function is randomly perturbed by a pseudorandom seed which
+makes generating such naughty hash keys harder.
+See L<perlrun/PERL_HASH_SEED> for more information.
+
+The random perturbation is done by default, but if one wants for some
+reason to emulate the old behaviour one can set the environment variable
+PERL_HASH_SEED to zero (or any other integer).  One possible reason
+for wanting to emulate the old behaviour is that in the new behaviour
+consecutive runs of Perl will order hash keys differently, which may
+confuse some applications (like Data::Dumper: the outputs of two
+different runs are no longer identical).
+
+B<Perl has never guaranteed any ordering of the hash keys>, and the
+ordering has already changed several times during the lifetime of
+Perl 5.  Also, the ordering of hash keys has always been, and
+continues to be, affected by the insertion order.
+
+Also note that while the order of the hash elements might be
+randomised, this "pseudoordering" should B<not> be used for
+applications like shuffling a list randomly (use List::Util::shuffle()
+for that, see L<List::Util>, a standard core module since Perl 5.8.0;
+or the CPAN module Algorithm::Numerical::Shuffle), or for generating
+permutations (use e.g. the CPAN modules Algorithm::Permute or
+Algorithm::FastPermute), or for any cryptographic applications.
+
+=item *
+
+Regular expressions - Perl's regular expression engine is a so-called
+NFA (Nondeterministic Finite Automaton), which among other things means that it can
+rather easily consume large amounts of both time and space if the
+regular expression may match in several ways.  Careful crafting of the
+regular expressions can help but quite often there really isn't much
+one can do (the book "Mastering Regular Expressions" is required
+reading, see L<perlfaq2>).  Running out of space manifests itself by
+Perl running out of memory.
+
+=item *
+
+Sorting - the quicksort algorithm used in Perls before 5.8.0 to
+implement the sort() function is very easy to trick into misbehaving
+so that it consumes a lot of time.  Nothing more is required than
+resorting a list already sorted.  Starting from Perl 5.8.0 a different
+sorting algorithm, mergesort, is used.  Mergesort is insensitive to
+its input data, so it cannot be similarly fooled.
+
+=back
+
+See L<http://www.cs.rice.edu/~scrosby/hash/> for more information,
+and any computer science textbook on algorithmic complexity.
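
Since hash key order is not guaranteed and may differ between runs, code that needs reproducible output can impose an order explicitly. The sketch below is one possible approach, not something this patch prescribes: it assumes a Data::Dumper that provides the C<$Data::Dumper::Sortkeys> setting, and the hash contents are illustrative only.

```perl
use strict;
use warnings;
use Data::Dumper;

# Ask Data::Dumper to emit hash keys in sorted order, so the dump
# does not depend on Perl's internal hash ordering.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Illustrative data, not taken from the document.
my %config = (port => 8080, host => 'localhost', debug => 0);

my $dump = Dumper(\%config);
print $dump;

# The same idea for ordinary loops: sort the keys before iterating.
for my $key (sort keys %config) {
    print "$key=$config{$key}\n";
}
```

This avoids the need to pin PERL_HASH_SEED just to get stable output.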
+
  =head1 SEE ALSO
  
  L<perlrun> for its description of cleaning up environment variables.