	1	=head1 NAME
	2
	3	perlsec - Perl security
	4
	5	=head1 DESCRIPTION
	6
	7	Perl is designed to make it easy to program securely even when running
	8	with extra privileges, like setuid or setgid programs. Unlike most
	9	command line shells, which are based on multiple substitution passes on
	10	each line of the script, Perl uses a more conventional evaluation scheme
	11	with fewer hidden snags. Additionally, because the language has more
	12	builtin functionality, it can rely less upon external (and possibly
	13	untrustworthy) programs to accomplish its purposes.
	14
	15	=head1 SECURITY VULNERABILITY CONTACT INFORMATION
	16
	17	If you believe you have found a security vulnerability in Perl, please email
	18	perl5-security-report@perl.org with details. This points to a closed
	19	subscription, unarchived mailing list. Please only use this address for
	20	security issues in the Perl core, not for modules independently distributed on
	21	CPAN.
	22
	23	=head1 SECURITY MECHANISMS AND CONCERNS
	24
	25	=head2 Taint mode
	26
	27	Perl automatically enables a set of special security checks, called I<taint
	28	mode>, when it detects its program running with differing real and effective
	29	user or group IDs. The setuid bit in Unix permissions is mode 04000, the
	30	setgid bit mode 02000; either or both may be set. You can also enable taint
	31	mode explicitly by using the B<-T> command line flag. This flag is
	32	I<strongly> suggested for server programs and any program run on behalf of
	33	someone else, such as a CGI script. Once taint mode is on, it's on for
	34	the remainder of your script.
	35
	36	While in this mode, Perl takes special precautions called I<taint
	37	checks> to prevent both obvious and subtle traps. Some of these checks
	38	are reasonably simple, such as verifying that path directories aren't
	39	writable by others; careful programmers have always used checks like
	40	these. Other checks, however, are best supported by the language itself,
	41	and it is these checks especially that contribute to making a set-id Perl
	42	program more secure than the corresponding C program.
	43
	44	You may not use data derived from outside your program to affect
	45	something else outside your program--at least, not by accident. All
	46	command line arguments, environment variables, locale information (see
	47	L<perllocale>), results of certain system calls (C<readdir()>,
	48	C<readlink()>, the variable of C<shmread()>, the messages returned by
	49	C<msgrcv()>, the password, gcos and shell fields returned by the
	50	C<getpwxxx()> calls), and all file input are marked as "tainted".
	51	Tainted data may not be used directly or indirectly in any command
	52	that invokes a sub-shell, nor in any command that modifies files,
	53	directories, or processes, B<with the following exceptions>:
	54
	55	=over 4
	56
	57	=item *
	58
	59	Arguments to C<print> and C<syswrite> are B<not> checked for taintedness.
	60
	61	=item *
	62
	63	Symbolic methods
	64
    $obj->$method(@args);
	66
	67	and symbolic sub references
	68
    &{$foo}(@args);
    $foo->(@args);
	71
are not checked for taintedness. This requires extra care
unless you want external data to affect your control flow. Unless
you carefully limit what these symbolic values are, people are able
to call functions B<outside> your Perl code, such as POSIX::system,
in which case they are able to run arbitrary external code.
	77
	78	=item *
	79
	80	Hash keys are B<never> tainted.
	81
	82	=back
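
One way to limit symbolic method names is to check them against an explicit
allowlist before dispatching. The following is a minimal sketch, not a
definitive implementation: the C<Greeter> class, its methods, the
C<%ALLOWED> table, and C<call_allowed()> are all hypothetical names invented
for this illustration.

```perl
use strict;
use warnings;

# Hypothetical class used only for this illustration.
package Greeter;
sub new     { return bless {}, shift }
sub hello   { return "hello, $_[1]" }
sub goodbye { return "goodbye, $_[1]" }

package main;

# Only method names listed here may be chosen by outside input.
my %ALLOWED = map { $_ => 1 } qw(hello goodbye);

sub call_allowed {
    my ($obj, $method, @args) = @_;
    my $name = defined $method ? $method : '(undef)';
    die "Method '$name' is not permitted\n" unless $ALLOWED{$name};
    return $obj->$name(@args);
}

my $greeter = Greeter->new;
print call_allowed($greeter, 'hello', 'world'), "\n";    # permitted name

# A name outside the allowlist is rejected before any dispatch happens.
my $ok = eval { call_allowed($greeter, 'POSIX::system', 'id'); 1 };
print "rejected: $@" unless $ok;
```

Because the check is a hash lookup on the exact name, a fully qualified name
such as C<POSIX::system> never reaches the method call.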
	83
	84	For efficiency reasons, Perl takes a conservative view of
	85	whether data is tainted. If an expression contains tainted data,
	86	any subexpression may be considered tainted, even if the value
	87	of the subexpression is not itself affected by the tainted data.
	88
	89	Because taintedness is associated with each scalar value, some
	90	elements of an array or hash can be tainted and others not.
	91	The keys of a hash are B<never> tainted.
	92
	93	For example:
	94
    $arg = shift;               # $arg is tainted
    $hid = $arg . 'bar';        # $hid is also tainted
    $line = <>;                 # Tainted
    $line = <STDIN>;            # Also tainted
    open FOO, "/home/me/bar" or die $!;
    $line = <FOO>;              # Still tainted
    $path = $ENV{'PATH'};       # Tainted, but see below
    $data = 'abc';              # Not tainted

    system "echo $arg";         # Insecure
    system "/bin/echo", $arg;   # Considered insecure
                                # (Perl doesn't know about /bin/echo)
    system "echo $hid";         # Insecure
    system "echo $data";        # Insecure until PATH set

    $path = $ENV{'PATH'};       # $path now tainted

    $ENV{'PATH'} = '/bin:/usr/bin';
    delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};

    $path = $ENV{'PATH'};       # $path now NOT tainted
    system "echo $data";        # Is secure now!

    open(FOO, "< $arg");        # OK - read-only file
    open(FOO, "> $arg");        # Not OK - trying to write

    open(FOO,"echo $arg|");     # Not OK
    open(FOO,"-|")
        or exec 'echo', $arg;   # Also not OK

    $shout = `echo $arg`;       # Insecure, $shout now tainted

    unlink $data, $arg;         # Insecure
    umask $arg;                 # Insecure

    exec "echo $arg";           # Insecure
    exec "echo", $arg;          # Insecure
    exec "sh", '-c', $arg;      # Very insecure!

    @files = <*.c>;             # insecure (uses readdir() or similar)
    @files = glob('*.c');       # insecure (uses readdir() or similar)

    # In Perl releases older than 5.6.0 the <*.c> and glob('*.c') would
    # have used an external program to do the filename expansion; but in
    # either case the result is tainted since the list of filenames comes
    # from outside of the program.

    $bad = ($arg, 23);          # $bad will be tainted
    $arg, `true`;               # Insecure (although it isn't really)
	144
	145	If you try to do something insecure, you will get a fatal error saying
	146	something like "Insecure dependency" or "Insecure $ENV{PATH}".
	147
	148	The exception to the principle of "one tainted value taints the whole
	149	expression" is with the ternary conditional operator C<?:>. Since code
	150	with a ternary conditional
	151
    $result = $tainted_value ? "Untainted" : "Also untainted";
	153
	154	is effectively
	155
    if ( $tainted_value ) {
        $result = "Untainted";
    } else {
        $result = "Also untainted";
    }
	161
	162	it doesn't make sense for C<$result> to be tainted.
	163
	164	=head2 Laundering and Detecting Tainted Data
	165
	166	To test whether a variable contains tainted data, and whose use would
	167	thus trigger an "Insecure dependency" message, you can use the
	168	C<tainted()> function of the Scalar::Util module, available in your
	169	nearby CPAN mirror, and included in Perl starting from the release 5.8.0.
	170	Or you may be able to use the following C<is_tainted()> function.
	171
    sub is_tainted {
        return ! eval { eval("#" . substr(join("", @_), 0, 0)); 1 };
    }
	175
	176	This function makes use of the fact that the presence of tainted data
	177	anywhere within an expression renders the entire expression tainted. It
	178	would be inefficient for every operator to test every argument for
	179	taintedness. Instead, the slightly more efficient and conservative
	180	approach is used that if any tainted value has been accessed within the
	181	same expression, the whole expression is considered tainted.
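
The C<Scalar::Util> route mentioned above can be sketched as follows. The
helper name C<taint_status()> and the sample values are assumptions made for
this illustration. When the script runs without taint mode, every value
reports as clean.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Report whether a value is tainted; always 'clean' when taint mode is off.
sub taint_status {
    my ($value) = @_;
    return tainted($value) ? 'tainted' : 'clean';
}

# ${^TAINT} is true when taint checks are enabled (for example via -T).
printf "taint mode is %s\n", ${^TAINT} ? 'on' : 'off';

my $literal  = 'abc';                                   # from the program itself
my $from_env = defined $ENV{PATH} ? $ENV{PATH} : '';    # from outside the program

printf "literal:  %s\n", taint_status($literal);
printf "from env: %s\n", taint_status($from_env);
```

Run with C<perl -T>, the environment value should be reported as tainted,
while the literal stays clean in either mode.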
	182
But testing for taintedness gets you only so far. Sometimes you just have
to clear your data's taintedness. Values may be untainted by using them
	185	as keys in a hash; otherwise the only way to bypass the tainting
	186	mechanism is by referencing subpatterns from a regular expression match.
	187	Perl presumes that if you reference a substring using $1, $2, etc., that
	188	you knew what you were doing when you wrote the pattern. That means using
	189	a bit of thought--don't just blindly untaint anything, or you defeat the
	190	entire mechanism. It's better to verify that the variable has only good
	191	characters (for certain values of "good") rather than checking whether it
	192	has any bad characters. That's because it's far too easy to miss bad
	193	characters that you never thought of.
	194
	195	Here's a test to make sure that the data contains nothing but "word"
	196	characters (alphabetics, numerics, and underscores), a hyphen, an at sign,
	197	or a dot.
	198
    if ($data =~ /^([-\@\w.]+)$/) {
        $data = $1;                     # $data now untainted
    } else {
        die "Bad data in '$data'";      # log this somewhere
    }
	204
	205	This is fairly secure because C</\w+/> doesn't normally match shell
	206	metacharacters, nor are dot, dash, or at going to mean something special
	207	to the shell. Use of C</.+/> would have been insecure in theory because
	208	it lets everything through, but Perl doesn't check for that. The lesson
	209	is that when untainting, you must be exceedingly careful with your patterns.
Laundering data using a regular expression is the I<only> mechanism for
	211	untainting dirty data, unless you use the strategy detailed below to fork
	212	a child of lesser privilege.
	213
	214	The example does not untaint C<$data> if C<use locale> is in effect,
	215	because the characters matched by C<\w> are determined by the locale.
	216	Perl considers that locale definitions are untrustworthy because they
	217	contain data from outside the program. If you are writing a
	218	locale-aware program, and want to launder data with a regular expression
	219	containing C<\w>, put C<no locale> ahead of the expression in the same
	220	block. See L<perllocale/SECURITY> for further discussion and examples.
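
The allowlist approach can be wrapped in a small helper, sketched below. The
function name C<launder_name()> and the sample inputs are hypothetical. This
sketch differs from the example above in two ways: it spells out the ASCII
characters instead of relying on C<\w>, and it anchors with C<\A> and C<\z>
so that a trailing newline is rejected rather than silently accepted.

```perl
use strict;
use warnings;

# Return a laundered copy of $value if it consists solely of characters on
# the allowlist, or undef otherwise. The capture in $1 is what Perl treats
# as untainted.
sub launder_name {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /\A([-\@A-Za-z0-9_.]+)\z/ ? $1 : undef;
}

for my $candidate ('report-2.txt', 'user@example.com', 'x; rm -rf /', "name\n") {
    my $clean = launder_name($candidate);
    (my $shown = $candidate) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-18s => %s\n", $shown, defined $clean ? 'accepted' : 'rejected';
}
```

Returning C<undef> for bad input forces the caller to decide what to do with
a rejected value instead of continuing with the original tainted string.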
	221
	222	=head2 Switches On the "#!" Line
	223
	224	When you make a script executable, in order to make it usable as a
	225	command, the system will pass switches to perl from the script's #!
	226	line. Perl checks that any command line switches given to a setuid
	227	(or setgid) script actually match the ones set on the #! line. Some
	228	Unix and Unix-like environments impose a one-switch limit on the #!
	229	line, so you may need to use something like C<-wU> instead of C<-w -U>
	230	under such systems. (This issue should arise only in Unix or
	231	Unix-like environments that support #! and setuid or setgid scripts.)
	232
	233	=head2 Taint mode and @INC
	234
	235	When the taint mode (C<-T>) is in effect, the "." directory is removed
	236	from C<@INC>, and the environment variables C<PERL5LIB> and C<PERLLIB>
	237	are ignored by Perl. You can still adjust C<@INC> from outside the
	238	program by using the C<-I> command line option as explained in
	239	L<perlrun>. The two environment variables are ignored because
	240	they are obscured, and a user running a program could be unaware that
	241	they are set, whereas the C<-I> option is clearly visible and
	242	therefore permitted.
	243
Another way to modify C<@INC> without modifying the program is to use
the C<lib> pragma, e.g.:

    perl -Mlib=/foo program

The benefit of using C<-Mlib=/foo> over C<-I/foo> is that the former
will automagically remove any duplicated directories, while the latter
will not.
	252
	253	Note that if a tainted string is added to C<@INC>, the following
	254	problem will be reported:
	255
    Insecure dependency in require while running with -T switch
	257
	258	=head2 Cleaning Up Your Path
	259
	260	For "Insecure C<$ENV{PATH}>" messages, you need to set C<$ENV{'PATH'}> to
	261	a known value, and each directory in the path must be absolute and
	262	non-writable by others than its owner and group. You may be surprised to
	263	get this message even if the pathname to your executable is fully
	264	qualified. This is I<not> generated because you didn't supply a full path
	265	to the program; instead, it's generated because you never set your PATH
	266	environment variable, or you didn't set it to something that was safe.
	267	Because Perl can't guarantee that the executable in question isn't itself
	268	going to turn around and execute some other program that is dependent on
	269	your PATH, it makes sure you set the PATH.
	270
	271	The PATH isn't the only environment variable which can cause problems.
	272	Because some shells may use the variables IFS, CDPATH, ENV, and
	273	BASH_ENV, Perl checks that those are either empty or untainted when
starting subprocesses. You may wish to add something like this to your
set-id and taint-checking scripts.

    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer
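
If the cleanup should apply only to one part of a program, the same idea can
be combined with C<local>. This is a sketch under assumptions: the function
name C<sanitized_env()> is hypothetical, and the C<PATH> value is only an
example that should be adapted to the target system.

```perl
use strict;
use warnings;

# Return a copy of the given environment with the shell-sensitive variables
# removed and PATH set to a fixed value (an assumed, example value).
sub sanitized_env {
    my (%env) = @_;
    delete @env{qw(IFS CDPATH ENV BASH_ENV)};
    $env{PATH} = '/bin:/usr/bin';
    return %env;
}

{
    # local() restores the caller's %ENV when this block exits.
    local %ENV = sanitized_env(%ENV);
    print "PATH inside block: $ENV{PATH}\n";
    # ... start subprocesses here ...
}
```

Keeping the cleanup in a function that works on a copy makes it easy to
check what a subprocess will see before anything is actually started.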
	278
	279	It's also possible to get into trouble with other operations that don't
	280	care whether they use tainted values. Make judicious use of the file
	281	tests in dealing with any user-supplied filenames. When possible, do
	282	opens and such B<after> properly dropping any special user (or group!)
	283	privileges. Perl doesn't prevent you from opening tainted filenames for reading,
	284	so be careful what you print out. The tainting mechanism is intended to
	285	prevent stupid mistakes, not to remove the need for thought.
	286
	287	Perl does not call the shell to expand wild cards when you pass C<system>
	288	and C<exec> explicit parameter lists instead of strings with possible shell
	289	wildcards in them. Unfortunately, the C<open>, C<glob>, and
	290	backtick functions provide no such alternate calling convention, so more
	291	subterfuge will be required.
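
The list form of C<system> can be sketched as follows. The helper name
C<run_without_shell()> is hypothetical, and C<$^X> (the running perl) merely
stands in for a real external program so that the example is self-contained.
The example is meant to be run without taint mode, since the point is only to
show that no shell interprets the argument.

```perl
use strict;
use warnings;

# Run a command without involving a shell and return its exit status.
# The "system { PROGRAM } LIST" form never passes the arguments to a
# shell, even when LIST has only one element.
sub run_without_shell {
    my @cmd = @_;
    my $rc = system { $cmd[0] } @cmd;
    die "Cannot run $cmd[0]: $!\n" if $rc == -1;
    return $? >> 8;
}

# An argument full of shell metacharacters reaches the child verbatim.
my $hostile = '*.c; echo pwned';
my $status  = run_without_shell(
    $^X, '-e', 'exit($ARGV[0] eq q{*.c; echo pwned} ? 0 : 1)', $hostile,
);
print $status == 0 ? "argument arrived unmodified\n" : "argument was altered\n";
```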
	292
	293	Perl provides a reasonably safe way to open a file or pipe from a setuid
	294	or setgid program: just create a child process with reduced privilege who
	295	does the dirty work for you. First, fork a child using the special
	296	C<open> syntax that connects the parent and child by a pipe. Now the
	297	child resets its ID set and any other per-process attributes, like
	298	environment variables, umasks, current working directories, back to the
	299	originals or known safe values. Then the child process, which no longer
	300	has any special permissions, does the C<open> or other system call.
	301	Finally, the child passes the data it managed to access back to the
	302	parent. Because the file or pipe was opened in the child while running
	303	under less privilege than the parent, it's not apt to be tricked into
	304	doing something it shouldn't.
	305
	306	Here's a way to do backticks reasonably safely. Notice how the C<exec> is
	307	not called with a string that the shell could expand. This is by far the
	308	best way to call something that might be subjected to shell escapes: just
	309	never call the shell at all.
	310
    use English '-no_match_vars';
    die "Can't fork: $!" unless defined($pid = open(KID, "-|"));
    if ($pid) {           # parent
        while (<KID>) {
            # do something
        }
        close KID;
    } else {              # child
        my @temp     = ($EUID, $EGID);
        my $orig_uid = $UID;
        my $orig_gid = $GID;
        $EUID = $UID;
        $EGID = $GID;
        # Drop privileges
        $UID  = $orig_uid;
        $GID  = $orig_gid;
        # Make sure privs are really gone
        ($EUID, $EGID) = @temp;
        die "Can't drop privileges"
            unless $UID == $EUID  && $GID eq $EGID;
        $ENV{PATH} = "/bin:/usr/bin"; # Minimal PATH.
        # Consider sanitizing the environment even more.
        exec 'myprog', 'arg1', 'arg2'
            or die "can't exec myprog: $!";
    }
	336
	337	A similar strategy would work for wildcard expansion via C<glob>, although
	338	you can use C<readdir> instead.
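
The C<readdir> alternative might look like the sketch below. The function
name C<files_with_suffix()> is hypothetical. Note that under taint mode the
names returned by C<readdir> are themselves tainted, so they still need to be
laundered before being used in anything that affects the outside world.

```perl
use strict;
use warnings;

# Return the plain files in $dir whose names end in $suffix, without
# calling glob() or a shell.
sub files_with_suffix {
    my ($dir, $suffix) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!\n";
    my @names = sort grep { /\Q$suffix\E\z/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return @names;
}

print "$_\n" for files_with_suffix('.', '.c');
```

The C<\Q...\E> quoting makes the suffix match literally, so the dot in
C<.c> is not treated as a regular expression metacharacter.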
	339
Taint checking is most useful when, although you trust yourself not to have
written a program to give away the farm, you don't necessarily trust those
who end up using it not to try to trick it into doing something bad. This
	343	is the kind of security checking that's useful for set-id programs and
	344	programs launched on someone else's behalf, like CGI programs.
	345
	346	This is quite different, however, from not even trusting the writer of the
	347	code not to try to do something evil. That's the kind of trust needed
	348	when someone hands you a program you've never seen before and says, "Here,
	349	run this." For that kind of safety, you might want to check out the Safe
	350	module, included standard in the Perl distribution. This module allows the
	351	programmer to set up special compartments in which all system operations
are trapped and namespace access is carefully controlled. Safe should
not be considered bullet-proof, though: it will not prevent the foreign
code from setting up infinite loops, allocating gigabytes of memory, or even
abusing perl bugs to make the host interpreter crash or behave in
unpredictable ways. In any case it's better avoided completely if you're
really concerned about security.
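
For readers who want to see what a compartment looks like, here is a minimal
sketch using the module's default operator mask. The code strings passed to
C<reval()> are invented for this illustration, and the caveats above apply in
full: this shows the mechanism, not a recommendation.

```perl
use strict;
use warnings;
use Safe;

# A new compartment starts with the module's default set of permitted ops.
my $compartment = Safe->new;

# Plain arithmetic is permitted.
my $sum = $compartment->reval('2 + 3');
print "arithmetic result: $sum\n";

# system() is not in the default set, so the code fails to compile
# and the reason is left in $@.
$compartment->reval('system("echo hello")');
print "system() was refused: $@" if $@;
```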
	358
	359	=head2 Security Bugs
	360
	361	Beyond the obvious problems that stem from giving special privileges to
	362	systems as flexible as scripts, on many versions of Unix, set-id scripts
	363	are inherently insecure right from the start. The problem is a race
	364	condition in the kernel. Between the time the kernel opens the file to
	365	see which interpreter to run and when the (now-set-id) interpreter turns
	366	around and reopens the file to interpret it, the file in question may have
	367	changed, especially if you have symbolic links on your system.
	368
	369	Fortunately, sometimes this kernel "feature" can be disabled.
	370	Unfortunately, there are two ways to disable it. The system can simply
	371	outlaw scripts with any set-id bit set, which doesn't help much.
	372	Alternately, it can simply ignore the set-id bits on scripts.
	373
	374	However, if the kernel set-id script feature isn't disabled, Perl will
	375	complain loudly that your set-id script is insecure. You'll need to
	376	either disable the kernel set-id script feature, or put a C wrapper around
	377	the script. A C wrapper is just a compiled program that does nothing
	378	except call your Perl program. Compiled programs are not subject to the
	379	kernel bug that plagues set-id scripts. Here's a simple wrapper, written
	380	in C:
	381
    #include <stdio.h>
    #include <unistd.h>

    #define REAL_PATH "/path/to/script"

    int main(int argc, char **argv)
    {
        execv(REAL_PATH, argv);
        perror("execv");    /* reached only if execv() fails */
        return 1;
    }
	388
	389	Compile this wrapper into a binary executable and then make I<it> rather
	390	than your script setuid or setgid.
	391
	392	In recent years, vendors have begun to supply systems free of this
	393	inherent security bug. On such systems, when the kernel passes the name
	394	of the set-id script to open to the interpreter, rather than using a
	395	pathname subject to meddling, it instead passes I</dev/fd/3>. This is a
	396	special file already opened on the script, so that there can be no race
	397	condition for evil scripts to exploit. On these systems, Perl should be
	398	compiled with C<-DSETUID_SCRIPTS_ARE_SECURE_NOW>. The F<Configure>
	399	program that builds Perl tries to figure this out for itself, so you
	400	should never have to specify this yourself. Most modern releases of
	401	SysVr4 and BSD 4.4 use this approach to avoid the kernel race condition.
	402
	403	=head2 Protecting Your Programs
	404
	405	There are a number of ways to hide the source to your Perl programs,
	406	with varying levels of "security".
	407
	408	First of all, however, you I<can't> take away read permission, because
	409	the source code has to be readable in order to be compiled and
	410	interpreted. (That doesn't mean that a CGI script's source is
readable by people on the web, though.) So you have to leave the
permissions at the socially friendly 0755 level. This lets only
people on your local system see your source.
	414
	415	Some people mistakenly regard this as a security problem. If your program does
	416	insecure things, and relies on people not knowing how to exploit those
	417	insecurities, it is not secure. It is often possible for someone to
	418	determine the insecure things and exploit them without viewing the
	419	source. Security through obscurity, the name for hiding your bugs
	420	instead of fixing them, is little security indeed.
	421
	422	You can try using encryption via source filters (Filter::* from CPAN,
	423	or Filter::Util::Call and Filter::Simple since Perl 5.8).
	424	But crackers might be able to decrypt it. You can try using the byte
	425	code compiler and interpreter described below, but crackers might be
	426	able to de-compile it. You can try using the native-code compiler
	427	described below, but crackers might be able to disassemble it. These
	428	pose varying degrees of difficulty to people wanting to get at your
	429	code, but none can definitively conceal it (this is true of every
	430	language, not just Perl).
	431
	432	If you're concerned about people profiting from your code, then the
	433	bottom line is that nothing but a restrictive license will give you
	434	legal security. License your software and pepper it with threatening
	435	statements like "This is unpublished proprietary software of XYZ Corp.
	436	Your access to it does not give you permission to use it blah blah
	437	blah." You should see a lawyer to be sure your license's wording will
	438	stand up in court.
	439
	440	=head2 Unicode
	441
	442	Unicode is a new and complex technology and one may easily overlook
	443	certain security pitfalls. See L<perluniintro> for an overview and
	444	L<perlunicode> for details, and L<perlunicode/"Security Implications
	445	of Unicode"> for security implications in particular.
	446
	447	=head2 Algorithmic Complexity Attacks
	448
Certain internal algorithms used in the implementation of Perl can
be attacked by choosing the input carefully to consume large amounts
of either time or space or both. This can lead to so-called
I<Denial of Service> (DoS) attacks.
	453
	454	=over 4
	455
	456	=item *
	457
Hash Function - the algorithm used to "order" hash elements has been
changed several times during the development of Perl, mainly to be
reasonably fast. In Perl 5.8.1 the security aspect was also taken
into account.

In Perls before 5.8.1 one could rather easily generate data that, when
used as hash keys, would cause Perl to consume large amounts of time because
the internal structure of hashes would badly degenerate. In Perl 5.8.1
the hash function is randomly perturbed by a pseudorandom seed, which
makes generating such naughty hash keys harder.
	468	See L<perlrun/PERL_HASH_SEED> for more information.
	469
In Perl 5.8.1 the random perturbation was done by default, but as of
5.8.2 it is only used on individual hashes if the internals detect the
insertion of pathological data. If one wants for some reason to emulate the
old behaviour (and expose oneself to DoS attacks) one can set the
environment variable PERL_HASH_SEED to zero to disable the protection
(or any other integer to force a known perturbation, rather than random).
	476	One possible reason for wanting to emulate the old behaviour is that in the
	477	new behaviour consecutive runs of Perl will order hash keys differently,
	478	which may confuse some applications (like Data::Dumper: the outputs of two
	479	different runs are no longer identical).
	480
	481	B<Perl has never guaranteed any ordering of the hash keys>, and the
	482	ordering has already changed several times during the lifetime of
	483	Perl 5. Also, the ordering of hash keys has always been, and
	484	continues to be, affected by the insertion order.
	485
	486	Also note that while the order of the hash elements might be
	487	randomised, this "pseudoordering" should B<not> be used for
	488	applications like shuffling a list randomly (use List::Util::shuffle()
	489	for that, see L<List::Util>, a standard core module since Perl 5.8.0;
	490	or the CPAN module Algorithm::Numerical::Shuffle), or for generating
	491	permutations (use e.g. the CPAN modules Algorithm::Permute or
	492	Algorithm::FastPermute), or for any cryptographic applications.
	493
	494	=item *
	495
Regular expressions - Perl's regular expression engine is a so-called NFA
	497	(Non-deterministic Finite Automaton), which among other things means that
	498	it can rather easily consume large amounts of both time and space if the
	499	regular expression may match in several ways. Careful crafting of the
	500	regular expressions can help but quite often there really isn't much
	501	one can do (the book "Mastering Regular Expressions" is required
	502	reading, see L<perlfaq2>). Running out of space manifests itself by
	503	Perl running out of memory.
	504
	505	=item *
	506
	507	Sorting - the quicksort algorithm used in Perls before 5.8.0 to
	508	implement the sort() function is very easy to trick into misbehaving
	509	so that it consumes a lot of time. Starting from Perl 5.8.0 a different
	510	sorting algorithm, mergesort, is used by default. Mergesort cannot
	511	misbehave on any input.
	512
	513	=back
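
Since hash key order is not guaranteed, code that needs reproducible output
can impose an order itself. The sketch below uses a hypothetical
C<%port_for> hash. It sorts the keys explicitly and sets
C<$Data::Dumper::Sortkeys> so that dumps from separate runs are comparable.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical data used only for this illustration.
my %port_for = (http => 80, https => 443, ssh => 22, smtp => 25);

# Sort explicitly whenever the order matters.
print join(', ', map { "$_=$port_for{$_}" } sort keys %port_for), "\n";

# Ask Data::Dumper to emit hash keys in sorted order.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;
print Dumper(\%port_for);
```

Sorting costs a little time on every traversal, but it removes any dependence
on the internal ordering described above.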
	514
	515	See L<http://www.cs.rice.edu/~scrosby/hash/> for more information,
	516	and any computer science textbook on algorithmic complexity.
	517
	518	=head1 SEE ALSO
	519
	520	L<perlrun> for its description of cleaning up environment variables.