package Digest;

use strict;
use vars qw($VERSION %MMAP $AUTOLOAD);

$VERSION = "1.17";

%MMAP = (
  "SHA-1"      => [["Digest::SHA", 1], "Digest::SHA1", ["Digest::SHA2", 1]],
  "SHA-224"    => [["Digest::SHA", 224]],
  "SHA-256"    => [["Digest::SHA", 256], ["Digest::SHA2", 256]],
  "SHA-384"    => [["Digest::SHA", 384], ["Digest::SHA2", 384]],
  "SHA-512"    => [["Digest::SHA", 512], ["Digest::SHA2", 512]],
  "HMAC-MD5"   => "Digest::HMAC_MD5",
  "HMAC-SHA-1" => "Digest::HMAC_SHA1",
  "CRC-16"     => [["Digest::CRC", type => "crc16"]],
  "CRC-32"     => [["Digest::CRC", type => "crc32"]],
  "CRC-CCITT"  => [["Digest::CRC", type => "crcccitt"]],
  "RIPEMD-160" => "Crypt::RIPEMD160",
);

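# Construct a digest object for the named algorithm, trying each known
# implementation in turn and loading its module on demand.
sub new
{
    shift;  # class ignored
    my $algorithm = shift;
    # Names not listed in %MMAP fall back to Digest::<name>, with any
    # non-word characters stripped from the name.
    my $impl = $MMAP{$algorithm} || do {
        $algorithm =~ s/\W+//g;
        "Digest::$algorithm";
    };
    $impl = [$impl] unless ref($impl);
    local $@;  # don't clobber it for our caller
    my $err;
    for (@$impl) {
        my $class = $_;
        my @args;
        ($class, @args) = @$class if ref($class);
        no strict 'refs';
        # Only require the module if it does not appear to be loaded yet.
        unless (exists ${"$class\::"}{"VERSION"}) {
            my $pm_file = $class . ".pm";
            $pm_file =~ s{::}{/}g;
            eval { require $pm_file };
            if ($@) {
                $err ||= $@;  # remember the first failure only
                next;
            }
        }
        return $class->new(@args, @_);
    }
    die $err;  # no candidate implementation could be loaded
}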

sub AUTOLOAD
{
    my $class = shift;
    my $algorithm = substr($AUTOLOAD, rindex($AUTOLOAD, '::')+2);
    $class->new($algorithm, @_);
}

1;

__END__

=head1 NAME

Digest - Modules that calculate message digests

=head1 SYNOPSIS

  $md5  = Digest->new("MD5");
  $sha1 = Digest->new("SHA-1");
  $sha256 = Digest->new("SHA-256");
  $sha384 = Digest->new("SHA-384");
  $sha512 = Digest->new("SHA-512");

  $hmac = Digest->HMAC_MD5($key);

=head1 DESCRIPTION

The C<Digest::> modules calculate digests, also called "fingerprints"
or "hashes", of some data, called a message.  The digest is (usually)
a small, fixed-size string.  The actual size of the digest depends on
the algorithm used.  The message is simply a sequence of arbitrary
bytes or bits.

An important property of the digest algorithms is that the digest is
I<likely> to change if the message changes in some way.  Another
property is that digest functions are one-way functions; that is, it
should be I<hard> to find a message that corresponds to some given
digest.  Algorithms differ in how "likely" and how "hard", as well as
in how efficient they are to compute.

Note that the properties of the algorithms change over time, as the
algorithms are analyzed and machines grow faster.  If your application,
for instance, depends on it being "impossible" to generate the same
digest for a different message, it is wise to make it easy to plug in
stronger algorithms as the one in use grows weaker.  Using the interface
documented here should make it easy to change algorithms later.
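
As a rough sketch of what "easy to plug in" can look like, the
algorithm name can be kept in one place and handed to
C<< Digest->new >>.  The C<fingerprint> helper and the C<$ALGORITHM>
variable are hypothetical names used only for illustration, and the
example assumes that L<Digest::MD5> and L<Digest::SHA> are installed:

```perl
use strict;
use warnings;
use Digest;

# Hypothetical configuration value: the only place the algorithm is named.
our $ALGORITHM = "SHA-256";

# Hypothetical helper: hex digest of a byte string, computed with
# whatever algorithm is currently configured.
sub fingerprint {
    my ($message) = @_;
    return Digest->new($ALGORITHM)->add($message)->hexdigest;
}

print fingerprint("hello world"), "\n";

# Moving to a stronger algorithm later is a one-line change.
$ALGORITHM = "SHA-512";
print fingerprint("hello world"), "\n";
```

Because callers only ever see C<fingerprint>, nothing else in the
program needs to know which C<Digest::> module did the work.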

All C<Digest::> modules provide the same programming interface: a
functional interface for simple use, as well as an object-oriented
interface that can handle messages of arbitrary length and can
read files directly.

The digest can be delivered in three formats:

=over 8

=item I<binary>

This is the most compact form, but it is not well suited for printing
or embedding in places that can't handle arbitrary data.

=item I<hex>

A string of lowercase hexadecimal digits, twice as long as the binary form.

=item I<base64>

A string of portable printable characters.  This is the base64 encoded
representation of the digest with any trailing padding removed.  The
string will be about a third longer than the binary version.
L<MIME::Base64> tells you more about this encoding.

=back


The functional interface consists of importable functions named after
the algorithm.  The functions take the message as an argument and
return the digest.  Example:

  use Digest::MD5 qw(md5);
  $digest = md5($message);

There are also versions of the functions with "_hex" or "_base64"
appended to the name, which return the digest in the indicated form.
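
The following sketch shows the three output formats side by side using
the functional interface of L<Digest::MD5>; the variable names are
illustrative only:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex md5_base64);

my $message = "hello world";

my $bin = md5($message);         # 16 raw bytes, not safe to print
my $hex = md5_hex($message);     # 32 lowercase hexadecimal digits
my $b64 = md5_base64($message);  # 22 base64 characters, padding removed

printf "binary: %d bytes\n", length $bin;
printf "hex:    %s\n", $hex;
printf "base64: %s\n", $b64;

# The hex form is the binary form spelled out, two digits per byte.
print "hex matches binary\n" if $hex eq unpack("H*", $bin);
```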

=head1 OO INTERFACE

The following methods are available for all C<Digest::> modules:

=over 4

=item $ctx = Digest->XXX($arg,...)

=item $ctx = Digest->new(XXX => $arg,...)

=item $ctx = Digest::XXX->new($arg,...)

The constructor returns an object that encapsulates the state of the
message-digest algorithm.  You can add data to the object and finally
ask for the digest.  The "XXX" should of course be replaced by the proper
name of the digest algorithm you want to use.

The first two forms are simply syntactic sugar that automatically
loads the right module on first use.  The second form allows you to use
algorithm names containing characters that are not legal in Perl
identifiers, e.g. "SHA-1".  If no implementation for the given algorithm
can be found, then an exception is raised.

If new() is called as an instance method (i.e. $ctx->new) it will just
reset the state of the object to that of a newly created object.  No
new object is created in this case, and the return value is the
reference to the object (i.e. $ctx).

=item $other_ctx = $ctx->clone

The clone method creates a copy of the digest state object and returns
a reference to the copy.

=item $ctx->reset

This is just an alias for $ctx->new.

=item $ctx->add( $data )

=item $ctx->add( $chunk1, $chunk2, ... )

The string value of the $data provided as argument is appended to the
message we calculate the digest for.  The return value is the $ctx
object itself.

If more arguments are provided then they are all appended to the
message, thus all these lines will have the same effect on the state
of the $ctx object:

  $ctx->add("a"); $ctx->add("b"); $ctx->add("c");
  $ctx->add("a")->add("b")->add("c");
  $ctx->add("a", "b", "c");
  $ctx->add("abc");

Most algorithms are only defined for strings of bytes and this method
might therefore croak if the provided arguments contain chars with
ordinal number above 255.

=item $ctx->addfile( $io_handle )

The $io_handle is read until EOF and the content is appended to the
message we calculate the digest for.  The return value is the $ctx
object itself.

The addfile() method will croak() if it fails reading data for some
reason.  If it croaks, the state of the $ctx object is
unpredictable.  The addfile() method might have been able to read
the file partially before it failed.  It is probably wise to discard
or reset the $ctx object if this occurs.

In most cases you want to make sure that the $io_handle is in
"binmode" before you pass it as argument to the addfile() method.

=item $ctx->add_bits( $data, $nbits )

=item $ctx->add_bits( $bitstring )

The add_bits() method is an alternative to add() that allows partial
bytes to be appended to the message.  Most users should just ignore
this method, as partial bytes are very unlikely to be of any practical
use.

The two argument form of add_bits() will add the first $nbits bits
from $data.  For the last potentially partial byte only the high order
C<< $nbits % 8 >> bits are used.  If $nbits is greater than C<<
length($data) * 8 >>, then this method does the same as C<<
$ctx->add($data) >>.

The one argument form of add_bits() takes a $bitstring of "1" and "0"
chars as argument.  It's a shorthand for C<< $ctx->add_bits(pack("B*",
$bitstring), length($bitstring)) >>.

The return value is the $ctx object itself.

This example shows two calls that should have the same effect:

   $ctx->add_bits("111100001010");
   $ctx->add_bits("\xF0\xA0", 12);

Most digest algorithms are byte based, and for these it is not possible
to add a number of bits that is not a multiple of 8; the add_bits()
method will croak if you try.

=item $ctx->digest

Return the binary digest for the message.

Note that the C<digest> operation is effectively a destructive,
read-once operation. Once it has been performed, the $ctx object is
automatically C<reset> and can be used to calculate another digest
value.  Call $ctx->clone->digest if you want to calculate the digest
without resetting the digest state.

=item $ctx->hexdigest

Same as $ctx->digest, but will return the digest in hexadecimal form.

=item $ctx->b64digest

Same as $ctx->digest, but will return the digest as a base64 encoded
string.

=back
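
The constructor forms and the reset-on-digest behaviour described in
this section can be sketched as follows.  The example assumes
L<Digest::MD5> is installed; C<NoSuchAlgorithm> is a made-up name used
to provoke the exception:

```perl
use strict;
use warnings;
use Digest;

# Three ways to get an MD5 context; the first two load Digest::MD5 on demand.
my $via_autoload = Digest->MD5;
my $via_new      = Digest->new("MD5");
require Digest::MD5;
my $direct       = Digest::MD5->new;
print "class: ", ref($via_autoload), "\n";

# An unknown algorithm raises an exception, so guard the call with eval
# when the name comes from configuration or user input.
my $ctx = eval { Digest->new("NoSuchAlgorithm") };
print "no implementation found\n" unless defined $ctx;

# add() returns the context, so calls can be chained.
$via_new->add("a")->add("b")->add("c");
$direct->add("abc");

# clone->hexdigest peeks at the digest without resetting $via_new.
my $peek  = $via_new->clone->hexdigest;
my $final = $via_new->hexdigest;   # this call resets $via_new
my $empty = $via_new->hexdigest;   # digest of the empty message

print "same digest\n" if $peek eq $final and $final eq $direct->hexdigest;
print "reset after digest\n" if $empty ne $final;
```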

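As a further sketch, the C<addfile> and C<add_bits> methods might be
used as shown here.  It assumes L<Digest::MD5> and L<Digest::SHA> are
installed, and it uses L<File::Temp> only to create a scratch file so
that the example is self-contained:

```perl
use strict;
use warnings;
use Digest;
use File::Temp qw(tempfile);

# Write a small scratch file to read back.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "abc";
close $out or die "close failed: $!";

# addfile(): open the file, put the handle in binmode, then feed it in.
open my $in, '<', $path or die "cannot open $path: $!";
binmode $in;
my $file_digest = Digest->new("MD5")->addfile($in)->hexdigest;
close $in;
print "file digest: $file_digest\n";

# add_bits(): Digest::SHA accepts partial bytes, so both forms below
# append the same 12 bits to an empty message.
my $from_string = Digest->new("SHA-1")->add_bits("111100001010")->hexdigest;
my $from_bytes  = Digest->new("SHA-1")->add_bits("\xF0\xA0", 12)->hexdigest;
print "same 12-bit message\n" if $from_string eq $from_bytes;

# A byte-based algorithm such as MD5 is expected to croak on 4 bits.
my $ok = eval { Digest->new("MD5")->add_bits("1111"); 1 };
print "MD5 rejected a 4-bit message\n" unless $ok;
```
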
=head1 Digest speed

This table gives some indication of the relative speed of
different algorithms.  It is sorted by throughput, based on a benchmark
of some implementations of this API:

 Algorithm      Size    Implementation                  MB/s

 MD4            128     Digest::MD4 v1.3               165.0
 MD5            128     Digest::MD5 v2.33               98.8
 SHA-256        256     Digest::SHA2 v1.1.0             66.7
 SHA-1          160     Digest::SHA v4.3.1              58.9
 SHA-1          160     Digest::SHA1 v2.10              48.8
 SHA-256        256     Digest::SHA v4.3.1              41.3
 Haval-256      256     Digest::Haval256 v1.0.4         39.8
 SHA-384        384     Digest::SHA2 v1.1.0             19.6
 SHA-512        512     Digest::SHA2 v1.1.0             19.3
 SHA-384        384     Digest::SHA v4.3.1              19.2
 SHA-512        512     Digest::SHA v4.3.1              19.2
 Whirlpool      512     Digest::Whirlpool v1.0.2        13.0
 MD2            128     Digest::MD2 v2.03                9.5

 Adler-32        32     Digest::Adler32 v0.03            1.3
 CRC-16          16     Digest::CRC v0.05                1.1
 CRC-32          32     Digest::CRC v0.05                1.1
 MD5            128     Digest::Perl::MD5 v1.5           1.0
 CRC-CCITT       16     Digest::CRC v0.05                0.8

These numbers were measured in April 2004 with ActivePerl-5.8.3 running
under Linux on a P4 2.8 GHz CPU.  The last 5 entries are
pure Perl implementations of the algorithms, which explains why they
are so slow.
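
To get comparable figures for your own machine, a rough measurement
along the following lines may help.  This is only a sketch and not the
benchmark that produced the table above; C<throughput> is a
hypothetical helper, and the example assumes L<Digest::MD5> and
L<Digest::SHA> are installed:

```perl
use strict;
use warnings;
use Digest;
use Time::HiRes qw(time);

# Hypothetical helper: rough throughput in MB/s for one algorithm name.
sub throughput {
    my ($algorithm, $megabytes) = @_;
    my $chunk = "x" x (1024 * 1024);   # 1 MB of input per add() call
    my $ctx   = Digest->new($algorithm);

    my $start = time;
    $ctx->add($chunk) for 1 .. $megabytes;
    $ctx->digest;
    my $elapsed = time - $start;

    return $megabytes / ($elapsed || 1e-9);   # avoid dividing by zero
}

printf "%-8s %8.1f MB/s\n", $_, throughput($_, 16)
    for "MD5", "SHA-1", "SHA-256";
```

Results depend heavily on the CPU, the Perl build and the module
versions, so only the relative ordering is likely to be meaningful.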

=head1 SEE ALSO

L<Digest::Adler32>, L<Digest::CRC>, L<Digest::Haval256>,
L<Digest::HMAC>, L<Digest::MD2>, L<Digest::MD4>, L<Digest::MD5>,
L<Digest::SHA>, L<Digest::SHA1>, L<Digest::SHA2>, L<Digest::Whirlpool>

New digest implementations should consider subclassing from L<Digest::base>.

L<MIME::Base64>

http://en.wikipedia.org/wiki/Cryptographic_hash_function

=head1 AUTHOR

Gisle Aas <gisle@aas.no>

The C<Digest::> interface is based on the interface originally
developed by Neil Winton for his C<MD5> module.

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

    Copyright 1998-2006 Gisle Aas.
    Copyright 1995,1996 Neil Winton.

=cut