package IO::Uncompress::Bunzip2 ;

use IO::Compress::Base::Common 2.061 qw(:Status );

use IO::Uncompress::Base 2.061 ;
use IO::Uncompress::Adapter::Bunzip2 2.061 ;

our ($VERSION, @ISA, @EXPORT_OK, %EXPORT_TAGS, $Bunzip2Error);

@ISA = qw( Exporter IO::Uncompress::Base );
@EXPORT_OK = qw( $Bunzip2Error bunzip2 ) ;
#%EXPORT_TAGS = %IO::Uncompress::Base::EXPORT_TAGS ;
push @{ $EXPORT_TAGS{all} }, @EXPORT_OK ;
#Exporter::export_ok_tags('all');

my $obj = IO::Compress::Base::Common::createSelfTiedObject($class, \$Bunzip2Error);

$obj->_create(undef, 0, @_);

my $obj = IO::Compress::Base::Common::createSelfTiedObject(undef, \$Bunzip2Error);
return $obj->_inf(@_);

'verbosity' => [IO::Compress::Base::Common::Parse_boolean, 0],
'small' => [IO::Compress::Base::Common::Parse_boolean, 0],

my $magic = $self->ckMagic()

*$self->{Info} = $self->readHeader($magic)

my $Small = $got->getValue('small');
my $Verbosity = $got->getValue('verbosity');

my ($obj, $errstr, $errno) = IO::Uncompress::Adapter::Bunzip2::mkUncompObject(

return $self->saveErrorString(undef, $errstr, $errno)

*$self->{Uncomp} = $obj;

$self->smartReadExact(\$magic, 4);

*$self->{HeaderPending} = $magic ;

return $self->HeaderError("Header size is " .
    if length $magic != 4;

return $self->HeaderError("Bad Magic.")
    if ! isBzip2Magic($magic) ;

*$self->{Type} = 'bzip2';

$self->pushBack($magic);
*$self->{HeaderPending} = '';

'FingerprintLength' => 4,

'TrailerLength' => 0,

return $buffer =~ /^BZh\d$/;

=head1 NAME

IO::Uncompress::Bunzip2 - Read bzip2 files/buffers

=head1 SYNOPSIS

    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error) ;

    my $status = bunzip2 $input => $output [,OPTS]
        or die "bunzip2 failed: $Bunzip2Error\n";

    my $z = new IO::Uncompress::Bunzip2 $input [OPTS]
        or die "bunzip2 failed: $Bunzip2Error\n";

    $status = $z->read($buffer)
    $status = $z->read($buffer, $length)
    $status = $z->read($buffer, $length, $offset)
    $line = $z->getline()

    $data = $z->trailingData()
    $status = $z->nextStream()
    $data = $z->getHeaderInfo()

    $z->seek($position, $whence)

    read($z, $buffer, $length);
    read($z, $buffer, $length, $offset);

    seek($z, $position, $whence)

=head1 DESCRIPTION

This module provides a Perl interface that allows the reading of
bzip2 files/buffers.

For writing bzip2 files/buffers, see the companion module IO::Compress::Bzip2.

=head1 Functional Interface

A top-level function, C<bunzip2>, is provided to carry out
"one-shot" uncompression between buffers and/or files. For finer
control over the uncompression process, see the L</"OO Interface">
section.

    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error) ;

    bunzip2 $input => $output [,OPTS]
        or die "bunzip2 failed: $Bunzip2Error\n";

The functional interface needs Perl 5.005 or better.

=head2 bunzip2 $input => $output [, OPTS]

C<bunzip2> expects at least two parameters, C<$input> and C<$output>.

=head3 The C<$input> parameter

The parameter, C<$input>, is used to define the source of
the compressed data.

It can take one of the following forms:

=over 5

=item A filename

If the C<$input> parameter is a simple scalar, it is assumed to be a
filename. This file will be opened for reading and the input data
will be read from it.

=item A filehandle

If the C<$input> parameter is a filehandle, the input data will be
read from it.
The string '-' can be used as an alias for standard input.

=item A scalar reference

If C<$input> is a scalar reference, the input data will be read
from C<$$input>.

=item An array reference

If C<$input> is an array reference, each element in the array must be a
filename.

The input data will be read from each file in turn.

The complete array will be walked to ensure that it only
contains valid filenames before any data is uncompressed.

=item An Input FileGlob string

If C<$input> is a string that is delimited by the characters "<" and ">",
C<bunzip2> will assume that it is an I<input fileglob string>. The
input is the list of files that match the fileglob.

If the fileglob does not match any files ...

See L<File::GlobMapper|File::GlobMapper> for more details.

=back

If the C<$input> parameter is any other type, C<undef> will be returned.

=head3 The C<$output> parameter

The parameter C<$output> is used to control the destination of the
uncompressed data. This parameter can take one of these forms.

=over 5

=item A filename

If the C<$output> parameter is a simple scalar, it is assumed to be a
filename. This file will be opened for writing and the uncompressed
data will be written to it.

=item A filehandle

If the C<$output> parameter is a filehandle, the uncompressed data
will be written to it.
The string '-' can be used as an alias for standard output.

=item A scalar reference

If C<$output> is a scalar reference, the uncompressed data will be
stored in C<$$output>.

=item An Array Reference

If C<$output> is an array reference, the uncompressed data will be
pushed onto the array.

=item An Output FileGlob

If C<$output> is a string that is delimited by the characters "<" and ">",
C<bunzip2> will assume that it is an I<output fileglob string>. The
output is the list of files that match the fileglob.

When C<$output> is a fileglob string, C<$input> must also be a fileglob
string. Anything else is an error.

=back

If the C<$output> parameter is any other type, C<undef> will be returned.

=head2 Notes

When C<$input> maps to multiple compressed files/buffers and C<$output> is
a single file/buffer, after uncompression C<$output> will contain a
concatenation of all the uncompressed data from each of the input
files/buffers.

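The concatenation behaviour described above can be sketched as follows. This is a minimal illustration rather than part of the module: the temporary directory, the file names C<part0.bz2> and C<part1.bz2>, and the sample strings are all assumptions made for the example, and the compressed inputs are produced with the companion module IO::Compress::Bzip2.

```perl
use strict;
use warnings;
use File::Temp              qw(tempdir);
use IO::Compress::Bzip2     qw(bzip2   $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical sample data: two small payloads, each written to its own
# compressed file in a scratch directory that is removed on exit.
my $dir   = tempdir(CLEANUP => 1);
my @parts = ("first part\n", "second part\n");
my @files;

for my $i (0 .. $#parts) {
    my $file = "$dir/part$i.bz2";
    bzip2 \$parts[$i] => $file
        or die "bzip2 failed: $Bzip2Error\n";
    push @files, $file;
}

# An array reference of filenames as input, a single buffer as output:
# the buffer receives the uncompressed data of each file in turn.
my $combined;
bunzip2 \@files => \$combined
    or die "bunzip2 failed: $Bunzip2Error\n";

print $combined;
```

The same call with a filename as C<$output> would be expected to write the concatenated data to that one file instead.
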
=head2 Optional Parameters

Unless specified below, the optional parameters for C<bunzip2>,
C<OPTS>, are the same as those used with the OO interface defined in the
L</"Constructor Options"> section below.

=over 5

=item C<< AutoClose => 0|1 >>

This option applies to any input or output data streams to
C<bunzip2> that are filehandles.

If C<AutoClose> is specified, and the value is true, it will result in all
input and/or output filehandles being closed once C<bunzip2> has
completed.

This parameter defaults to 0.

=item C<< BinModeOut => 0|1 >>

When writing to a file or filehandle, set C<binmode> before writing to the
file.

=item C<< Append => 0|1 >>

=item C<< MultiStream => 0|1 >>

If the input file/buffer contains multiple compressed data streams, this
option will uncompress the whole lot as a single data stream.

=item C<< TrailingData => $scalar >>

Returns the data, if any, that is present immediately after the compressed
data stream once uncompression is complete.

This option can be used when there is useful information immediately
following the compressed data stream, and you don't know the length of the
compressed data stream.

If the input is a buffer, C<trailingData> will return everything from the
end of the compressed data stream to the end of the buffer.

If the input is a filehandle, C<trailingData> will return the data that is
left in the filehandle input buffer once the end of the compressed data
stream has been reached. You can then use the filehandle to read the rest
of the input file.

Don't bother using C<trailingData> if the input is a filename.

If you know the length of the compressed data stream before you start
uncompressing, you can avoid having to use C<trailingData> by setting the
C<InputLength> option.

=back

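As a rough illustration of the C<MultiStream> option, the sketch below builds an input buffer from two independently compressed streams and uncompresses it with and without the option. The sample strings and variable names are assumptions made for the example; the streams are created with the companion module IO::Compress::Bzip2.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2   $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical payloads, compressed separately and then concatenated.
my ($one, $two) = ("stream one\n", "stream two\n");
my ($c1, $c2);
bzip2 \$one => \$c1 or die "bzip2 failed: $Bzip2Error\n";
bzip2 \$two => \$c2 or die "bzip2 failed: $Bzip2Error\n";
my $input = $c1 . $c2;

# Without MultiStream, uncompression stops at the end of the first stream.
my $first_only;
bunzip2 \$input => \$first_only
    or die "bunzip2 failed: $Bunzip2Error\n";

# With MultiStream => 1 the streams are treated as one data stream.
my $everything;
bunzip2 \$input => \$everything, MultiStream => 1
    or die "bunzip2 failed: $Bunzip2Error\n";

printf "without MultiStream: %d bytes, with MultiStream: %d bytes\n",
    length $first_only, length $everything;
```
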
=head2 Examples

To read the contents of the file C<file1.txt.bz2> and write the
uncompressed data to the file C<file1.txt>.

    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error) ;

    my $input = "file1.txt.bz2";
    my $output = "file1.txt";
    bunzip2 $input => $output
        or die "bunzip2 failed: $Bunzip2Error\n";

To read from an existing Perl filehandle, C<$input>, and write the
uncompressed data to a buffer, C<$buffer>.

    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error) ;
    use IO::File ;

    my $input = new IO::File "<file1.txt.bz2"
        or die "Cannot open 'file1.txt.bz2': $!\n" ;
    my $buffer ;
    bunzip2 $input => \$buffer
        or die "bunzip2 failed: $Bunzip2Error\n";

To uncompress all files in the directory "/my/home" that match "*.txt.bz2"
and store the uncompressed data in the same directory.

    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error) ;

    bunzip2 '</my/home/*.txt.bz2>' => '</my/home/#1.txt>'
        or die "bunzip2 failed: $Bunzip2Error\n";

And if you want to uncompress each file one at a time, this will do the trick.

    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error) ;

    for my $input ( glob "/my/home/*.txt.bz2" )
    {
        # Derive the output name by dropping the trailing ".bz2".
        my $output = $input;
        $output =~ s/\.bz2$// ;
        bunzip2 $input => $output
            or die "Error uncompressing '$input': $Bunzip2Error\n";
    }

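The examples above all assume that compressed files already exist on disk. A self-contained variant that works purely on buffers is sketched below; the sample string and variable names are assumptions, and the compressed buffer is produced with the companion module IO::Compress::Bzip2.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2   $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical sample payload; any byte string would do.
my $original = "hello, bzip2\n" x 3;

# Compress buffer to buffer with the companion module.
my $compressed;
bzip2 \$original => \$compressed
    or die "bzip2 failed: $Bzip2Error\n";

# Uncompress buffer to buffer with the functional interface.
my $uncompressed;
bunzip2 \$compressed => \$uncompressed
    or die "bunzip2 failed: $Bunzip2Error\n";

print "round trip ok\n" if $uncompressed eq $original;
```
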
=head1 OO Interface

=head2 Constructor

The format of the constructor for IO::Uncompress::Bunzip2 is shown below

    my $z = new IO::Uncompress::Bunzip2 $input [OPTS]
        or die "IO::Uncompress::Bunzip2 failed: $Bunzip2Error\n";

Returns an C<IO::Uncompress::Bunzip2> object on success and undef on failure.
The variable C<$Bunzip2Error> will contain an error message on failure.

If you are running Perl 5.005 or better the object, C<$z>, returned from
IO::Uncompress::Bunzip2 can be used exactly like an L<IO::File|IO::File> filehandle.
This means that all normal input file operations can be carried out with
C<$z>. For example, to read a line from a compressed file/buffer you can
use either of these forms

    $line = $z->getline();
    $line = <$z>;

The mandatory parameter C<$input> is used to determine the source of the
compressed data. This parameter can take one of three forms.

=over 5

=item A filename

If the C<$input> parameter is a scalar, it is assumed to be a filename. This
file will be opened for reading and the compressed data will be read from it.

=item A filehandle

If the C<$input> parameter is a filehandle, the compressed data will be
read from it.
The string '-' can be used as an alias for standard input.

=item A scalar reference

If C<$input> is a scalar reference, the compressed data will be read from
C<$$input>.

=back

=head2 Constructor Options

The option names defined below are case insensitive and can be optionally
prefixed by a '-'. For example, C<-AutoClose>, C<-autoclose>, C<AUTOCLOSE>
and C<autoclose> are all valid.

OPTS is a combination of the following options:

=over 5

=item C<< AutoClose => 0|1 >>

This option is only valid when the C<$input> parameter is a filehandle. If
specified, and the value is true, it will result in the file being closed once
either the C<close> method is called or the IO::Uncompress::Bunzip2 object is
destroyed.

This parameter defaults to 0.

=item C<< MultiStream => 0|1 >>

Allows multiple concatenated compressed streams to be treated as a single
compressed stream. Decompression will stop once either the end of the
file/buffer is reached, an error is encountered (premature eof, corrupt
compressed data) or the end of a stream is not immediately followed by the
start of another stream.

This parameter defaults to 0.

=item C<< Prime => $string >>

This option will uncompress the contents of C<$string> before processing the
input file/buffer.

This option can be useful when the compressed data is embedded in another
file/data structure and it is not possible to work out where the compressed
data begins without having to read the first few bytes. If this is the
case, the uncompression can be I<primed> with these bytes using this
option.

=item C<< Transparent => 0|1 >>

If this option is set and the input file/buffer is not compressed data,
the module will allow reading of it anyway.

In addition, if the input file/buffer does contain compressed data and
there is non-compressed data immediately following it, setting this option
will make this module treat the whole file/buffer as a single data stream.

This option defaults to 1.

=item C<< BlockSize => $num >>

When reading the compressed input data, IO::Uncompress::Bunzip2 will read it in
blocks of C<$num> bytes.

This option defaults to 4096.

=item C<< InputLength => $size >>

When present this option will limit the number of compressed bytes read
from the input file/buffer to C<$size>. This option can be used in the
situation where there is useful data directly after the compressed data
stream and you know beforehand the exact length of the compressed data
stream.

This option is mostly used when reading from a filehandle, in which case
the file pointer will be left pointing to the first byte directly after the
compressed data stream.

This option defaults to off.

=item C<< Append => 0|1 >>

This option controls what the C<read> method does with uncompressed data.

If set to 1, all uncompressed data will be appended to the output parameter
of the C<read> method.

If set to 0, the contents of the output parameter of the C<read> method
will be overwritten by the uncompressed data.

=item C<< Strict => 0|1 >>

This option is a no-op.

=item C<< Small => 0|1 >>

When non-zero this option will make bzip2 use a decompression algorithm
that uses less memory at the expense of increasing the amount of time
taken for decompression.

=back

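To make the C<Transparent> option more concrete, the sketch below opens a buffer that was never compressed, first with the default setting and then with C<< Transparent => 0 >>. The sample text and variable names are assumptions made for the example, and the method-call form C<< IO::Uncompress::Bunzip2->new(...) >> is used in place of the indirect-object form shown in the constructor synopsis.

```perl
use strict;
use warnings;
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical input that does not start with the bzip2 magic.
my $plain = "this text was never compressed\n";

# Transparent defaults to 1, so the plain data can still be read.
my $z = IO::Uncompress::Bunzip2->new(\$plain)
    or die "IO::Uncompress::Bunzip2 failed: $Bunzip2Error\n";

my ($copy, $chunk) = ('');
while ((my $status = $z->read($chunk)) > 0) {
    $copy .= $chunk;
}
$z->close();

# With Transparent => 0 the constructor is expected to fail, because
# the input does not look like a bzip2 stream.
my $strict = IO::Uncompress::Bunzip2->new(\$plain, Transparent => 0);
print "rejected: $Bunzip2Error\n" unless defined $strict;
```
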
=head1 Methods

=head2 read

    $status = $z->read($buffer)

Reads a block of compressed data (the size of the compressed block is
determined by the C<BlockSize> option in the constructor), uncompresses it and
writes any uncompressed data into C<$buffer>. If the C<Append> parameter is
set in the constructor, the uncompressed data will be appended to the
C<$buffer> parameter. Otherwise C<$buffer> will be overwritten.

Returns the number of uncompressed bytes written to C<$buffer>, zero if eof
or a negative number on error.

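A typical read loop built on the return values described above might look like the following sketch. The sample data, the small C<BlockSize> of 64 and the variable names are assumptions chosen so that several reads are needed; the compressed buffer is produced with the companion module IO::Compress::Bzip2.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical sample data, compressed into a buffer.
my $text = join '', map { "line $_\n" } 1 .. 1000;
my $compressed;
bzip2 \$text => \$compressed
    or die "bzip2 failed: $Bzip2Error\n";

# A small BlockSize forces the loop below to run more than once.
my $z = IO::Uncompress::Bunzip2->new(\$compressed, BlockSize => 64)
    or die "IO::Uncompress::Bunzip2 failed: $Bunzip2Error\n";

my ($result, $chunk, $status) = ('');

# Append is off by default, so each call overwrites $chunk.
while (($status = $z->read($chunk)) > 0) {
    $result .= $chunk;
}

# Zero means end of file; a negative value signals an error.
die "read failed: $Bunzip2Error\n" if $status < 0;
$z->close();

printf "read %d uncompressed bytes\n", length $result;
```

Constructing the object with C<< Append => 1 >> would let the loop read straight into C<$result> without the intermediate C<$chunk>.
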
=head2 read

    $status = $z->read($buffer, $length)
    $status = $z->read($buffer, $length, $offset)

    $status = read($z, $buffer, $length)
    $status = read($z, $buffer, $length, $offset)

Attempt to read C<$length> bytes of uncompressed data into C<$buffer>.

The main difference between this form of the C<read> method and the
previous one is that this one will attempt to return I<exactly> C<$length>
bytes. The only circumstances in which it will not are when end-of-file
or an I/O error is encountered.

Returns the number of uncompressed bytes written to C<$buffer>, zero if eof
or a negative number on error.

=head2 getline

    $line = $z->getline()

This method fully supports the use of the variable C<$/> (or
C<$INPUT_RECORD_SEPARATOR> or C<$RS> when C<English> is in use) to
determine what constitutes an end of line. Paragraph mode, record mode and
file slurp mode are all supported.

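The sketch below reads a compressed buffer line by line, and then reads a second buffer with C<$/> set to a comma. The sample strings and variable names are assumptions made for the example; the compressed buffers are produced with the companion module IO::Compress::Bzip2.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical sample data.
my $text = "alpha\nbeta\ngamma\n";
my $csv  = "a,b,c";
my ($ctext, $ccsv);
bzip2 \$text => \$ctext or die "bzip2 failed: $Bzip2Error\n";
bzip2 \$csv  => \$ccsv  or die "bzip2 failed: $Bzip2Error\n";

# Default $/ ("\n"): one call to getline per line of uncompressed text.
my $z = IO::Uncompress::Bunzip2->new(\$ctext)
    or die "IO::Uncompress::Bunzip2 failed: $Bunzip2Error\n";
my @lines;
while (defined(my $line = $z->getline())) {
    push @lines, $line;
}
$z->close();

# A custom record separator, localised so it does not leak elsewhere.
my @fields;
{
    local $/ = ',';
    my $r = IO::Uncompress::Bunzip2->new(\$ccsv)
        or die "IO::Uncompress::Bunzip2 failed: $Bunzip2Error\n";
    while (defined(my $field = $r->getline())) {
        push @fields, $field;
    }
    $r->close();
}

print scalar(@lines), " lines, ", scalar(@fields), " fields\n";
```
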
=head2 getc

Read a single character.

=head2 ungetc

    $char = $z->ungetc($string)

=head2 getHeaderInfo

    $hdr = $z->getHeaderInfo();
    @hdrs = $z->getHeaderInfo();

This method returns either a hash reference (in scalar context) or a list
of hash references (in array context) that contains information about each
of the header fields in the compressed data stream(s).

=head2 tell

Returns the uncompressed file offset.

=head2 eof

Returns true if the end of the compressed input stream has been reached.

=head2 seek

    $z->seek($position, $whence);
    seek($z, $position, $whence);

Provides a subset of the C<seek> functionality, with the restriction
that it is only legal to seek forward in the input file/buffer.
It is a fatal error to attempt to seek backward.

The C<$whence> parameter takes one of the usual values, namely SEEK_SET,
SEEK_CUR or SEEK_END.

Returns 1 on success, 0 on failure.

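A forward-only seek can be sketched as follows. The sample string, offsets and variable names are assumptions made for the example; the C<SEEK_SET> and C<SEEK_CUR> constants come from the core C<Fcntl> module, and the compressed buffer is produced with the companion module IO::Compress::Bzip2.

```perl
use strict;
use warnings;
use Fcntl                   qw(:seek);
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical sample data: 20 bytes, so offsets are easy to follow.
my $text = "0123456789abcdefghij";
my $compressed;
bzip2 \$text => \$compressed
    or die "bzip2 failed: $Bzip2Error\n";

my $z = IO::Uncompress::Bunzip2->new(\$compressed)
    or die "IO::Uncompress::Bunzip2 failed: $Bunzip2Error\n";

# Skip forward to uncompressed offset 10, then read five bytes.
$z->seek(10, SEEK_SET) or die "seek failed\n";
my $first;
$z->read($first, 5);

# Skip two more bytes relative to the current position, then read three.
$z->seek(2, SEEK_CUR) or die "seek failed\n";
my $second;
$z->read($second, 3);

print "first='$first' second='$second' offset=", $z->tell(), "\n";
$z->close();
```

Seeking to an offset before the current position is not attempted here, since the text above describes that as a fatal error.
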
=head2 binmode

This is a noop provided for completeness.

=head2 opened

Returns true if the object currently refers to an opened file/buffer.

=head2 autoflush

    my $prev = $z->autoflush()
    my $prev = $z->autoflush(EXPR)

If the C<$z> object is associated with a file or a filehandle, this method
returns the current autoflush setting for the underlying filehandle. If
C<EXPR> is present, and is non-zero, it will enable flushing after every
write/print operation.

If C<$z> is associated with a buffer, this method has no effect and always
returns C<undef>.

B<Note> that the special variable C<$|> B<cannot> be used to set or
retrieve the autoflush setting.

=head2 input_line_number

    $z->input_line_number()
    $z->input_line_number(EXPR)

Returns the current uncompressed line number. If C<EXPR> is present it has
the effect of setting the line number. Note that setting the line number
does not change the current position within the file/buffer being read.

The contents of C<$/> are used to determine what constitutes a line
terminator.

=head2 fileno

If the C<$z> object is associated with a file or a filehandle, C<fileno>
will return the underlying file descriptor. Once the C<close> method is
called C<fileno> will return C<undef>.

If the C<$z> object is associated with a buffer, this method will return
C<undef>.

=head2 close

Closes the input file/buffer.

For most versions of Perl this method will be automatically invoked if
the IO::Uncompress::Bunzip2 object is destroyed (either explicitly or by the
variable with the reference to the object going out of scope). The
exceptions are Perl versions 5.005 through 5.00504 and 5.8.0. In
these cases, the C<close> method will be called automatically, but
not until global destruction of all live objects when the program is
terminating.

Therefore, if you want your scripts to be able to run on all versions
of Perl, you should call C<close> explicitly and not rely on automatic
closing.

Returns true on success, otherwise 0.

If the C<AutoClose> option has been enabled when the IO::Uncompress::Bunzip2
object was created, and the object is associated with a file, the
underlying file will also be closed.

=head2 nextStream

    my $status = $z->nextStream();

Skips to the next compressed data stream in the input file/buffer. If a new
compressed data stream is found, the eof marker will be cleared and C<$.>
will be reset to 0.

Returns 1 if a new stream was found, 0 if none was found, and -1 if an
error was encountered.

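One way to walk through concatenated streams one at a time, using the return values described above, is sketched below. The sample strings and variable names are assumptions made for the example; the streams are produced with the companion module IO::Compress::Bzip2.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical input: two compressed streams back to back in one buffer.
my ($one, $two) = ("stream one\n", "stream two\n");
my ($c1, $c2);
bzip2 \$one => \$c1 or die "bzip2 failed: $Bzip2Error\n";
bzip2 \$two => \$c2 or die "bzip2 failed: $Bzip2Error\n";
my $input = $c1 . $c2;

# MultiStream is left off so that each stream is read separately.
my $z = IO::Uncompress::Bunzip2->new(\$input)
    or die "IO::Uncompress::Bunzip2 failed: $Bunzip2Error\n";

my @streams;
my $status;
for ($status = 1; $status > 0; $status = $z->nextStream()) {
    my ($data, $chunk) = ('');

    # Read the current stream until eof (0) or error (negative).
    while (($status = $z->read($chunk)) > 0) {
        $data .= $chunk;
    }
    last if $status < 0;

    push @streams, $data;
}
die "error while reading: $Bunzip2Error\n" if $status < 0;
$z->close();

print "found ", scalar(@streams), " streams\n";
```
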
=head2 trailingData

    my $data = $z->trailingData();

Returns the data, if any, that is present immediately after the compressed
data stream once uncompression is complete. It only makes sense to call
this method once the end of the compressed data stream has been
reached.

This method can be used when there is useful information immediately
following the compressed data stream, and you don't know the length of the
compressed data stream.

If the input is a buffer, C<trailingData> will return everything from the
end of the compressed data stream to the end of the buffer.

If the input is a filehandle, C<trailingData> will return the data that is
left in the filehandle input buffer once the end of the compressed data
stream has been reached. You can then use the filehandle to read the rest
of the input file.

Don't bother using C<trailingData> if the input is a filename.

If you know the length of the compressed data stream before you start
uncompressing, you can avoid having to use C<trailingData> by setting the
C<InputLength> option in the constructor.

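The buffer case described above can be sketched as follows. The payload, the trailer string and the variable names are assumptions made for the example, and C<< Transparent => 0 >> is set as a cautious choice so that the bytes after the stream are not treated as more input data; the compressed stream is produced with the companion module IO::Compress::Bzip2.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical input: a compressed stream followed by uncompressed bytes.
my $payload = "compressed payload\n";
my $trailer = "TRAILER";
my $compressed;
bzip2 \$payload => \$compressed
    or die "bzip2 failed: $Bzip2Error\n";
my $input = $compressed . $trailer;

my $z = IO::Uncompress::Bunzip2->new(\$input, Transparent => 0)
    or die "IO::Uncompress::Bunzip2 failed: $Bunzip2Error\n";

# Read to the end of the compressed stream first.
my ($data, $chunk, $status) = ('');
while (($status = $z->read($chunk)) > 0) {
    $data .= $chunk;
}
die "read failed: $Bunzip2Error\n" if $status < 0;

# Only now does it make sense to ask for what follows the stream.
my $trailing = $z->trailingData();
$z->close();

printf "payload: %d bytes, trailing: %d bytes\n",
    length $data, length $trailing;
```
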
=head1 Importing

No symbolic constants are required by IO::Uncompress::Bunzip2 at present.

=over 5

=item :all

Imports C<bunzip2> and C<$Bunzip2Error>.
Same as doing this

    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error) ;

=back

=head1 EXAMPLES

=head2 Working with Net::FTP

See L<IO::Uncompress::Bunzip2::FAQ|IO::Uncompress::Bunzip2::FAQ/"Compressed files and Net::FTP">

=head1 SEE ALSO

L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>, L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>, L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>, L<IO::Compress::Bzip2>, L<IO::Compress::Lzop>, L<IO::Uncompress::UnLzop>, L<IO::Compress::Lzf>, L<IO::Uncompress::UnLzf>, L<IO::Uncompress::AnyInflate>, L<IO::Uncompress::AnyUncompress>

L<Compress::Zlib::FAQ|Compress::Zlib::FAQ>

L<File::GlobMapper|File::GlobMapper>, L<Archive::Zip|Archive::Zip>,
L<Archive::Tar|Archive::Tar>,

The primary site for the bzip2 program is F<http://www.bzip.org>.

See the module L<Compress::Bzip2|Compress::Bzip2>

=head1 AUTHOR

This module was written by Paul Marquess, F<pmqs@cpan.org>.

=head1 MODIFICATION HISTORY

See the Changes file.

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2005-2008 Paul Marquess. All rights reserved.

This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.