X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/19a1cd1676ad60324d19ebf733410411423892b4..3afb2f14ba09da7b54ce62a6f12d9703a7776666:/pod/perlfaq5.pod diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod index ae71cd9..e2a9d98 100644 --- a/pod/perlfaq5.pod +++ b/pod/perlfaq5.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq5 - Files and Formats ($Revision: 1.31 $, $Date: 2004/02/07 04:29:50 $) +perlfaq5 - Files and Formats =head1 DESCRIPTION @@ -8,151 +8,466 @@ This section deals with I/O and the "f" issues: filehandles, flushing, formats, and footers. =head2 How do I flush/unbuffer an output filehandle? Why must I do this? +X X X X -Perl does not support truly unbuffered output (except -insofar as you can C), although it -does support is "command buffering", in which a physical -write is performed after every output command. +(contributed by brian d foy) -The C standard I/O library (stdio) normally buffers -characters sent to devices so that there isn't a system call -for each byte. In most stdio implementations, the type of -output buffering and the size of the buffer varies according -to the type of device. Perl's print() and write() functions -normally buffer output, while syswrite() bypasses buffering -all together. +You might like to read Mark Jason Dominus's "Suffering From Buffering" +at http://perl.plover.com/FAQs/Buffering.html . -If you want your output to be sent immediately when you -execute print() or write() (for instance, for some network -protocols), you must set the handle's autoflush flag. This -flag is the Perl variable $| and when it is set to a true -value, Perl will flush the handle's buffer after each -print() or write(). Setting $| affects buffering only for -the currently selected default file handle. You choose this -handle with the one argument select() call (see -L> and L). +Perl normally buffers output so it doesn't make a system call for every +bit of output. By saving up output, it makes fewer expensive system calls. 
+For instance, in this little bit of code, you want to print a dot to the +screen for every line you process to watch the progress of your program. +Instead of seeing a dot for every line, Perl buffers the output and you +have a long wait before you see a row of 50 dots all at once: -Use select() to choose the desired handle, then set its -per-filehandle variables. + # long wait, then row of dots all at once + while( <> ) { + print "."; + print "\n" unless ++$count % 50; - $old_fh = select(OUTPUT_HANDLE); - $| = 1; - select($old_fh); + #... expensive line processing operations + } -Some idioms can handle this in a single statement: +To get around this, you have to unbuffer the output filehandle, in this +case, C. You can set the special variable C<$|> to a true value +(mnemonic: making your filehandles "piping hot"): - select((select(OUTPUT_HANDLE), $| = 1)[0]); + $|++; - $| = 1, select $_ for select OUTPUT_HANDLE; + # dot shown immediately + while( <> ) { + print "."; + print "\n" unless ++$count % 50; -Some modules offer object-oriented access to handles and their -variables, although they may be overkill if this is the only -thing you do with them. You can use IO::Handle: + #... expensive line processing operations + } - use IO::Handle; - open(DEV, ">/dev/printer"); # but is this? - DEV->autoflush(1); +The C<$|> is one of the per-filehandle special variables, so each +filehandle has its own copy of its value. If you want to merge +standard output and standard error for instance, you have to unbuffer +each (although STDERR might be unbuffered by default): -or IO::Socket: + { + my $previous_default = select(STDOUT); # save previous default + $|++; # autoflush STDOUT + select(STDERR); + $|++; # autoflush STDERR, to be sure + select($previous_default); # restore previous default + } - use IO::Socket; # this one is kinda a pipe? - my $sock = IO::Socket::INET->new( 'www.example.com:80' ) ; + # now should alternate . 
and + + while( 1 ) + { + sleep 1; + print STDOUT "."; + print STDERR "+"; + print STDOUT "\n" unless ++$count % 25; + } - $sock->autoflush(); +Besides the C<$|> special variable, you can use C to give +your filehandle a C<:unix> layer, which is unbuffered: -=head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file? + binmode( STDOUT, ":unix" ); -Use the Tie::File module, which is included in the standard -distribution since Perl 5.8.0. + while( 1 ) { + sleep 1; + print "."; + print "\n" unless ++$count % 50; + } -=head2 How do I count the number of lines in a file? +For more information on output layers, see the entries for C +and C in L, and the C module documentation. + +If you are using C or one of its subclasses, you can +call the C method to change the settings of the +filehandle: + + use IO::Handle; + open my( $io_fh ), ">", "output.txt"; + $io_fh->autoflush(1); + +The C objects also have a C method. You can flush +the buffer any time you want without auto-buffering + + $io_fh->flush; + +=head2 How do I change, delete, or insert a line in a file, or append to the beginning of a file? +X + +(contributed by brian d foy) + +The basic idea of inserting, changing, or deleting a line from a text +file involves reading and printing the file to the point you want to +make the change, making the change, then reading and printing the rest +of the file. Perl doesn't provide random access to lines (especially +since the record input separator, C<$/>, is mutable), although modules +such as C can fake it. 
+ +A Perl program to do these tasks takes the basic form of opening a +file, printing its lines, then closing the file: + + open my $in, '<', $file or die "Can't read old file: $!"; + open my $out, '>', "$file.new" or die "Can't write new file: $!"; + + while( <$in> ) + { + print $out $_; + } + + close $out; + +Within that basic form, add the parts that you need to insert, change, +or delete lines. + +To prepend lines to the beginning, print those lines before you enter +the loop that prints the existing lines. + + open my $in, '<', $file or die "Can't read old file: $!"; + open my $out, '>', "$file.new" or die "Can't write new file: $!"; + + print $out "# Add this line to the top\n"; # <--- HERE'S THE MAGIC + + while( <$in> ) + { + print $out $_; + } + + close $out; + +To change existing lines, insert the code to modify the lines inside +the C loop. In this case, the code finds all lowercased +versions of "perl" and capitalizes them. This happens for every line, so +be sure that you're supposed to do that on every line! + + open my $in, '<', $file or die "Can't read old file: $!"; + open my $out, '>', "$file.new" or die "Can't write new file: $!"; + + print $out "# Add this line to the top\n"; + + while( <$in> ) + { + s/\b(perl)\b/Perl/g; + print $out $_; + } + + close $out; + +To change only a particular line, the input line number, C<$.>, is +useful. First read and print the lines up to the one you want to +change. Next, read the single line you want to change, change it, and +print it. After that, read the rest of the lines and print those: + + while( <$in> ) # print the lines before the change + { + print $out $_; + last if $. == 4; # line number before change + } + + my $line = <$in>; + $line =~ s/\b(perl)\b/Perl/g; + print $out $line; + + while( <$in> ) # print the rest of the lines + { + print $out $_; + } + +To skip lines, use the looping controls. 
The C in this example +skips comment lines, and the C stops all processing once it +encounters either C<__END__> or C<__DATA__>. + + while( <$in> ) + { + next if /^\s*#/; # skip comment lines + last if /^__(END|DATA)__$/; # stop at end of code marker + print $out $_; + } + +Do the same sort of thing to delete a particular line by using C +to skip the lines you don't want to show up in the output. This +example skips every fifth line: + + while( <$in> ) + { + next unless $. % 5; + print $out $_; + } -One fairly efficient way is to count newlines in the file. The -following program uses a feature of tr///, as documented in L. -If your text file doesn't end with a newline, then it's not really a -proper text file, so this may report one fewer line than you expect. +If, for some odd reason, you really want to see the whole file at once +rather than processing line-by-line, you can slurp it in (as long as +you can fit the whole thing in memory!): - $lines = 0; - open(FILE, $filename) or die "Can't open `$filename': $!"; - while (sysread FILE, $buffer, 4096) { - $lines += ($buffer =~ tr/\n//); - } - close FILE; + open my $in, '<', $file or die "Can't read old file: $!"; + open my $out, '>', "$file.new" or die "Can't write new file: $!"; -This assumes no funny games with newline translations. + my @lines = do { local $/; <$in> }; # slurp! + + # do your magic here + + print $out @lines; + +Modules such as C and C can help with that +too. If you can, however, avoid reading the entire file at once. Perl +won't give that memory back to the operating system until the process +finishes. + +You can also use Perl one-liners to modify a file in-place. The +following changes all 'Fred' to 'Barney' in F, overwriting +the file with the new contents. With the C<-p> switch, Perl wraps a +C loop around the code you specify with C<-e>, and C<-i> turns +on in-place editing. The current line is in C<$_>. With C<-p>, Perl +automatically prints the value of C<$_> at the end of the loop. 
See +L for more details. + + perl -pi -e 's/Fred/Barney/' inFile.txt + +To make a backup of C, give C<-i> a file extension to add: + + perl -pi.bak -e 's/Fred/Barney/' inFile.txt + +To change only the fifth line, you can add a test checking C<$.>, the +input line number, then only perform the operation when the test +passes: + + perl -pi -e 's/Fred/Barney/ if $. == 5' inFile.txt + +To add lines before a certain line, you can add a line (or lines!) +before Perl prints C<$_>: + + perl -pi -e 'print "Put before third line\n" if $. == 3' inFile.txt + +You can even add a line to the beginning of a file, since the current +line prints at the end of the loop: + + perl -pi -e 'print "Put before first line\n" if $. == 1' inFile.txt + +To insert a line after one already in the file, use the C<-n> switch. +It's just like C<-p> except that it doesn't print C<$_> at the end of +the loop, so you have to do that yourself. In this case, print C<$_> +first, then print the line that you want to add. + + perl -ni -e 'print; print "Put after fifth line\n" if $. == 5' inFile.txt + +To delete lines, only print the ones that you want. + + perl -ni -e 'print unless /d/' inFile.txt + + ... or ... + + perl -pi -e '$_ = "" if /d/' inFile.txt + +=head2 How do I count the number of lines in a file? +X X X + +(contributed by brian d foy) + +Conceptually, the easiest way to count the lines in a file is to +simply read them and count them: + + my $count = 0; + while( <$fh> ) { $count++; } + +You don't really have to count them yourself, though, since Perl +already does that with the C<$.> variable, which is the current line +number from the last filehandle read: + + 1 while( <$fh> ); + my $count = $.; + +If you want to use C<$.>, you can reduce it to a simple one-liner, +like one of these: + + % perl -lne '} print $.; {' file + + % perl -lne 'END { print $. }' file + +Those can be rather inefficient though. 
If they aren't fast enough for +you, you might just read chunks of data and count the number of +newlines: + + my $lines = 0; + open my($fh), '<:raw', $filename or die "Can't open $filename: $!"; + while( sysread $fh, $buffer, 4096 ) { + $lines += ( $buffer =~ tr/\n// ); + } + close $fh; + +However, that doesn't work if the line ending isn't a newline. You +might change that C to a C so you can count the number of +times the input record separator, C<$/>, shows up: + + my $lines = 0; + open my($fh), '<:raw', $filename or die "Can't open $filename: $!"; + while( sysread $fh, $buffer, 4096 ) { + $lines += ( $buffer =~ s|$/||g ); + } + close $fh; + +If you don't mind shelling out, the C command is usually the +fastest, even with the extra interprocess overhead. Ensure that you +have an untainted filename though: + + #!perl -T + + $ENV{PATH} = undef; + + my $lines; + if( $filename =~ /^([0-9a-z_.]+)\z/ ) { + $lines = `/usr/bin/wc -l $1`; + chomp $lines; + } + +=head2 How do I delete the last N lines from a file? +X X + +(contributed by brian d foy) + +The easiest conceptual solution is to count the lines in the +file then start at the beginning and print the number of lines +(minus the last N) to a new file. + +Most often, the real question is how you can delete the last N +lines without making more than one pass over the file, or how to +do it without a lot of copying. The easy concept is the hard reality when +you might have millions of lines in your file. + +One trick is to use C, which starts at the end of +the file. That module provides an object that wraps the real filehandle +to make it easy for you to move around the file. Once you get to the +spot you need, you can get the actual filehandle and work with it as +normal. 
In this case, you get the file position at the end of the last +line you want to keep and truncate the file to that point: + + use File::ReadBackwards; + + my $filename = 'test.txt'; + my $Lines_to_truncate = 2; + + my $bw = File::ReadBackwards->new( $filename ) + or die "Could not read backwards in [$filename]: $!"; + + my $lines_from_end = 0; + until( $bw->eof or $lines_from_end == $Lines_to_truncate ) + { + print "Got: ", $bw->readline; + $lines_from_end++; + } + + truncate( $filename, $bw->tell ); + +The C module also has the advantage of setting +the input record separator to a regular expression. + +You can also use the C module which lets you access +the lines through a tied array. You can use normal array operations +to modify your file, including setting the last index and using +C. =head2 How can I use Perl's C<-i> option from within a program? +X<-i> X C<-i> sets the value of Perl's C<$^I> variable, which in turn affects the behavior of C<< <> >>; see L for more details. By modifying the appropriate variables directly, you can get the same behavior within a larger program. For example: - # ... - { - local($^I, @ARGV) = ('.orig', glob("*.c")); - while (<>) { - if ($. == 1) { - print "This line should appear at the top of each file\n"; - } - s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case - print; - close ARGV if eof; # Reset $. - } - } - # $^I and @ARGV return to their old values here + # ... + { + local($^I, @ARGV) = ('.orig', glob("*.c")); + while (<>) { + if ($. == 1) { + print "This line should appear at the top of each file\n"; + } + s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case + print; + close ARGV if eof; # Reset $. + } + } + # $^I and @ARGV return to their old values here This block modifies all the C<.c> files in the current directory, leaving a backup of the original data from each file in a new C<.c.orig> file. +=head2 How can I copy a file? +X X X + +(contributed by brian d foy) + +Use the C module. 
It comes with Perl and can do a +true copy across file systems, and it does its magic in +a portable fashion. + + use File::Copy; + + copy( $original, $new_copy ) or die "Copy failed: $!"; + +If you can't use C, you'll have to do the work yourself: +open the original file, open the destination file, then print +to the destination file as you read the original. You also have to +remember to copy the permissions, owner, and group to the new file. + =head2 How do I make a temporary file name? +X -Use the File::Temp module, see L for more information. +If you don't need to know the name of the file, you can use C +with C in place of the file name. In Perl 5.8 or later, the +C function creates an anonymous temporary file: - use File::Temp qw/ tempfile tempdir /; + open my $tmp, '+>', undef or die $!; - $dir = tempdir( CLEANUP => 1 ); - ($fh, $filename) = tempfile( DIR => $dir ); +Otherwise, you can use the File::Temp module. - # or if you don't need to know the filename + use File::Temp qw/ tempfile tempdir /; - $fh = tempfile( DIR => $dir ); + $dir = tempdir( CLEANUP => 1 ); + ($fh, $filename) = tempfile( DIR => $dir ); + + # or if you don't need to know the filename + + $fh = tempfile( DIR => $dir ); The File::Temp has been a standard module since Perl 5.6.1. If you don't have a modern enough Perl installed, use the C class method from the IO::File module to get a filehandle opened for reading and writing. Use it if you don't need to know the file's name: - use IO::File; - $fh = IO::File->new_tmpfile() + use IO::File; + $fh = IO::File->new_tmpfile() or die "Unable to make new temporary file: $!"; If you're committed to creating a temporary file by hand, use the process ID and/or the current time-value. If you need to have many temporary files in one process, use a counter: - BEGIN { + BEGIN { use Fcntl; my $temp_dir = -d '/tmp' ? 
'/tmp' : $ENV{TMPDIR} || $ENV{TEMP}; - my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time()); + my $base_name = sprintf "%s/%d-%d-0000", $temp_dir, $$, time; + sub temp_file { - local *FH; - my $count = 0; - until (defined(fileno(FH)) || $count++ > 100) { - $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e; - # O_EXCL is required for security reasons. - sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT); - } - if (defined(fileno(FH)) - return (*FH, $base_name); - } else { - return (); - } + local *FH; + my $count = 0; + until( defined(fileno(FH)) || $count++ > 100 ) { + $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e; + # O_EXCL is required for security reasons. + sysopen FH, $base_name, O_WRONLY|O_EXCL|O_CREAT; + } + + if( defined fileno(FH) ) { + return (*FH, $base_name); + } + else { + return (); + } + } + } - } =head2 How can I manipulate fixed-record-length files? +X X The most efficient way is using L and L. This is faster than using @@ -163,20 +478,20 @@ Here is a sample chunk of code to break up and put back together again some fixed-format input lines, in this case from the output of a normal, Berkeley-style ps: - # sample input line: - # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what - my $PS_T = 'A6 A4 A7 A5 A*'; - open my $ps, '-|', 'ps'; - print scalar <$ps>; - my @fields = qw( pid tt stat time command ); - while (<$ps>) { - my %process; - @process{@fields} = unpack($PS_T, $_); + # sample input line: + # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what + my $PS_T = 'A6 A4 A7 A5 A*'; + open my $ps, '-|', 'ps'; + print scalar <$ps>; + my @fields = qw( pid tt stat time command ); + while (<$ps>) { + my %process; + @process{@fields} = unpack($PS_T, $_); for my $field ( @fields ) { - print "$field: <$process{$field}>\n"; + print "$field: <$process{$field}>\n"; } print 'line=', pack($PS_T, @process{@fields} ), "\n"; - } + } We've used a hash slice in order to easily handle the fields of each row. 
Storing the keys in an array means it's easy to operate on them as a @@ -184,6 +499,7 @@ group or loop over them with for. It also avoids polluting the program with global variables and using symbolic references. =head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles? +X X X As of perl5.6, open() autovivifies file and directory handles as references if you pass it an uninitialized scalar variable. @@ -198,6 +514,18 @@ and use them in the place of named handles. process_file( $fh ); +If you like, you can store these filehandles in an array or a hash. +If you access them directly, they aren't simple scalars and you +need to give C a little help by placing the filehandle +reference in braces. Perl can only figure it out on its own when +the filehandle reference is a simple scalar. + + my @fhs = ( $fh1, $fh2, $fh3 ); + + for( $i = 0; $i <= $#fhs; $i++ ) { + print {$fhs[$i]} "just another Perl answer, \n"; + } + Before perl5.6, you had to deal with various typeglob idioms which you may see in older code. @@ -212,23 +540,24 @@ If you want to create many anonymous handles, you should check out the Symbol or IO::Handle modules. =head2 How can I use a filehandle indirectly? +X An indirect filehandle is using something other than a symbol in a place that a filehandle is expected. 
Here are ways to get indirect filehandles: - $fh = SOME_FH; # bareword is strict-subs hostile - $fh = "SOME_FH"; # strict-refs hostile; same package only - $fh = *SOME_FH; # typeglob - $fh = \*SOME_FH; # ref to typeglob (bless-able) - $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob + $fh = SOME_FH; # bareword is strict-subs hostile + $fh = "SOME_FH"; # strict-refs hostile; same package only + $fh = *SOME_FH; # typeglob + $fh = \*SOME_FH; # ref to typeglob (bless-able) + $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob Or, you can use the C method from one of the IO::* modules to create an anonymous filehandle, store that in a scalar variable, and use it as though it were a normal filehandle. - use IO::Handle; # 5.004 or higher - $fh = IO::Handle->new(); + use IO::Handle; # 5.004 or higher + $fh = IO::Handle->new(); Then use any of those as you would a normal filehandle. Anywhere that Perl is expecting a filehandle, an indirect filehandle may be used @@ -237,32 +566,32 @@ a filehandle. Functions like C, C, C, or the C<< >> diamond operator will accept either a named filehandle or a scalar variable containing one: - ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR); - print $ofh "Type it: "; - $got = <$ifh> - print $efh "What was that: $got"; + ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR); + print $ofh "Type it: "; + $got = <$ifh> + print $efh "What was that: $got"; If you're passing a filehandle to a function, you can write the function in two ways: - sub accept_fh { - my $fh = shift; - print $fh "Sending to indirect filehandle\n"; - } + sub accept_fh { + my $fh = shift; + print $fh "Sending to indirect filehandle\n"; + } Or it can localize a typeglob and use the filehandle directly: - sub accept_fh { - local *FH = shift; - print FH "Sending to localized filehandle\n"; - } + sub accept_fh { + local *FH = shift; + print FH "Sending to localized filehandle\n"; + } Both styles work with either objects or typeglobs of real filehandles. 
(They might also work with strings under some circumstances, but this is risky.) - accept_fh(*STDOUT); - accept_fh($handle); + accept_fh(*STDOUT); + accept_fh($handle); In the examples above, we assigned the filehandle to a scalar variable before using it. That is because only simple scalar variables, not @@ -271,24 +600,24 @@ built-ins like C, C, or the diamond operator. Using something other than a simple scalar variable as a filehandle is illegal and won't even compile: - @fd = (*STDIN, *STDOUT, *STDERR); - print $fd[1] "Type it: "; # WRONG - $got = <$fd[0]> # WRONG - print $fd[2] "What was that: $got"; # WRONG + @fd = (*STDIN, *STDOUT, *STDERR); + print $fd[1] "Type it: "; # WRONG + $got = <$fd[0]> # WRONG + print $fd[2] "What was that: $got"; # WRONG With C and C, you get around this by using a block and an expression where you would place the filehandle: - print { $fd[1] } "funny stuff\n"; - printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559; - # Pity the poor deadbeef. + print { $fd[1] } "funny stuff\n"; + printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559; + # Pity the poor deadbeef. That block is a proper block like any other, so you can put more complicated code there. This sends the message out to one of two places: - $ok = -x "/bin/cat"; - print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n"; - print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n"; + $ok = -x "/bin/cat"; + print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n"; + print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n"; This approach of treating C and C like object methods calls doesn't work for the diamond operator. That's because it's a @@ -299,7 +628,7 @@ as C<< <> >> does. Given the initialization shown above for @fd, this would work, but only because readline() requires a typeglob. It doesn't work with objects or strings, which might be a bug we haven't fixed yet. 
- $got = readline($fd[0]); + $got = readline($fd[0]); Let it be noted that the flakiness of indirect filehandles is not related to whether they're strings, typeglobs, objects, or anything else. @@ -307,49 +636,79 @@ It's the syntax of the fundamental operators. Playing the object game doesn't help you at all here. =head2 How can I set up a footer format to be used with write()? +X