X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/19799a22062ef658e4ac543ea06fa9193323512a..4ee2b8db537d28b77d127a86307e426289e5c8b5:/pod/perlopentut.pod
diff --git a/pod/perlopentut.pod b/pod/perlopentut.pod
index ae622a6..b83e14a 100644
--- a/pod/perlopentut.pod
+++ b/pod/perlopentut.pod
@@ -1,862 +1,278 @@
+=encoding utf8
+
 =head1 NAME

-perlopentut - tutorial on opening things in Perl
+perlopentut - simple recipes for opening files and pipes in Perl

 =head1 DESCRIPTION

-Perl has two simple, built-in ways to open files: the shell way for
-convenience, and the C way for precision. The choice is yours.
-
-=head1 Open E<agrave> la shell
-
-Perl's C<open> function was designed to mimic the way command-line
-redirection in the shell works. Here are some basic examples
-from the shell:
-
-    $ myprogram file1 file2 file3
-    $ myprogram < inputfile
-    $ myprogram > outputfile
-    $ myprogram >> outputfile
-    $ myprogram | otherprogram
-    $ otherprogram | myprogram
-
-And here are some more advanced examples:
-
-    $ otherprogram | myprogram f1 - f2
-    $ otherprogram 2>&1 | myprogram -
-    $ myprogram <&3
-    $ myprogram >&4
-
-Programmers accustomed to constructs like those above can take comfort
-in learning that Perl directly supports these familiar constructs using
-virtually the same syntax as the shell.
-
-=head2 Simple Opens
-
-The C<open> function takes two arguments: the first is a filehandle,
-and the second is a single string comprising both what to open and how
-to open it. C<open> returns true when it works, and when it fails,
-returns a false value and sets the special variable $! to reflect
-the system error. If the filehandle was previously opened, it will
-be implicitly closed first.
-
-For example:
-
-    open(INFO, "datafile") || die("can't open datafile: $!");
-    open(INFO, "< datafile") || die("can't open datafile: $!");
-    open(RESULTS,"> runstats") || die("can't open runstats: $!");
-    open(LOG, ">> logfile ") || die("can't open logfile: $!");
-
-If you prefer the low-punctuation version, you could write that this way:
-
-    open INFO, "< datafile" or die "can't open datafile: $!";
-    open RESULTS,"> runstats" or die "can't open runstats: $!";
-    open LOG, ">> logfile " or die "can't open logfile: $!";
-
-A few things to notice. First, the leading less-than is optional.
-If omitted, Perl assumes that you want to open the file for reading.
-
-The other important thing to notice is that, just as in the shell,
-any white space before or after the filename is ignored. This is good,
-because you wouldn't want these to do different things:
-
-    open INFO, "<datafile"
-    open INFO, "< datafile"
-    open INFO, "<  datafile"
-
-Ignoring surrounding whitespace also helps for when you read a filename
-in from a different file, and forget to trim it before opening:
-
-    $filename = <INFO>;         # oops, \n still there
-    open(EXTRA, "< $filename") || die "can't open $filename: $!";
-
-This is not a bug, but a feature. Because C<open> mimics the shell in
-its style of using redirection arrows to specify how to open the file, it
-also does so with respect to extra white space around the filename itself
-as well. For accessing files with naughty names, see L<"Dispelling
-the Dweomer">.
-
-=head2 Pipe Opens
-
-In C, when you want to open a file using the standard I/O library,
-you use the C<fopen> function, but when opening a pipe, you use the
-C<popen> function. But in the shell, you just use a different redirection
-character. That's also the case for Perl. The C<open> call
-remains the same--just its argument differs.
-
-If the leading character is a pipe symbol, C<open> [...]
-    while (<NET>) { }               # do something with input
-    close(NET)                      || die "can't close netstat: $!";
-
-What happens if you try to open a pipe to or from a non-existent command?
-In most systems, such an C<open> will not return an error.
-That's because in the traditional C<fork>/C<exec> model, running the other
-program happens only in the forked child process, which means that
-the failed C<exec> can't be reflected in the return value of C<open>.
-Only a failed C<fork> shows up there. See L to see how to cope with this.
-There's also an explanation in L.
-
-If you would like to open a bidirectional pipe, the IPC::Open2
-library will handle this for you. Check out L
-
-=head2 The Minus File
-
-Again following the lead of the standard shell utilities, Perl's
-C<open> function treats a file whose name is a single minus, "-", in a
-special way. If you open minus for reading, it really means to access
-the standard input. If you open minus for writing, it really means to
-access the standard output.
-
-If minus can be used as the default input or default output? What happens
-if you open a pipe into or out of minus? What's the default command it
-would run? The same script as you're current running! This is actually
-a stealth C<fork> hidden inside an C<open> call. See L for details.
-
-=head2 Mixing Reads and Writes
-
-It is possible to specify both read and write access. All you do is
-add a "+" symbol in front of the redirection. But as in the shell,
-using a less-than on a file never creates a new file; it only opens an
-existing one. On the other hand, using a greater-than always clobbers
-(truncates to zero length) an existing file, or creates a brand-new one
-if there isn't an old one. Adding a "+" for read-write doesn't affect
-whether it only works on existing files or always clobbers existing ones.
-
-    open(WTMP, "+< /usr/adm/wtmp")
-        || die "can't open /usr/adm/wtmp: $!";
-
-    open(SCREEN, "+> /tmp/lkscreen")
-        || die "can't open /tmp/lkscreen: $!";
-
-    open(LOGFILE, "+>> /tmp/applog"
-        || die "can't open /tmp/applog: $!";
-
-The first one won't create a new file, and the second one will always
-clobber an old one.
-The third one will create a new file if necessary
-and not clobber an old one, and it will allow you to read at any point
-in the file, but all writes will always go to the end. In short,
-the first case is substantially more common than the second and third
-cases, which are almost always wrong. (If you know C, the plus in
-Perl's C<open> is historically derived from the one in C's fopen(3S),
-which it ultimately calls.)
-
-In fact, when it comes to updating a file, unless you're working on
-a binary file as in the WTMP case above, you probably don't want to
-use this approach for updating. Instead, Perl's B<-i> flag comes to
-the rescue. The following command takes all the C, C++, or yacc source
-or header files and changes all their foo's to bar's, leaving
-the old version in the original file name with a ".orig" tacked
-on the end:
-
-    $ perl -i.orig -pe 's/\bfoo\b/bar/g' *.[Cchy]
-
-This is a short cut for some renaming games that are really
-the best way to update textfiles. See the second question in
-L for more details.
-
-=head2 Filters
-
-One of the most common uses for C<open> is one you never
-even notice. When you process the ARGV filehandle using
-C<E<lt>ARGVE<gt>>, Perl actually does an implicit open
-on each file in @ARGV. Thus a program called like this:
-
-    $ myprogram file1 file2 file3
+Whenever you do I/O on a file in Perl, you do so through what in Perl is
+called a B<filehandle>. A filehandle is an internal name for an external
+file. It is the job of the C<open> function to make the association
+between the internal name and the external name, and it is the job
+of the C<close> function to break that association.

-Can have all its files opened and processed one at a time
-using a construct no more complex than:
+For your convenience, Perl sets up a few special filehandles that are
+already open when you run. These include C<STDIN>, C<STDOUT>, C<STDERR>,
+and C<ARGV>.
+Since those are pre-opened, you can use them right away
+without having to go to the trouble of opening them yourself:

-    while (<>) {
-        # do something with $_
-    }
+    print STDERR "This is a debugging message.\n";

-If @ARGV is empty when the loop first begins, Perl pretends you've opened
-up minus, that is, the standard input. In fact, $ARGV, the currently
-open file during C<E<lt>ARGVE<gt>> processing, is even set to "-"
-in these circumstances.
+    print STDOUT "Please enter something: ";
+    $response = <STDIN> // die "how come no input?";
+    print STDOUT "Thank you!\n";

-You are welcome to pre-process your @ARGV before starting the loop to
-make sure it's to your liking. One reason to do this might be to remove
-command options beginning with a minus. While you can always roll the
-simple ones by hand, the Getopts modules are good for this.
+    while (<ARGV>) { ... }

-    use Getopt::Std;
+As you see from those examples, C<STDOUT> and C<STDERR> are output
+handles, and C<STDIN> and C<ARGV> are input handles. They are
+in all capital letters because they are reserved to Perl, much
+like the C<@ARGV> array and the C<%ENV> hash are. Their external
+associations were set up by your shell.

-    # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o
-    getopts("vDo:");
+You will need to open every other filehandle on your own.
+Although there are many variants, the most common way to call
+Perl's open() function is with three arguments and one return value:

-    # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o}
-    getopts("vDo:", \%args);
+C<    I<OK> = open(I<HANDLE>, I<MODE>, I<PATHNAME>)>

-Or the standard Getopt::Long module to permit named arguments:
+Where:

-    use Getopt::Long;
-    GetOptions( "verbose"  => \$verbose,     # --verbose
-                "Debug"    => \$debug,       # --Debug
-                "output=s" => \$output );
-        # --output=somestring or --output somestring
+=over

-Another reason for preprocessing arguments is to make an empty
-argument list default to all files:
+=item I<OK>

-    @ARGV = glob("*") unless @ARGV;
+will be some defined value if the open succeeds, but
+C<undef> if it fails;

-You could even filter out all but plain, text files. This is a bit
-silent, of course, and you might prefer to mention them on the way.
+=item I<HANDLE>

-    @ARGV = grep { -f && -T } @ARGV;
+should be an undefined scalar variable to be filled in by the
+C<open> function if it succeeds;

-If you're using the B<-n> or B<-p> command-line options, you
-should put changes to @ARGV in a C<BEGIN{}> block.
+=item I<MODE>

-Remember that a normal C<open> has special properties, in that it might
-call fopen(3S) or it might called popen(3S), depending on what its
-argument looks like; that's why it's sometimes called "magic open".
-Here's an example:
+is the access mode and the encoding format to open the file with;

-    $pwdinfo = `domainname` =~ /^(\(none\))?$/
-                    ? '< /etc/passwd'
-                    : 'ypcat passwd |';
+=item I<PATHNAME>

-    open(PWD, $pwdinfo)
-                or die "can't open $pwdinfo: $!";
+is the external name of the file you want opened.

-This sort of thing also comes into play in filter processing. Because
-C<E<lt>ARGVE<gt>> processing employs the normal, shell-style Perl C<open>,
-it respects all the special things we've already seen:
+=back

-    $ myprogram f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile
+Most of the complexity of the C<open> function lies in the many
+possible values that the I<MODE> parameter can take on.
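The call shape described above can be sketched as a small round trip that writes a file and then reads it back, with the return value, handle, mode and pathname each visible. This is a minimal sketch, not part of the original tutorial. It assumes Perl 5.10 or later and uses the core File::Temp module only to obtain a scratch directory. The file name example.txt is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory and file name are hypothetical; any writable path works.
my $dir      = tempdir(CLEANUP => 1);
my $pathname = "$dir/example.txt";

# OK is the return value, HANDLE starts out undefined, MODE combines the
# access mode with an encoding layer, and PATHNAME names the external file.
my $handle = undef;
my $ok     = open($handle, "> :encoding(UTF-8)", $pathname);
die "$0: can't open $pathname for writing: $!" unless $ok;
print $handle "caf\x{e9}\n";    # U+00E9 is encoded as two UTF-8 bytes
close($handle) || die "$0: can't close $pathname: $!";

# Reuse the same scalar for a read-only open of the same PATHNAME.
$handle = undef;
$ok     = open($handle, "< :encoding(UTF-8)", $pathname);
die "$0: can't open $pathname for reading: $!" unless $ok;
my $line = <$handle>;
close($handle) || die "$0: can't close $pathname: $!";

chomp $line;
printf "%d characters\n", length($line);
```

Run on its own, the script should report 4 characters. The :encoding(UTF-8) layer decodes the two bytes written for U+00E9 back into a single character.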

-That program will read from the file F<f1>, the process F<cmd1>, standard
-input (F<tmpfile> in this case), the F<f2> file, the F<cmd2> command,
-and finally the F<f3> file.
+One last thing before we show you how to open files: opening
+files does not (usually) automatically lock them in Perl. See
+L for how to lock.

-Yes, this also means that if you have a file named "-" (and so on) in
-your directory, that they won't be processed as literal files by C<open>.
-You'll need to pass them as "./-" much as you would for the I<rm> program.
-Or you could use C<sysopen> as described below.
+=head1 Opening Text Files

-One of the more interesting applications is to change files of a certain
-name into pipes. For example, to autoprocess gzipped or compressed
-files by decompressing them with I<gzip>:
+=head2 Opening Text Files for Reading

-    @ARGV = map { /^\.(gz|Z)$/ ? "gzip -dc $_ |" : $_ } @ARGV;
+If you want to read from a text file, first open it in
+read-only mode like this:

-Or, if you have the I<GET> program installed from LWP,
-you can fetch URLs before processing them:
+    my $filename = "/some/path/to/a/textfile/goes/here";
+    my $encoding = ":encoding(UTF-8)";
+    my $handle   = undef;     # this will be filled in on success

-    @ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV;
+    open($handle, "< $encoding", $filename)
+        || die "$0: can't open $filename for reading: $!";

-It's not for nothing that this is called magic C<E<lt>ARGVE<gt>>.
-Pretty nifty, eh?
+As with the shell, in Perl the C<< "<" >> is used to open the file in
+read-only mode. If it succeeds, Perl allocates a brand new filehandle for
+you and fills in your previously undefined C<$handle> argument with a
+reference to that handle.

-=head1 Open E<agrave> la C
+Now you may use functions like C<readline>, C<read>, C<getc>, and
+C<sysread> on that handle. Probably the most common input function
+is the one that looks like an operator:

-If you want the convenience of the shell, then Perl's C<open> is
On the other hand, if you want finer precision -than C's simplistic fopen(3S) provides, then you should look to Perl's -C, which is a direct hook into the open(2) system call. -That does mean it's a bit more involved, but that's the price of -precision. + $line = readline($handle); + $line = <$handle>; # same thing -C takes 3 (or 4) arguments. +Because the C function returns C at end of file or +upon error, you will sometimes see it used this way: - sysopen HANDLE, PATH, FLAGS, [MASK] - -The HANDLE argument is a filehandle just as with C. The PATH is -a literal path, one that doesn't pay attention to any greater-thans or -less-thans or pipes or minuses, nor ignore white space. If it's there, -it's part of the path. The FLAGS argument contains one or more values -derived from the Fcntl module that have been or'd together using the -bitwise "|" operator. The final argument, the MASK, is optional; if -present, it is combined with the user's current umask for the creation -mode of the file. You should usually omit this. - -Although the traditional values of read-only, write-only, and read-write -are 0, 1, and 2 respectively, this is known not to hold true on some -systems. Instead, it's best to load in the appropriate constants first -from the Fcntl module, which supplies the following standard flags: - - O_RDONLY Read only - O_WRONLY Write only - O_RDWR Read and write - O_CREAT Create the file if it doesn't exist - O_EXCL Fail if the file already exists - O_APPEND Append to the file - O_TRUNC Truncate the file - O_NONBLOCK Non-blocking access - -Less common flags that are sometimes available on some operating systems -include C, C, C, C, C, -C, C, C, C, C, C -and C. Consult your open(2) manpage or its local equivalent -for details. - -Here's how to use C to emulate the simple C calls we had -before. We'll omit the C<|| die $!> checks for clarity, but make sure -you always check the return values in real code. 
-These aren't quite the same, since C<open> will trim leading and
-trailing white space, but you'll get the idea:
-
-To open a file for reading:
-
-    open(FH, "< $path");
-    sysopen(FH, $path, O_RDONLY);
-
-To open a file for writing, creating a new file if needed or else truncating
-an old file:
-
-    open(FH, "> $path");
-    sysopen(FH, $path, O_WRONLY | O_TRUNC | O_CREAT);
-
-To open a file for appending, creating one if necessary:
-
-    open(FH, ">> $path");
-    sysopen(FH, $path, O_WRONLY | O_APPEND | O_CREAT);
-
-To open a file for update, where the file must already exist:
-
-    open(FH, "+< $path");
-    sysopen(FH, $path, O_RDWR);
-
-And here are things you can do with C<sysopen> that you cannot do with
-a regular C<open>. As you see, it's just a matter of controlling the
-flags in the third argument.
-
-To open a file for writing, creating a new file which must not previously
-exist:
+    $line = <$handle>;
+    if (defined $line) {
+        # do something with $line
+    }
+    else {
+        # $line is not valid, so skip it
+    }

-    sysopen(FH, $path, O_WRONLY | O_EXCL | O_CREAT);
+You can also just quickly C<die> on an undefined value this way:

-To open a file for appending, where that file must already exist:
+    $line = <$handle> // die "no input found";

-    sysopen(FH, $path, O_WRONLY | O_APPEND);
+However, if hitting EOF is an expected and normal event, you do not want to
+exit simply because you have run out of input. Instead, you probably just want
+to exit an input loop. You can then test to see if an actual error has caused
+the loop to terminate, and act accordingly:

-To open a file for update, creating a new file if necessary:
+    while (<$handle>) {
+        # do something with data in $_
+    }
+    if ($!) {
+        die "unexpected error while reading from $filename: $!";
+    }

-    sysopen(FH, $path, O_RDWR | O_CREAT);
+B: Having to specify the text encoding every time
+might seem a bit of a bother.
+To set up a default encoding for C<open> so
+that you don't have to supply it each time, you can use the C<open> pragma:

-To open a file for update, where that file must not already exist:
+    use open qw< :encoding(UTF-8) >;

-    sysopen(FH, $path, O_RDWR | O_EXCL | O_CREAT);
+Once you've done that, you can safely omit the encoding part of the
+open mode:

-To open a file without blocking, creating one if necessary:
+    open($handle, "<", $filename)
+        || die "$0: can't open $filename for reading: $!";

-    sysopen(FH, $path, O_WRONLY | O_NONBLOCK | O_CREAT);
+But never use the bare C<< "<" >> without having set up a default encoding
+first. Otherwise, Perl cannot know which of the many, many, many possible
+flavors of text file you have, and Perl will have no idea how to correctly
+map the data in your file into actual characters it can work with. Other
+common encoding formats include C<"ASCII">, C<"ISO-8859-1">,
+C<"ISO-8859-15">, C<"Windows-1252">, C<"MacRoman">, and even C<"UTF-16LE">.
+See L for more about encodings.

-=head2 Permissions E<agrave> la mode
+=head2 Opening Text Files for Writing

-If you omit the MASK argument to C<sysopen>, Perl uses the octal value
-0666. The normal MASK to use for executables and directories should
-be 0777, and for anything else, 0666.
+When you want to write to a file, you first have to decide what to do about
+any existing contents of that file. You have two basic choices here: to
+preserve or to clobber.

-Why so permissive? Well, it isn't really. The MASK will be modified
-by your process's current C<umask>. A umask is a number representing
-I<disabled> permissions bits; that is, bits that will not be turned on
-in the created files' permissions field.
+If you want to preserve any existing contents, then you want to open the file
+in append mode. As in the shell, in Perl you use C<<< ">>" >>> to open an
+existing file in append mode. C<<< ">>" >>> creates the file if it does not
+already exist.

-
-For example, if your C<umask> were 027, then the 020 part would
-disable the group from writing, and the 007 part would disable others
-from reading, writing, or executing. Under these conditions, passing
-C<sysopen> 0666 would create a file with mode 0640, since C<0666 &~ 027>
-is 0640.
+    my $handle   = undef;
+    my $filename = "/some/path/to/a/textfile/goes/here";
+    my $encoding = ":encoding(UTF-8)";

-You should seldom use the MASK argument to C<sysopen>. That takes
-away the user's freedom to choose what permission new files will have.
-Denying choice is almost always a bad thing. One exception would be for
-cases where sensitive or private data is being stored, such as with mail
-folders, cookie files, and internal temporary files.
-
-=head1 Obscure Open Tricks
+    open($handle, ">> $encoding", $filename)
+        || die "$0: can't open $filename for appending: $!";

-=head2 Re-Opening Files (dups)
+Now you can write to that filehandle using any of C<print>, C<printf>,
+C<say>, C<write>, or C<syswrite>.

-Sometimes you already have a filehandle open, and want to make another
-handle that's a duplicate of the first one. In the shell, we place an
-ampersand in front of a file descriptor number when doing redirections.
-For example, C<2E<gt>&1> makes descriptor 2 (that's STDERR in Perl)
-be redirected into descriptor 1 (which is usually Perl's STDOUT).
-The same is essentially true in Perl: a filename that begins with an
-ampersand is treated instead as a file descriptor if a number, or as a
-filehandle if a string.
+As noted above, if the file does not already exist, then the append-mode open
+will create it for you. But if the file does already exist, its contents are
+safe from harm because you will be adding your new text past the end of the
+old text.

-    open(SAVEOUT, ">&SAVEERR") || die "couldn't dup SAVEERR: $!";
-    open(MHCONTEXT, "<&4") || die "couldn't dup fd4: $!";
+On the other hand, sometimes you want to clobber whatever might already be
+there.
+To empty out a file before you start writing to it, you can open it
+in write-only mode:

-That means that if a function is expecting a filename, but you don't
-want to give it a filename because you already have the file open, you
-can just pass the filehandle with a leading ampersand. It's best to
-use a fully qualified handle though, just in case the function happens
-to be in a different package:
+    my $handle   = undef;
+    my $filename = "/some/path/to/a/textfile/goes/here";
+    my $encoding = ":encoding(UTF-8)";

-    somefunction("&main::LOGFILE");
+    open($handle, "> $encoding", $filename)
+        || die "$0: can't open $filename in write-open mode: $!";

-This way if somefunction() is planning on opening its argument, it can
-just use the already opened handle. This differs from passing a handle,
-because with a handle, you don't open the file. Here you have something
-you can pass to open.
+Here again Perl works just like the shell in that the C<< ">" >> clobbers
+an existing file.

-If you have one of those tricky, newfangled I/O objects that the C++
-folks are raving about, then this doesn't work because those aren't a
-proper filehandle in the native Perl sense. You'll have to use fileno()
-to pull out the proper descriptor number, assuming you can:
+As with the append mode, when you open a file in write-only mode,
+you can now write to that filehandle using any of C<print>, C<printf>,
+C<say>, C<write>, or C<syswrite>.

-    use IO::Socket;
-    $handle = IO::Socket::INET->new("www.perl.com:80");
-    $fd = $handle->fileno;
-    somefunction("&$fd");  # not an indirect function call
+What about read-write mode? You should probably pretend it doesn't exist,
+because opening text files in read-write mode is unlikely to do what you
+would like. See L for details.
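The advice above is to avoid read-write opens on text files. A common alternative is to write a complete new copy of the file and rename it over the original. This is the same rename-based approach that the older text recommends for Perl's -i flag. The sketch below is not from the tutorial. It assumes a POSIX-like system, where rename(2) replaces an existing target within one filesystem. The names notes.txt and notes.txt.new are hypothetical, and File::Temp is used only to get a scratch directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/notes.txt";    # hypothetical file name
my $newname  = "$filename.new";     # hypothetical scratch name
my $encoding = ":encoding(UTF-8)";

# Create a small sample file to update.
open(my $out, "> $encoding", $filename) || die "$0: can't create $filename: $!";
print $out "foo one\nfoo two\n";
close($out) || die "$0: can't close $filename: $!";

# Rather than opening $filename with "+<", read it and write a full new copy.
open(my $in,  "< $encoding", $filename) || die "$0: can't open $filename: $!";
open(my $new, "> $encoding", $newname)  || die "$0: can't create $newname: $!";
while (my $line = <$in>) {
    $line =~ s/\bfoo\b/bar/g;
    print $new $line;
}
close($in)  || die "$0: can't close $filename: $!";
close($new) || die "$0: can't close $newname: $!";

# Replace the original with the finished copy in a single step.
rename($newname, $filename) || die "$0: can't rename $newname: $!";

# Read the result back to show the update.
open($in, "< $encoding", $filename) || die "$0: can't reopen $filename: $!";
my @lines = <$in>;
close($in) || die "$0: can't close $filename: $!";
print @lines;
```

Afterward the file holds "bar one" and "bar two". The new text goes to a separate file until the final rename. A failure partway through therefore leaves the original untouched, which an in-place "+<" update cannot promise.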

-It can be easier (and certainly will be faster) just to use real
-filehandles though:
+=head1 Opening Binary Files

-    use IO::Socket;
-    local *REMOTE = IO::Socket::INET->new("www.perl.com:80");
-    die "can't connect" unless defined(fileno(REMOTE));
-    somefunction("&main::REMOTE");
+If the file to be opened contains binary data instead of text characters,
+then the C<MODE> argument to C<open> is a little different. Instead of
+specifying the encoding, you tell Perl that your data are in raw bytes.

-If the filehandle or descriptor number is preceded not just with a simple
-"&" but rather with a "&=" combination, then Perl will not create a
-completely new descriptor opened to the same place using the dup(2)
-system call. Instead, it will just make something of an alias to the
-existing one using the fdopen(3S) library call This is slightly more
-parsimonious of systems resources, although this is less a concern
-these days. Here's an example of that:
+    my $filename = "/some/path/to/a/binary/file/goes/here";
+    my $encoding = ":raw :bytes";
+    my $handle   = undef;     # this will be filled in on success

-    $fd = $ENV{"MHCONTEXTFD"};
-    open(MHCONTEXT, "<&=$fd") or die "couldn't fdopen $fd: $!";
+And then open as before, choosing C<<< "<" >>>, C<<< ">>" >>>, or
+C<<< ">" >>> as needed:

-If you're using magic C<E<lt>ARGVE<gt>>, you could even pass in as a
-command line argument in @ARGV something like C<"E<lt>&=$MHCONTEXTFD">,
-but we've never seen anyone actually do this.
+    open($handle, "< $encoding", $filename)
+        || die "$0: can't open $filename for reading: $!";

-=head2 Dispelling the Dweomer
+    open($handle, ">> $encoding", $filename)
+        || die "$0: can't open $filename for appending: $!";

-Perl is more of a DWIMmer language than something like Java--where DWIM
-is an acronym for "do what I mean". But this principle sometimes leads
-to more hidden magic than one knows what to do with. In this way, Perl
-is also filled with I<dweomer>, an obscure word meaning an enchantment.
-Sometimes, Perl's DWIMmer is just too much like dweomer for comfort.
+    open($handle, "> $encoding", $filename)
+        || die "$0: can't open $filename in write-open mode: $!";

-If magic C<open> is a bit too magical for you, you don't have to turn
-to C<sysopen>. To open a file with arbitrary weird characters in
-it, it's necessary to protect any leading and trailing whitespace.
-Leading whitespace is protected by inserting a C<"./"> in front of a
-filename that starts with whitespace. Trailing whitespace is protected
-by appending an ASCII NUL byte (C<"\0">) at the end off the string.
+Alternately, you can change to binary mode on an existing handle this way:

-    $file =~ s#^(\s)#./$1#;
-    open(FH, "< $file\0") || die "can't open $file: $!";
-
-This assumes, of course, that your system considers dot the current
-working directory, slash the directory separator, and disallows ASCII
-NULs within a valid filename. Most systems follow these conventions,
-including all POSIX systems as well as proprietary Microsoft systems.
-The only vaguely popular system that doesn't work this way is the
-proprietary Macintosh system, which uses a colon where the rest of us
-use a slash. Maybe C<sysopen> isn't such a bad idea after all.
-
-If you want to use C<E<lt>ARGVE<gt>> processing in a totally boring
-and non-magical way, you could do this first:
-
-    #   "Sam sat on the ground and put his head in his hands.
-    #   'I wish I had never come here, and I don't want to see
-    #   no more magic,' he said, and fell silent."
-    for (@ARGV) {
-        s#^([^./])#./$1#;
-        $_ .= "\0";
-    }
-    while (<>) {
-        # now process $_
-    }
-
-But be warned that users will not appreciate being unable to use "-"
-to mean standard input, per the standard convention.
-
-=head2 Paths as Opens
-
-You've probably noticed how Perl's C<warn> and C<die> functions can
-produce messages like:
-
-    Some warning at scriptname line 29, <FH> chunk 7.
+    binmode($handle) || die "cannot binmode handle";

-That's because you opened a filehandle FH, and had read in seven records
-from it.
-But what was the name of the file, not the handle?
-
-If you aren't running with C<strict refs>, or if you've turn them off
-temporarily, then all you have to do is this:
-
-    open($path, "< $path") || die "can't open $path: $!";
-    while (<$path>) {
-        # whatever
-    }
-
-Since you're using the pathname of the file as its handle,
-you'll get warnings more like
-
-    Some warning at scriptname line 29, chunk 7.
-
-=head2 Single Argument Open
-
-Remember how we said that Perl's open took two arguments? That was a
-passive prevarication. You see, it can also take just one argument.
-If and only if the variable is a global variable, not a lexical, you
-can pass C<open> just one argument, the filehandle, and it will
-get the path from the global scalar variable of the same name.
-
-    $FILE = "/etc/motd";
-    open FILE or die "can't open $FILE: $!";
-    while (<FILE>) {
-        # whatever
-    }
-
-Why is this here? Someone has to cater to the hysterical porpoises.
-It's something that's been in Perl since the very beginning, if not
-before.
-
-=head2 Playing with STDIN and STDOUT
-
-One clever move with STDOUT is to explicitly close it when you're done
-with the program.
-
-    END { close(STDOUT) || die "can't close stdout: $!" }
+This is especially handy for the handles that Perl has already opened for you.

-If you don't do this, and your program fills up the disk partition due
-to a command line redirection, it won't report the error exit with a
-failure status.
+    binmode(STDIN)  || die "cannot binmode STDIN";
+    binmode(STDOUT) || die "cannot binmode STDOUT";

-You don't have to accept the STDIN and STDOUT you were given. You are
-welcome to reopen them if you'd like.
+You can also pass C<binmode> an explicit encoding to change it on the fly.
+This isn't exactly "binary" mode, but we still use C<binmode> to do it:

-    open(STDIN, "< datafile")
-        || die "can't open datafile: $!";
-
-    open(STDOUT, "> output")
-        || die "can't open output: $!";
-
-And then these can be read directly or passed on to subprocesses.
-This makes it look as though the program were initially invoked
-with those redirections from the command line.
+    binmode(STDIN,  ":encoding(MacRoman)") || die "cannot binmode STDIN";
+    binmode(STDOUT, ":encoding(UTF-8)")    || die "cannot binmode STDOUT";

-It's probably more interesting to connect these to pipes. For example:
+Once you have your binary file properly opened in the right mode, you can
+use all the same Perl I/O functions as you used on text files. However,
+you may wish to use the fixed-size C<read> instead of the variable-sized
+C<readline> for your input.

-    $pager = $ENV{PAGER} || "(less || more)";
-    open(STDOUT, "| $pager")
-        || die "can't fork a pager: $!";
+Here's an example of how to copy a binary file:

-This makes it appear as though your program were called with its stdout
-already piped into your pager. You can also use this kind of thing
-in conjunction with an implicit fork to yourself. You might do this
-if you would rather handle the post processing in your own program,
-just in a different process:
+    my $BUFSIZ   = 64 * (2 ** 10);
+    my $name_in  = "/some/input/file";
+    my $name_out = "/some/output/file";

-    head(100);
-    while (<>) {
-        print;
-    }
+    my($in_fh, $out_fh, $buffer);

-    sub head {
-        my $lines = shift || 20;
-        return unless $pid = open(STDOUT, "|-");
-        die "cannot fork: $!" unless defined $pid;
-        while (<STDIN>) {
-            print;
-            last if --$lines < 0;
-        }
-        exit;
-    }
+    open($in_fh, "<", $name_in)
+        || die "$0: cannot open $name_in for reading: $!";
+    open($out_fh, ">", $name_out)
+        || die "$0: cannot open $name_out for writing: $!";

-This technique can be applied to repeatedly push as many filters on your
-output stream as you wish.
+    for my $fh ($in_fh, $out_fh) {
+        binmode($fh) || die "binmode failed";
+    }

-=head1 Other I/O Issues
+    while (read($in_fh, $buffer, $BUFSIZ)) {
+        unless (print $out_fh $buffer) {
+            die "couldn't write to $name_out: $!";
+        }
+    }

-These topics aren't really arguments related to C<open> or C<sysopen>,
-but they do affect what you do with your open files.
+    close($in_fh)  || die "couldn't close $name_in: $!";
+    close($out_fh) || die "couldn't close $name_out: $!";

-=head2 Opening Non-File Files
+=head1 Opening Pipes

-When is a file not a file? Well, you could say when it exists but
-isn't a plain file. We'll check whether it's a symbolic link first,
-just in case.
+To be announced.

-    if (-l $file || ! -f _) {
-        print "$file is not a plain file\n";
-    }
+=head1 Low-level File Opens via sysopen

-What other kinds of files are there than, well, files? Directories,
-symbolic links, named pipes, Unix-domain sockets, and block and character
-devices. Those are all files, too--just not I<plain> files. This isn't
-the same issue as being a text file. Not all text files are plain files.
-Not all plain files are textfiles. That's why there are separate C<-f>
-and C<-T> file tests.
+To be announced. Or deleted.

-To open a directory, you should use the C<opendir> function, then
-process it with C<readdir>, carefully restoring the directory
-name if necessary:
+=head1 SEE ALSO

-    opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
-    while (defined($file = readdir(DIR))) {
-        # do something with "$dirname/$file"
-    }
-    closedir(DIR);
-
-If you want to process directories recursively, it's better to use the
-File::Find module. For example, this prints out all files recursively,
-add adds a slash to their names if the file is a directory.
-
-    @ARGV = qw(.)
-        unless @ARGV;
-    use File::Find;
-    find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV;
-
-This finds all bogus symbolic links beneath a particular directory:
-
-    find sub { print "$File::Find::name\n" if -l && !-e }, $dir;
-
-As you see, with symbolic links, you can just pretend that it is
-what it points to. Or, if you want to know I<what> it points to, then
-C<readlink> is called for:
-
-    if (-l $file) {
-        if (defined($whither = readlink($file))) {
-            print "$file points to $whither\n";
-        } else {
-            print "$file points nowhere: $!\n";
-        }
-    }
-
-Named pipes are a different matter. You pretend they're regular files,
-but their opens will normally block until there is both a reader and
-a writer. You can read more about them in L.
-Unix-domain sockets are rather different beasts as well; they're
-described in L.
-
-When it comes to opening devices, it can be easy and it can tricky.
-We'll assume that if you're opening up a block device, you know what
-you're doing. The character devices are more interesting. These are
-typically used for modems, mice, and some kinds of printers. This is
-described in L
-It's often enough to open them carefully:
-
-    sysopen(TTYIN, "/dev/ttyS1", O_RDWR | O_NDELAY | O_NOCTTY)
-        # (O_NOCTTY no longer needed on POSIX systems)
-        or die "can't open /dev/ttyS1: $!";
-    open(TTYOUT, "+>&TTYIN")
-        or die "can't dup TTYIN: $!";
-
-    $ofh = select(TTYOUT); $| = 1; select($ofh);
-
-    print TTYOUT "+++at\015";
-    $answer = <TTYIN>;
-
-With descriptors that you haven't opened using C<sysopen>, such as a
-socket, you can set them to be non-blocking using C<fcntl>:
-
-    use Fcntl;
-    fcntl(Connection, F_SETFL, O_NONBLOCK)
-        or die "can't set non blocking: $!";
-
-Rather than losing yourself in a morass of twisting, turning C<ioctl>s,
-all dissimilar, if you're going to manipulate ttys, it's best to
-make calls out to the stty(1) program if you have it, or else use the
-portable POSIX interface.
To figure this all out, you'll need to read the -termios(3) manpage, which describes the POSIX interface to tty devices, -and then L<POSIX>, which describes Perl's interface to POSIX. There are -also some high-level modules on CPAN that can help you with these games. -Check out Term::ReadKey and Term::ReadLine. - -What else can you open? To open a connection using sockets, you won't use -one of Perl's two open functions. See L for that. Here's an example. Once you have it, -you can use FH as a bidirectional filehandle. - - use IO::Socket; - local *FH = IO::Socket::INET->new("www.perl.com:80"); - -For opening up a URL, the LWP modules from CPAN are just what -the doctor ordered. There's no filehandle interface, but -it's still easy to get the contents of a document: - - use LWP::Simple; - $doc = get('http://www.sn.no/libwww-perl/'); - -=head2 Binary Files - -On certain legacy systems with what could charitably be called terminally -convoluted (some would say broken) I/O models, a file isn't a file--at -least, not with respect to the C standard I/O library. On these old -systems whose libraries (but not kernels) distinguish between text and -binary streams, to get files to behave properly you'll have to bend over -backwards to avoid nasty problems. On such infelicitous systems, sockets -and pipes are already opened in binary mode, and there is currently no -way to turn that off. With files, you have more options. - -Another option is to use the C<binmode> function on the appropriate -handles before doing regular I/O on them: - - binmode(STDIN); - binmode(STDOUT); - while (<STDIN>) { print } - -Passing C<sysopen> a non-standard flag option will also open the file in -binary mode on those systems that support it. This is the equivalent of -opening the file normally, then calling C<binmode>ing on the handle.
- - sysopen(BINDAT, "records.data", O_RDWR | O_BINARY) - || die "can't open records.data: $!"; - -Now you can use C and C on that handle without worrying -about the system non-standard I/O library breaking your data. It's not -a pretty picture, but then, legacy systems seldom are. CP/M will be -with us until the end of days, and after. - -On systems with exotic I/O systems, it turns out that, astonishingly -enough, even unbuffered I/O using C<sysread> and C<syswrite> might do -sneaky data mutilation behind your back. - - while (sysread(WHENCE, $buf, 1024)) { - syswrite(WHITHER, $buf, length($buf)); - } - -Depending on the vicissitudes of your runtime system, even these calls -may need C<binmode> or C<O_BINARY> first. Systems known to be free of -such difficulties include Unix, the Mac OS, Plan9, and Inferno. - -=head2 File Locking - -In a multitasking environment, you may need to be careful not to collide -with other processes who want to do I/O on the same files as others -are working on. You'll often need shared or exclusive locks -on files for reading and writing respectively. You might just -pretend that only exclusive locks exist. - -Never use the existence of a file C<-e $file> as a locking indication, -because there is a race condition between the test for the existence of -the file and its creation. Atomicity is critical. - -Perl's most portable locking interface is via the C<flock> function, -whose simplicity is emulated on systems that don't directly support it, -such as SysV or WindowsNT. The underlying semantics may affect how -it all works, so you should learn how C<flock> is implemented on your -system's port of Perl. - -File locking I<does not> lock out another process that would like to -do I/O. A file lock only locks out others trying to get a lock, not -processes trying to do I/O. Because locks are advisory, if one process -uses locking and another doesn't, all bets are off. - -By default, the C<flock> call will block until a lock is granted.
-A request for a shared lock will be granted as soon as there is no -exclusive locker. A request for a exclusive lock will be granted as -soon as there is no locker of any kind. Locks are on file descriptors, -not file names. You can't lock a file until you open it, and you can't -hold on to a lock once the file has been closed. - -Here's how to get a blocking shared lock on a file, typically used -for reading: - - use 5.004; - use Fcntl qw(:DEFAULT :flock); - open(FH, "< filename") or die "can't open filename: $!"; - flock(FH, LOCK_SH) or die "can't lock filename: $!"; - # now read from FH - -You can get a non-blocking lock by using C<LOCK_NB>. - - flock(FH, LOCK_SH | LOCK_NB) - or die "can't lock filename: $!"; - -This can be useful for producing more user-friendly behaviour by warning -if you're going to be blocking: - - use 5.004; - use Fcntl qw(:DEFAULT :flock); - open(FH, "< filename") or die "can't open filename: $!"; - unless (flock(FH, LOCK_SH | LOCK_NB)) { - $| = 1; - print "Waiting for lock..."; - flock(FH, LOCK_SH) or die "can't lock filename: $!"; - print "got it.\n" - } - # now read from FH - -To get an exclusive lock, typically used for writing, you have to be -careful. We C<sysopen> the file so it can be locked before it gets -emptied. You can get a nonblocking version using C<LOCK_EX | LOCK_NB>.
- - use 5.004; - use Fcntl qw(:DEFAULT :flock); - sysopen(FH, "filename", O_WRONLY | O_CREAT) - or die "can't open filename: $!"; - flock(FH, LOCK_EX) - or die "can't lock filename: $!"; - truncate(FH, 0) - or die "can't truncate filename: $!"; - # now write to FH - -Finally, due to the uncounted millions who cannot be dissuaded from -wasting cycles on useless vanity devices called hit counters, here's -how to increment a number in a file safely: - - use Fcntl qw(:DEFAULT :flock); - - sysopen(FH, "numfile", O_RDWR | O_CREAT) - or die "can't open numfile: $!"; - # autoflush FH - $ofh = select(FH); $| = 1; select ($ofh); - flock(FH, LOCK_EX) - or die "can't write-lock numfile: $!"; - - $num = <FH> || 0; - seek(FH, 0, 0) - or die "can't rewind numfile : $!"; - print FH $num+1, "\n" - or die "can't write numfile: $!"; - - truncate(FH, tell(FH)) - or die "can't truncate numfile: $!"; - close(FH) - or die "can't close numfile: $!"; - -=head1 SEE ALSO - -The C<open> and C<sysopen> function in perlfunc(1); -the standard open(2), dup(2), fopen(3), and fdopen(3) manpages; -the POSIX documentation. +To be announced. =head1 AUTHOR and COPYRIGHT -Copyright 1998 Tom Christiansen. - -When included as part of the Standard Version of Perl, or as part of -its complete documentation whether printed or otherwise, this work may -be distributed only under the terms of Perl's Artistic License. Any -distribution of this file or derivatives thereof outside of that -package require that special arrangements be made with copyright -holder. - -Irrespective of its distribution, all code examples in these files are -hereby placed into the public domain. You are permitted and -encouraged to use this code in your own programs for fun or for profit -as you see fit. A simple comment in the code giving credit would be -courteous but is not required. +Copyright 2013 Tom Christiansen. -=head1 HISTORY +This documentation is free; you can redistribute it and/or modify it under +the same terms as Perl itself.
-First release: Sat Jan 9 08:09:11 MST 1999