=head1 NAME

perlopentut - tutorial on opening things in Perl

=head1 DESCRIPTION

Perl has two simple, built-in ways to open files: the shell way for
convenience, and the C way for precision.  The choice is yours.

=head1 Open E<agrave> la shell

Perl's C<open> function was designed to mimic the way command-line
redirection in the shell works.  Here are some basic examples
from the shell:

    $ myprogram file1 file2 file3
    $ myprogram < inputfile
    $ myprogram > outputfile
    $ myprogram >> outputfile
    $ myprogram | otherprogram
    $ otherprogram | myprogram

And here are some more advanced examples:

    $ otherprogram | myprogram f1 - f2
    $ otherprogram 2>&1 | myprogram -
    $ myprogram <&3
    $ myprogram >&4

Programmers accustomed to constructs like those above can take comfort
in learning that Perl directly supports these familiar constructs using
virtually the same syntax as the shell.

=head2 Simple Opens

The C<open> function takes two arguments: the first is a filehandle,
and the second is a single string comprising both what to open and how
to open it.  C<open> returns true when it works, and when it fails,
returns a false value and sets the special variable C<$!> to reflect
the system error.  If the filehandle was previously opened, it will
be implicitly closed first.

For example:

    open(INFO,    "datafile")   || die("can't open datafile: $!");
    open(INFO,    "< datafile") || die("can't open datafile: $!");
    open(RESULTS, "> runstats") || die("can't open runstats: $!");
    open(LOG,     ">> logfile ") || die("can't open logfile: $!");

If you prefer the low-punctuation version, you could write that this way:

    open INFO,    "< datafile"  or die "can't open datafile: $!";
    open RESULTS, "> runstats"  or die "can't open runstats: $!";
    open LOG,     ">> logfile " or die "can't open logfile: $!";

A few things to notice.  First, the leading less-than is optional.
If omitted, Perl assumes that you want to open the file for reading.

The other important thing to notice is that, just as in the shell,
any white space before or after the filename is ignored.  This is good,
because you wouldn't want these to do different things:

    open INFO, "<datafile"
    open INFO, "< datafile"
    open INFO, "<  datafile"

Ignoring surrounding whitespace also helps when you read a filename in
from a different file and forget to trim it before opening:

    $filename = <INFO>;         # oops, \n still there
    open(EXTRA, "< $filename") || die "can't open $filename: $!";

This is not a bug, but a feature.  Because C<open> mimics the shell in
its style of using redirection arrows to specify how to open the file, it
also does so with respect to extra white space around the filename
itself.  For accessing files with naughty names, see
L<"Dispelling the Dweomer">.

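To see the whitespace rule in action, here is a minimal sketch.  The
scratch file comes from the core File::Temp module, and the variable
names are made up for illustration; none of them appear elsewhere in
this tutorial.  It assumes the generated path itself contains no
leading or trailing blanks.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file to read back (the name is generated for us).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "first line\n";
close($tmp) || die "can't close $path: $!";

# Simulate a filename read from another file: blanks in front,
# and the newline still attached at the end.
my $padded = "  $path\n";

# Two-argument open ignores the surrounding whitespace, so this works.
open(my $in, "< $padded") || die "can't open $path: $!";
my $line = <$in>;
close($in);

print "read: $line";
```

The same string handed to C<sysopen>, described later, would be taken
literally, blanks and newline included.
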
=head2 Pipe Opens

In C, when you want to open a file using the standard I/O library,
you use the C<fopen> function, but when opening a pipe, you use the
C<popen> function.  But in the shell, you just use a different redirection
character.  That's also the case for Perl.  The C<open> call
remains the same--just its argument differs.

If the leading character is a pipe symbol, C<open> starts up a new
command and opens a write-only filehandle leading into that command.
This lets you write into that handle and have what you write show up on
that command's standard input.  For example:

    open(PRINTER, "| lpr -Plp1")    || die "can't run lpr: $!";
    print PRINTER "stuff\n";
    close(PRINTER)                  || die "can't close lpr: $!";

If the trailing character is a pipe, you start up a new command and open a
read-only filehandle leading out of that command.  This lets whatever that
command writes to its standard output show up on your handle for reading.
For example:

    open(NET, "netstat -i -n |")    || die "can't run netstat: $!";
    while (<NET>) { }               # do something with input
    close(NET)                      || die "can't close netstat: $!";

What happens if you try to open a pipe to or from a non-existent
command?  If possible, Perl will detect the failure and set C<$!> as
usual.  But if the command contains special shell characters, such as
C<E<gt>> or C<*>, called 'metacharacters', Perl does not execute the
command directly.  Instead, Perl runs the shell, which then tries to
run the command.  This means that it's the shell that gets the error
indication.  In such a case, the C<open> call will only indicate
failure if Perl can't even run the shell.  See L<perlfaq8/"How can I
capture STDERR from an external command?"> to see how to cope with
this.  There's also an explanation in L<perlipc>.

If you would like to open a bidirectional pipe, the IPC::Open2
library will handle this for you.  Check out
L<perlipc/"Bidirectional Communication with Another Process">.

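As a small self-contained sketch of the trailing-pipe form, the
following reads the output of I<echo>.  It assumes an I<echo> command
is on the path, which is true on most systems but is an assumption all
the same; the handle and variable names are illustrative.

```perl
use strict;
use warnings;

# Trailing pipe: start the command and read its standard output.
open(my $from_cmd, "echo hello from the pipe |")
    || die "can't run echo: $!";
my @output = <$from_cmd>;

# close() on a pipe waits for the command; a false return means the
# command failed, with its exit status left in $?.
close($from_cmd) || die "echo failed: status $?";

print "command said: $_" for @output;
```

Checking the return value of C<close> matters more for pipes than for
plain files, since that is where a failing command is finally reported.
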
=head2 The Minus File

Again following the lead of the standard shell utilities, Perl's
C<open> function treats a file whose name is a single minus, "-", in a
special way.  If you open minus for reading, it really means to access
the standard input.  If you open minus for writing, it really means to
access the standard output.

If minus can be used as the default input or default output, what happens
if you open a pipe into or out of minus?  What's the default command it
would run?  The same script as you're currently running!  This is actually
a stealth C<fork> hidden inside an C<open> call.  See
L<perlipc/"Safe Pipe Opens"> for details.

=head2 Mixing Reads and Writes

It is possible to specify both read and write access.  All you do is
add a "+" symbol in front of the redirection.  But as in the shell,
using a less-than on a file never creates a new file; it only opens an
existing one.  On the other hand, using a greater-than always clobbers
(truncates to zero length) an existing file, or creates a brand-new one
if there isn't an old one.  Adding a "+" for read-write doesn't affect
whether it only works on existing files or always clobbers existing ones.

    open(WTMP, "+< /usr/adm/wtmp")
        || die "can't open /usr/adm/wtmp: $!";

    open(SCREEN, "+> /tmp/lkscreen")
        || die "can't open /tmp/lkscreen: $!";

    open(LOGFILE, "+>> /tmp/applog")
        || die "can't open /tmp/applog: $!";

The first one won't create a new file, and the second one will always
clobber an old one.  The third one will create a new file if necessary
and not clobber an old one, and it will allow you to read at any point
in the file, but all writes will always go to the end.  In short,
the first case is substantially more common than the second and third
cases, which are almost always wrong.  (If you know C, the plus in
Perl's C<open> is historically derived from the one in C's fopen(3S),
which it ultimately calls.)

In fact, when it comes to updating a file, unless you're working on
a binary file as in the WTMP case above, you probably don't want to
use this approach for updating.  Instead, Perl's B<-i> flag comes to
the rescue.  The following command takes all the C, C++, or yacc source
or header files and changes all their foo's to bar's, leaving
the old version in the original file name with a ".orig" tacked
on the end:

    $ perl -i.orig -pe 's/\bfoo\b/bar/g' *.[Cchy]

This is a short cut for some renaming games that are really
the best way to update textfiles.  See the second question in
L<perlfaq5> for more details.

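Here is a minimal sketch of the "+<" case: read a record, rewind, and
overwrite it in place.  The scratch file and its contents are invented
for the example, and the replacement is deliberately the same length as
the original so nothing is left over at the end of the file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file with known contents (generated name, purely illustrative).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "status=old\n";
close($tmp) || die "can't close $path: $!";

# "+<" opens an existing file for update without truncating it.
open(my $fh, "+< $path") || die "can't open $path: $!";
my $before = <$fh>;                         # read the current record

# Reposition before switching from reading to writing.
seek($fh, 0, 0) || die "can't seek in $path: $!";
print {$fh} "status=new\n";                 # same length as the old record
close($fh) || die "can't close $path: $!";

open($fh, "< $path") || die "can't reopen $path: $!";
my $after = <$fh>;
close($fh);

print "before: $before", "after:  $after";
```

This only works cleanly because the record length doesn't change, which
is why in-place update suits fixed-size binary records far better than
text.
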
=head2 Filters

One of the most common uses for C<open> is one you never
even notice.  When you process the ARGV filehandle using
C<< <ARGV> >>, Perl actually does an implicit open
on each file in @ARGV.  Thus a program called like this:

    $ myprogram file1 file2 file3

can have all its files opened and processed one at a time
using a construct no more complex than:

    while (<>) {
        # do something with $_
    }

If @ARGV is empty when the loop first begins, Perl pretends you've opened
up minus, that is, the standard input.  In fact, $ARGV, the currently
open file during C<< <ARGV> >> processing, is even set to "-"
in these circumstances.

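The following sketch shows the implicit open at work without needing a
real command line.  It builds two throwaway files and assigns @ARGV by
hand; the file names and the C<@seen> array are made up for the
illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build two small input files in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt)) {
    open(my $out, "> $dir/$name") || die "can't create $dir/$name: $!";
    print {$out} "line from $name\n";
    close($out) || die "can't close $dir/$name: $!";
}

# Pretend these were the command-line arguments.
@ARGV = ("$dir/a.txt", "$dir/b.txt");

my @seen;
while (<>) {
    chomp;
    # $ARGV holds the name of the file currently being read.
    push @seen, "$ARGV: $_";
}
print "$_\n" for @seen;
```

Each element of @ARGV is consumed as its file is opened, so the array
is empty once the loop finishes.
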
You are welcome to pre-process your @ARGV before starting the loop to
make sure it's to your liking.  One reason to do this might be to remove
command options beginning with a minus.  While you can always roll the
simple ones by hand, the Getopt modules are good for this.

    use Getopt::Std;

    # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o
    getopts("vDo:");

    # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o}
    getopts("vDo:", \%args);

Or the standard Getopt::Long module to permit named arguments:

    use Getopt::Long;
    GetOptions( "verbose"  => \$verbose,        # --verbose
                "Debug"    => \$debug,          # --Debug
                "output=s" => \$output );
            # --output=somestring or --output somestring

Another reason for preprocessing arguments is to make an empty
argument list default to all files:

    @ARGV = glob("*") unless @ARGV;

You could even filter out all but plain, text files.  This is a bit
silent, of course, and you might prefer to mention them on the way.

    @ARGV = grep { -f && -T } @ARGV;

If you're using the B<-n> or B<-p> command-line options, you
should put changes to @ARGV in a C<BEGIN{}> block.

Remember that a normal C<open> has special properties, in that it might
call fopen(3S) or it might call popen(3S), depending on what its
argument looks like; that's why it's sometimes called "magic open".
Here's an example:

    $pwdinfo = `domainname` =~ /^(\(none\))?$/
                    ? '< /etc/passwd'
                    : 'ypcat passwd |';

    open(PWD, $pwdinfo)
                or die "can't open $pwdinfo: $!";

This sort of thing also comes into play in filter processing.  Because
C<< <ARGV> >> processing employs the normal, shell-style Perl C<open>,
it respects all the special things we've already seen:

    $ myprogram f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile

That program will read from the file F<f1>, the process F<cmd1>, standard
input (F<tmpfile> in this case), the F<f2> file, the F<cmd2> command,
and finally the F<f3> file.

Yes, this also means that if you have files named "-" (and so on) in
your directory, they won't be processed as literal files by C<open>.
You'll need to pass them as "./-", much as you would for the I<rm> program.
Or you could use C<sysopen> as described below.

One of the more interesting applications is to change files of a certain
name into pipes.  For example, to autoprocess gzipped or compressed
files by decompressing them with I<gzip>:

    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_ } @ARGV;

Or, if you have the I<GET> program installed from LWP,
you can fetch URLs before processing them:

    @ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV;

It's not for nothing that this is called magic C<< <ARGV> >>.
Pretty nifty, eh?

=head1 Open E<agrave> la C

If you want the convenience of the shell, then Perl's C<open> is
definitely the way to go.  On the other hand, if you want finer precision
than C's simplistic fopen(3S) provides, then you should look to Perl's
C<sysopen>, which is a direct hook into the open(2) system call.
That does mean it's a bit more involved, but that's the price of
precision.

C<sysopen> takes 3 (or 4) arguments.

    sysopen HANDLE, PATH, FLAGS, [MASK]

The HANDLE argument is a filehandle just as with C<open>.  The PATH is
a literal path, one that doesn't pay attention to any greater-thans or
less-thans or pipes or minuses, nor ignore white space.  If it's there,
it's part of the path.  The FLAGS argument contains one or more values
derived from the Fcntl module that have been or'd together using the
bitwise "|" operator.  The final argument, the MASK, is optional; if
present, it is combined with the user's current umask for the creation
mode of the file.  You should usually omit this.

Although the traditional values of read-only, write-only, and read-write
are 0, 1, and 2 respectively, this is known not to hold true on some
systems.  Instead, it's best to load in the appropriate constants first
from the Fcntl module, which supplies the following standard flags:

    O_RDONLY            Read only
    O_WRONLY            Write only
    O_RDWR              Read and write
    O_CREAT             Create the file if it doesn't exist
    O_EXCL              Fail if the file already exists
    O_APPEND            Append to the file
    O_TRUNC             Truncate the file
    O_NONBLOCK          Non-blocking access

Less common flags that are sometimes available on some operating
systems include C<O_BINARY>, C<O_TEXT>, C<O_SHLOCK>, C<O_EXLOCK>,
C<O_DEFER>, C<O_SYNC>, C<O_ASYNC>, C<O_DSYNC>, C<O_RSYNC>,
C<O_NOCTTY>, C<O_NDELAY> and C<O_LARGEFILE>.  Consult your open(2)
manpage or its local equivalent for details.  (Note: starting from
Perl release 5.6 the O_LARGEFILE flag, if available, is automatically
added to the sysopen() flags because large files are the default.)

Here's how to use C<sysopen> to emulate the simple C<open> calls we had
before.  We'll omit the C<|| die $!> checks for clarity, but make sure
you always check the return values in real code.  These aren't quite
the same, since C<open> will trim leading and trailing white space,
but you'll get the idea:

To open a file for reading:

    open(FH, "< $path");
    sysopen(FH, $path, O_RDONLY);

To open a file for writing, creating a new file if needed or else truncating
an old file:

    open(FH, "> $path");
    sysopen(FH, $path, O_WRONLY | O_TRUNC | O_CREAT);

To open a file for appending, creating one if necessary:

    open(FH, ">> $path");
    sysopen(FH, $path, O_WRONLY | O_APPEND | O_CREAT);

To open a file for update, where the file must already exist:

    open(FH, "+< $path");
    sysopen(FH, $path, O_RDWR);

And here are things you can do with C<sysopen> that you cannot do with
a regular C<open>.  As you see, it's just a matter of controlling the
flags in the third argument.

To open a file for writing, creating a new file which must not previously
exist:

    sysopen(FH, $path, O_WRONLY | O_EXCL | O_CREAT);

To open a file for appending, where that file must already exist:

    sysopen(FH, $path, O_WRONLY | O_APPEND);

To open a file for update, creating a new file if necessary:

    sysopen(FH, $path, O_RDWR | O_CREAT);

To open a file for update, where that file must not already exist:

    sysopen(FH, $path, O_RDWR | O_EXCL | O_CREAT);

To open a file without blocking, creating one if necessary:

    sysopen(FH, $path, O_WRONLY | O_NONBLOCK | O_CREAT);

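To make the C<O_EXCL> behaviour concrete, here is a short sketch that
tries the "must not previously exist" open twice on the same path.
The directory and the F<lockfile> name are hypothetical, chosen only
for this example.

```perl
use strict;
use warnings;
use Fcntl;                          # supplies the O_* constants
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/lockfile";         # hypothetical name for illustration

# First attempt: the file doesn't exist yet, so creation succeeds.
my $first = sysopen(my $fh, $path, O_WRONLY | O_EXCL | O_CREAT);
close($fh) if $first;

# Second attempt: O_EXCL makes sysopen fail because the file is there now.
my $second = sysopen(my $fh2, $path, O_WRONLY | O_EXCL | O_CREAT);
my $error  = $second ? "" : "$!";

print "first attempt:  ", ($first  ? "created" : "failed"), "\n";
print "second attempt: ", ($second ? "created" : "failed ($error)"), "\n";
```

Because the existence check and the creation happen in one system call,
there is no window in which another process can slip in between them.
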
=head2 Permissions E<agrave> la mode

If you omit the MASK argument to C<sysopen>, Perl uses the octal value
0666.  The normal MASK to use for executables and directories should
be 0777, and for anything else, 0666.

Why so permissive?  Well, it isn't really.  The MASK will be modified
by your process's current C<umask>.  A umask is a number representing
I<disabled> permissions bits; that is, bits that will not be turned on
in the created files' permissions field.

For example, if your C<umask> were 027, then the 020 part would
disable the group from writing, and the 007 part would disable others
from reading, writing, or executing.  Under these conditions, passing
C<sysopen> 0666 would create a file with mode 0640, since C<0666 &~ 027>
is 0640.

You should seldom use the MASK argument to C<sysopen()>.  That takes
away the user's freedom to choose what permission new files will have.
Denying choice is almost always a bad thing.  One exception would be for
cases where sensitive or private data is being stored, such as with mail
folders, cookie files, and internal temporary files.

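The arithmetic in the example above can be checked directly.  This
sketch uses the same numbers as the text; the umask is a literal here
rather than the value from the running process, and the variable names
are illustrative.

```perl
use strict;
use warnings;

my $requested = 0666;   # MASK passed to sysopen (also the default)
my $umask     = 027;    # example umask from the text, not the real one

# Bits set in the umask are cleared from the requested mode.
my $effective = $requested & ~$umask;

printf "requested %04o, umask %03o => created with %04o\n",
    $requested, $umask, $effective;
```

To use the real setting instead, replace the literal with the value
returned by calling C<umask> with no arguments.
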
=head1 Obscure Open Tricks

=head2 Re-Opening Files (dups)

Sometimes you already have a filehandle open, and want to make another
handle that's a duplicate of the first one.  In the shell, we place an
ampersand in front of a file descriptor number when doing redirections.
For example, C<< 2>&1 >> makes descriptor 2 (that's STDERR in Perl)
be redirected into descriptor 1 (which is usually Perl's STDOUT).
The same is essentially true in Perl: a filename that begins with an
ampersand is treated instead as a file descriptor if a number, or as a
filehandle if a string.

    open(SAVEOUT, ">&SAVEERR") || die "couldn't dup SAVEERR: $!";
    open(MHCONTEXT, "<&4")     || die "couldn't dup fd4: $!";

That means that if a function is expecting a filename, but you don't
want to give it a filename because you already have the file open, you
can just pass the filehandle with a leading ampersand.  It's best to
use a fully qualified handle though, just in case the function happens
to be in a different package:

    somefunction("&main::LOGFILE");

This way if somefunction() is planning on opening its argument, it can
just use the already opened handle.  This differs from passing a handle,
because with a handle, you don't open the file.  Here you have something
you can pass to open.

If you have one of those tricky, newfangled I/O objects that the C++
folks are raving about, then this doesn't work because those aren't a
proper filehandle in the native Perl sense.  You'll have to use fileno()
to pull out the proper descriptor number, assuming you can:

    use IO::Socket;
    $handle = IO::Socket::INET->new("www.perl.com:80");
    $fd = $handle->fileno;
    somefunction("&$fd");  # not an indirect function call

It can be easier (and certainly will be faster) just to use real
filehandles though:

    use IO::Socket;
    local *REMOTE = IO::Socket::INET->new("www.perl.com:80");
    die "can't connect" unless defined(fileno(REMOTE));
    somefunction("&main::REMOTE");

If the filehandle or descriptor number is preceded not just with a simple
"&" but rather with a "&=" combination, then Perl will not create a
completely new descriptor opened to the same place using the dup(2)
system call.  Instead, it will just make something of an alias to the
existing one using the fdopen(3S) library call.  This is slightly more
parsimonious of system resources, although this is less of a concern
these days.  Here's an example of that:

    $fd = $ENV{"MHCONTEXTFD"};
    open(MHCONTEXT, "<&=$fd")   or die "couldn't fdopen $fd: $!";

If you're using magic C<< <ARGV> >>, you could even pass in as a
command line argument in @ARGV something like C<< "<&=$MHCONTEXTFD" >>,
but we've never seen anyone actually do this.

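A common use of dups is to set a handle aside, point the original
somewhere else for a while, and then restore it.  The sketch below does
this with STDOUT and a scratch file; the SAVEOUT handle name and the
file are chosen for the example only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Duplicate STDOUT so it can be restored later.
open(SAVEOUT, ">&STDOUT") || die "couldn't dup STDOUT: $!";

# Point STDOUT at the scratch file, write to it, then put things back.
open(STDOUT, "> $path")   || die "can't redirect STDOUT: $!";
print "captured line\n";
close(STDOUT)             || die "can't close STDOUT: $!";
open(STDOUT, ">&SAVEOUT") || die "couldn't restore STDOUT: $!";
close(SAVEOUT);

# Read back what went to the file while STDOUT was redirected.
open(my $in, "< $path") || die "can't open $path: $!";
my $captured = <$in>;
close($in);

print "file got: $captured";
```

Closing STDOUT before restoring it flushes the buffered output into the
file rather than onto the original stream.
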
=head2 Dispelling the Dweomer

Perl is more of a DWIMmer language than something like Java--where DWIM
is an acronym for "do what I mean".  But this principle sometimes leads
to more hidden magic than one knows what to do with.  In this way, Perl
is also filled with I<dweomer>, an obscure word meaning an enchantment.
Sometimes, Perl's DWIMmer is just too much like dweomer for comfort.

If magic C<open> is a bit too magical for you, you don't have to turn
to C<sysopen>.  To open a file with arbitrary weird characters in
it, it's necessary to protect any leading and trailing whitespace.
Leading whitespace is protected by inserting a C<"./"> in front of a
filename that starts with whitespace.  Trailing whitespace is protected
by appending an ASCII NUL byte (C<"\0">) at the end of the string.

    $file =~ s#^(\s)#./$1#;
    open(FH, "< $file\0")   || die "can't open $file: $!";

This assumes, of course, that your system considers dot the current
working directory, slash the directory separator, and disallows ASCII
NULs within a valid filename.  Most systems follow these conventions,
including all POSIX systems as well as proprietary Microsoft systems.
The only vaguely popular system that doesn't work this way is the
proprietary Macintosh system, which uses a colon where the rest of us
use a slash.  Maybe C<sysopen> isn't such a bad idea after all.

If you want to use C<< <ARGV> >> processing in a totally boring
and non-magical way, you could do this first:

    #   "Sam sat on the ground and put his head in his hands.
    #   'I wish I had never come here, and I don't want to see
    #   no more magic,' he said, and fell silent."
    for (@ARGV) {
        s#^([^./])#./$1#;
        $_ .= "\0";
    }
    while (<>) {
        # now process $_
    }

But be warned that users will not appreciate being unable to use "-"
to mean standard input, per the standard convention.

=head2 Paths as Opens

You've probably noticed how Perl's C<warn> and C<die> functions can
produce messages like:

    Some warning at scriptname line 29, <FH> line 7.

That's because you opened a filehandle FH, and had read in seven records
from it.  But what was the name of the file, not the handle?

If you aren't running with C<strict refs>, or if you've turned them off
temporarily, then all you have to do is this:

    open($path, "< $path") || die "can't open $path: $!";
    while (<$path>) {
        # whatever
    }

Since you're using the pathname of the file as its handle,
you'll get warnings more like

    Some warning at scriptname line 29, </etc/motd> line 7.

=head2 Single Argument Open

Remember how we said that Perl's open took two arguments?  That was a
passive prevarication.  You see, it can also take just one argument.
If and only if the variable is a global variable, not a lexical, you
can pass C<open> just one argument, the filehandle, and it will
get the path from the global scalar variable of the same name.

    $FILE = "/etc/motd";
    open FILE or die "can't open $FILE: $!";
    while (<FILE>) {
        # whatever
    }

Why is this here?  Someone has to cater to the hysterical porpoises.
It's something that's been in Perl since the very beginning, if not
before.

=head2 Playing with STDIN and STDOUT

One clever move with STDOUT is to explicitly close it when you're done
with the program.

    END { close(STDOUT) || die "can't close stdout: $!" }

If you don't do this, and your program fills up the disk partition due
to a command line redirection, it won't report the error or exit with a
failure status.

You don't have to accept the STDIN and STDOUT you were given.  You are
welcome to reopen them if you'd like.

    open(STDIN, "< datafile")
        || die "can't open datafile: $!";

    open(STDOUT, "> output")
        || die "can't open output: $!";

And then these can be read directly or passed on to subprocesses.
This makes it look as though the program were initially invoked
with those redirections from the command line.

It's probably more interesting to connect these to pipes.  For example:

    $pager = $ENV{PAGER} || "(less || more)";
    open(STDOUT, "| $pager")
        || die "can't fork a pager: $!";

This makes it appear as though your program were called with its stdout
already piped into your pager.  You can also use this kind of thing
in conjunction with an implicit fork to yourself.  You might do this
if you would rather handle the post processing in your own program,
just in a different process:

    head(100);
    while (<>) {
        print;
    }

    sub head {
        my $lines = shift || 20;
        return if $pid = open(STDOUT, "|-");       # parent returns
        die "cannot fork: $!" unless defined $pid;
        while (<STDIN>) {                          # child filters
            last if --$lines < 0;
            print;
        }
        exit;
    }

This technique can be applied to repeatedly push as many filters on your
output stream as you wish.

=head1 Other I/O Issues

These topics aren't really arguments related to C<open> or C<sysopen>,
but they do affect what you do with your open files.

=head2 Opening Non-File Files

When is a file not a file?  Well, you could say when it exists but
isn't a plain file.  We'll check whether it's a symbolic link first,
just in case.

    if (-l $file || ! -f _) {
        print "$file is not a plain file\n";
    }

What other kinds of files are there than, well, files?  Directories,
symbolic links, named pipes, Unix-domain sockets, and block and character
devices.  Those are all files, too--just not I<plain> files.  This isn't
the same issue as being a text file.  Not all text files are plain files.
Not all plain files are text files.  That's why there are separate C<-f>
and C<-T> file tests.

To open a directory, you should use the C<opendir> function, then
process it with C<readdir>, carefully restoring the directory
name if necessary:

    opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
    while (defined($file = readdir(DIR))) {
        # do something with "$dirname/$file"
    }
    closedir(DIR);

If you want to process directories recursively, it's better to use the
File::Find module.  For example, this prints out all files recursively,
and adds a slash to their names if the file is a directory.

    @ARGV = qw(.) unless @ARGV;
    use File::Find;
    find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV;

This finds all bogus symbolic links beneath a particular directory:

    find sub { print "$File::Find::name\n" if -l && !-e }, $dir;

As you see, with symbolic links, you can just pretend that it is
what it points to.  Or, if you want to know I<what> it points to, then
C<readlink> is called for:

    if (-l $file) {
        if (defined($whither = readlink($file))) {
            print "$file points to $whither\n";
        } else {
            print "$file points nowhere: $!\n";
        }
    }

Named pipes are a different matter.  You pretend they're regular files,
but their opens will normally block until there is both a reader and
a writer.  You can read more about them in L<perlipc/"Named Pipes">.
Unix-domain sockets are rather different beasts as well; they're
described in L<perlipc/"Unix-Domain TCP Clients and Servers">.

When it comes to opening devices, it can be easy and it can be tricky.
We'll assume that if you're opening up a block device, you know what
you're doing.  The character devices are more interesting.  These are
typically used for modems, mice, and some kinds of printers.  This is
described in L<perlfaq8/"How do I read and write the serial port?">.
It's often enough to open them carefully:

    sysopen(TTYIN, "/dev/ttyS1", O_RDWR | O_NDELAY | O_NOCTTY)
                # (O_NOCTTY no longer needed on POSIX systems)
        or die "can't open /dev/ttyS1: $!";
    open(TTYOUT, "+>&TTYIN")
        or die "can't dup TTYIN: $!";

    $ofh = select(TTYOUT); $| = 1; select($ofh);

    print TTYOUT "+++at\015";
    $answer = <TTYIN>;

With descriptors that you haven't opened using C<sysopen>, such as a
socket, you can set them to be non-blocking using C<fcntl>:

    use Fcntl;
    fcntl(Connection, F_SETFL, O_NONBLOCK)
        or die "can't set non blocking: $!";

Rather than losing yourself in a morass of twisting, turning C<ioctl>s,
all dissimilar, if you're going to manipulate ttys, it's best to
make calls out to the stty(1) program if you have it, or else use the
portable POSIX interface.  To figure this all out, you'll need to read the
termios(3) manpage, which describes the POSIX interface to tty devices,
and then L<POSIX>, which describes Perl's interface to POSIX.  There are
also some high-level modules on CPAN that can help you with these games.
Check out Term::ReadKey and Term::ReadLine.

What else can you open?  To open a connection using sockets, you won't use
one of Perl's two open functions.  See
L<perlipc/"Sockets: Client/Server Communication"> for that.  Here's an
example.  Once you have it, you can use FH as a bidirectional filehandle.

    use IO::Socket;
    local *FH = IO::Socket::INET->new("www.perl.com:80");

For opening up a URL, the LWP modules from CPAN are just what
the doctor ordered.  There's no filehandle interface, but
it's still easy to get the contents of a document:

    use LWP::Simple;
    $doc = get('http://www.linpro.no/lwp/');

=head2 Binary Files

On certain legacy systems with what could charitably be called terminally
convoluted (some would say broken) I/O models, a file isn't a file--at
least, not with respect to the C standard I/O library.  On these old
systems whose libraries (but not kernels) distinguish between text and
binary streams, to get files to behave properly you'll have to bend over
backwards to avoid nasty problems.  On such infelicitous systems, sockets
and pipes are already opened in binary mode, and there is currently no
way to turn that off.  With files, you have more options.

One option is to use the C<binmode> function on the appropriate
handles before doing regular I/O on them:

    binmode(STDIN);
    binmode(STDOUT);
    while (<STDIN>) { print }

Passing C<sysopen> a non-standard flag option will also open the file in
binary mode on those systems that support it.  This is the equivalent of
opening the file normally, then calling C<binmode> on the handle.

    sysopen(BINDAT, "records.data", O_RDWR | O_BINARY)
        || die "can't open records.data: $!";

Now you can use C<read> and C<print> on that handle without worrying
about the system's non-standard I/O library breaking your data.  It's not
a pretty picture, but then, legacy systems seldom are.  CP/M will be
with us until the end of days, and after.

On systems with exotic I/O systems, it turns out that, astonishingly
enough, even unbuffered I/O using C<sysread> and C<syswrite> might do
sneaky data mutilation behind your back.

    while (sysread(WHENCE, $buf, 1024)) {
        syswrite(WHITHER, $buf, length($buf));
    }

Depending on the vicissitudes of your runtime system, even these calls
may need C<binmode> or C<O_BINARY> first.  Systems known to be free of
such difficulties include Unix, the Mac OS, Plan 9, and Inferno.

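As a quick round-trip check, this sketch writes a few awkward bytes
with C<binmode> in effect and reads them back.  The byte string and the
scratch file are invented for the example.  On Unix the C<binmode>
calls change nothing, but on the systems described above they are what
keeps the data intact.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $bytes = "\x00\x0d\x0a\x1a\xff";     # includes CR, LF and Ctrl-Z

my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                          # no newline translation on output
print {$out} $bytes;
close($out) || die "can't close $path: $!";

open(my $in, "< $path") || die "can't open $path: $!";
binmode($in);                           # nor on input
my $got;
my $count = read($in, $got, 1024);
defined($count) || die "can't read $path: $!";
close($in);

printf "wrote %d bytes, read %d bytes back\n", length($bytes), $count;
```

Calling C<binmode> immediately after the open, before any I/O, is the
safe habit; doing it later may leave already-buffered data translated.
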
745=head2 File Locking
746
747In a multitasking environment, you may need to be careful not to collide
748with other processes who want to do I/O on the same files as others
749are working on. You'll often need shared or exclusive locks
750on files for reading and writing respectively. You might just
751pretend that only exclusive locks exist.
752
753Never use the existence of a file C<-e $file> as a locking indication,
754because there is a race condition between the test for the existence of
755the file and its creation. Atomicity is critical.
756
757Perl's most portable locking interface is via the C<flock> function,
758whose simplicity is emulated on systems that don't directly support it,
759such as SysV or WindowsNT. The underlying semantics may affect how
760it all works, so you should learn how C<flock> is implemented on your
761system's port of Perl.

File locking I<does not> lock out another process that would like to
do I/O. A file lock only locks out others trying to get a lock, not
processes trying to do I/O. Because locks are advisory, if one process
uses locking and another doesn't, all bets are off.
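
A small experiment can show what "advisory" means. In this sketch one
handle holds an exclusive lock while a second handle, which never asks
for a lock, appends to the same file anyway. The scratch file name is
invented for the demonstration, and the behaviour shown assumes a
Unix-like system; platforms whose locks are enforced by the operating
system may refuse the write.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

my $file = "advisory_demo.$$";   # scratch file name, made up for this demo

# The well-behaved writer opens the file and takes an exclusive lock.
sysopen(my $locker, $file, O_WRONLY | O_CREAT)
    or die "can't open $file: $!";
flock($locker, LOCK_EX)
    or die "can't lock $file: $!";

# A second, independent handle that never asks for a lock.
open(my $rogue, ">>", $file)
    or die "can't reopen $file: $!";
print $rogue "written without a lock\n"
    or die "can't write $file: $!";
close($rogue)
    or die "can't close $file: $!";

# If the lock had blocked the write, the file would still be empty.
my $size = -s $file;

close($locker);
unlink($file);
```

The lock only matters to code that also calls C<flock> before touching
the file.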

By default, the C<flock> call will block until a lock is granted.
A request for a shared lock will be granted as soon as there is no
exclusive locker. A request for an exclusive lock will be granted as
soon as there is no locker of any kind. Locks are on file descriptors,
not file names. You can't lock a file until you open it, and you can't
hold on to a lock once the file has been closed.

Here's how to get a blocking shared lock on a file, typically used
for reading:

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    open(FH, "< filename") or die "can't open filename: $!";
    flock(FH, LOCK_SH) or die "can't lock filename: $!";
    # now read from FH

You can get a non-blocking lock by using C<LOCK_NB>.

    flock(FH, LOCK_SH | LOCK_NB)
        or die "can't lock filename: $!";

This can be useful for producing more user-friendly behaviour by warning
if you're going to be blocking:

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    open(FH, "< filename") or die "can't open filename: $!";
    unless (flock(FH, LOCK_SH | LOCK_NB)) {
        $| = 1;
        print "Waiting for lock...";
        flock(FH, LOCK_SH) or die "can't lock filename: $!";
        print "got it.\n"
    }
    # now read from FH
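
Another use for C<LOCK_NB> is to give up after a while instead of
waiting forever. The following is a minimal sketch of that idea; the
sub name C<lock_shared_with_retry> and the one-second pause between
attempts are arbitrary choices made for this example.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# Hypothetical helper: try for a shared lock up to $tries times,
# sleeping one second between attempts.  Returns true once the lock
# is held, false if every attempt failed.
sub lock_shared_with_retry {
    my ($fh, $tries) = @_;
    for my $attempt (1 .. $tries) {
        return 1 if flock($fh, LOCK_SH | LOCK_NB);
        sleep 1 if $attempt < $tries;
    }
    return 0;
}
```

With the bareword handle used above, a call might look like
C<lock_shared_with_retry(\*FH, 5) or die "gave up waiting for lock">.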

To get an exclusive lock, typically used for writing, you have to be
careful. We C<sysopen> the file so it can be locked before it gets
emptied. You can get a nonblocking version using C<LOCK_EX | LOCK_NB>.

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    sysopen(FH, "filename", O_WRONLY | O_CREAT)
        or die "can't open filename: $!";
    flock(FH, LOCK_EX)
        or die "can't lock filename: $!";
    truncate(FH, 0)
        or die "can't truncate filename: $!";
    # now write to FH
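
The same sequence can be wrapped up so that the whole update, from
opening to closing, happens under the lock. This is only a sketch: the
sub name C<rewrite_locked> is invented here. The order of the calls is
what matters, since the file must not be emptied until the lock has
been granted.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# Hypothetical helper: replace the contents of $path with $text,
# holding an exclusive lock for the whole update.
sub rewrite_locked {
    my ($path, $text) = @_;
    # No O_TRUNC here: the old contents stay put until we hold the lock.
    sysopen(my $fh, $path, O_WRONLY | O_CREAT)
        or die "can't open $path: $!";
    flock($fh, LOCK_EX)
        or die "can't lock $path: $!";
    truncate($fh, 0)
        or die "can't truncate $path: $!";
    print $fh $text
        or die "can't write $path: $!";
    # Closing flushes the buffer and gives up the lock.
    close($fh)
        or die "can't close $path: $!";
    return 1;
}
```

Because the lock goes away when the file is closed, there is no
separate unlocking step.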

Finally, due to the uncounted millions who cannot be dissuaded from
wasting cycles on useless vanity devices called hit counters, here's
how to increment a number in a file safely:

    use Fcntl qw(:DEFAULT :flock);

    sysopen(FH, "numfile", O_RDWR | O_CREAT)
        or die "can't open numfile: $!";
    # autoflush FH
    $ofh = select(FH); $| = 1; select ($ofh);
    flock(FH, LOCK_EX)
        or die "can't write-lock numfile: $!";

    $num = <FH> || 0;
    seek(FH, 0, 0)
        or die "can't rewind numfile: $!";
    print FH $num+1, "\n"
        or die "can't write numfile: $!";

    truncate(FH, tell(FH))
        or die "can't truncate numfile: $!";
    close(FH)
        or die "can't close numfile: $!";

=head2 IO Layers

In Perl 5.8.0 a new I/O framework called "PerlIO" was introduced.
This is new "plumbing" for all the I/O happening in Perl; for the
most part everything will work just as it did, but PerlIO also brought
in some new features, such as the ability to think of I/O as "layers".
An I/O layer may, in addition to just moving the data, also transform
it. Such transformations may include compression and decompression,
encryption and decryption, and conversion between various character
encodings.

A full discussion of the features of PerlIO is beyond the scope of this
tutorial, but here is how to recognize the layers being used:

=over 4

=item *

The three-(or more)-argument form of C<open()> is being used and the
second argument contains something else in addition to the usual
C<< '<' >>, C<< '>' >>, C<< '>>' >>, C<< '|' >> and their variants,
for example:

    open(my $fh, "<:utf8", $fn);

=item *

The two-argument form of C<binmode()> is being used, for example:

    binmode($fh, ":encoding(utf16)");

=back
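
Both forms can be seen together in the following sketch, which writes
one non-ASCII character through an encoding layer and reads it back
with a layer added after the fact. The scratch file name and the
choice of C<:encoding(UTF-8)> are assumptions made for this example.

```perl
use strict;
use warnings;

my $fn = "layers_demo.$$";   # scratch file name, made up for this demo

# Form one: the layer is named in the second argument of open().
open(my $out, ">:encoding(UTF-8)", $fn)
    or die "can't open $fn: $!";
print $out "\x{e9}\n";       # one character (e-acute) plus a newline
close($out)
    or die "can't close $fn: $!";

my $size = -s $fn;           # bytes on disk after encoding

# Form two: open without a layer, then add one with binmode().
open(my $in, "<", $fn)
    or die "can't open $fn: $!";
binmode($in, ":encoding(UTF-8)")
    or die "can't binmode $fn: $!";
my $line  = <$in>;
my $chars = length($line);   # characters seen after decoding
close($in);
unlink($fn);
```

Comparing C<$size> with C<$chars> shows that the layer changed the
data on its way to disk and changed it back on the way in.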

For more detailed discussion about PerlIO see L<perlio>;
for more detailed discussion about Unicode and I/O see L<perluniintro>.

=head1 SEE ALSO

The C<open> and C<sysopen> functions in perlfunc(1);
the standard open(2), dup(2), fopen(3), and fdopen(3) manpages;
the POSIX documentation.

=head1 AUTHOR and COPYRIGHT

Copyright 1998 Tom Christiansen.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit. A simple comment in the code giving credit would be
courteous but is not required.

=head1 HISTORY

First release: Sat Jan 9 08:09:11 MST 1999