Commit | Line | Data |
---|---|---|
f8284313 TC |
1 | =head1 NAME |
2 | ||
3 | perlopentut - tutorial on opening things in Perl | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
7 | Perl has two simple, built-in ways to open files: the shell way for | |
1a193132 AL |
8 | convenience, and the C way for precision. The shell way also has 2- and |
9 | 3-argument forms, which have different semantics for handling the filename. | |
10 | The choice is yours. | |
f8284313 TC |
11 | |
12 | =head1 Open E<agrave> la shell | |
13 | ||
14 | Perl's C<open> function was designed to mimic the way command-line | |
15 | redirection in the shell works. Here are some basic examples | |
16 | from the shell: | |
17 | ||
18 | $ myprogram file1 file2 file3 | |
19 | $ myprogram < inputfile | |
20 | $ myprogram > outputfile | |
21 | $ myprogram >> outputfile | |
22 | $ myprogram | otherprogram | |
23 | $ otherprogram | myprogram | |
24 | ||
25 | And here are some more advanced examples: | |
26 | ||
27 | $ otherprogram | myprogram f1 - f2 | |
28 | $ otherprogram 2>&1 | myprogram - | |
29 | $ myprogram <&3 | |
30 | $ myprogram >&4 | |
31 | ||
32 | Programmers accustomed to constructs like those above can take comfort | |
33 | in learning that Perl directly supports these familiar constructs using | |
34 | virtually the same syntax as the shell. | |
35 | ||
36 | =head2 Simple Opens | |
37 | ||
38 | The C<open> function takes two arguments: the first is a filehandle, | |
39 | and the second is a single string comprising both what to open and how | |
40 | to open it. C<open> returns true when it works, and when it fails, | |
1a193132 | 41 | returns a false value and sets the special variable C<$!> to reflect |
f8284313 TC |
42 | the system error. If the filehandle was previously opened, it will |
43 | be implicitly closed first. | |
44 | ||
45 | For example: | |
46 | ||
47 | open(INFO, "datafile") || die("can't open datafile: $!"); | |
48 | open(INFO, "< datafile") || die("can't open datafile: $!"); | |
49 | open(RESULTS,"> runstats") || die("can't open runstats: $!"); | |
50 | open(LOG, ">> logfile ") || die("can't open logfile: $!"); | |
51 | ||
52 | If you prefer the low-punctuation version, you could write that this way: | |
53 | ||
54 | open INFO, "< datafile" or die "can't open datafile: $!"; | |
55 | open RESULTS,"> runstats" or die "can't open runstats: $!"; | |
56 | open LOG, ">> logfile " or die "can't open logfile: $!"; | |
57 | ||
f66e0bd0 | 58 | A few things to notice. First, the leading C<< < >> is optional. |
f8284313 TC |
59 | If omitted, Perl assumes that you want to open the file for reading. |
60 | ||
1a193132 AL |
61 | Note also that the first example uses the C<||> logical operator, and the |
62 | second uses C<or>, which has lower precedence. Using C<||> in the latter | |
63 | examples would effectively mean | |
64 | ||
65 | open INFO, ( "< datafile" || die "can't open datafile: $!" ); | |
66 | ||
67 | which is definitely not what you want. | |
68 | ||
f8284313 | 69 | The other important thing to notice is that, just as in the shell, |
6b0ac556 | 70 | any whitespace before or after the filename is ignored. This is good, |
f8284313 TC |
71 | because you wouldn't want these to do different things: |
72 | ||
73 | open INFO, "<datafile" | |
74 | open INFO, "< datafile" | |
75 | open INFO, "< datafile" | |
76 | ||
1a193132 AL |
77 | Ignoring surrounding whitespace also helps when you read a filename |
78 | in from a different file and forget to trim it before opening: | |
f8284313 TC |
79 | |
80 | $filename = <INFO>; # oops, \n still there | |
81 | open(EXTRA, "< $filename") || die "can't open $filename: $!"; | |
82 | ||
83 | This is not a bug, but a feature. Because C<open> mimics the shell in | |
84 | its style of using redirection arrows to specify how to open the file, it | |
6b0ac556 | 85 | does so with respect to extra whitespace around the filename itself |
13a2d996 SP |
86 | as well. For accessing files with naughty names, see |
87 | L<"Dispelling the Dweomer">. | |
f8284313 | 88 | |
1a193132 AL |
89 | There is also a 3-argument version of C<open>, which lets you put the |
90 | special redirection characters into their own argument: | |
91 | ||
92 | open( INFO, ">", $datafile ) || die "Can't create $datafile: $!"; | |
93 | ||
94 | In this case, the filename to open is the actual string in C<$datafile>, | |
95 | so you don't have to worry about C<$datafile> containing characters | |
96 | that might influence the open mode, or whitespace at the beginning of | |
97 | the filename that would be absorbed in the 2-argument version. Also, | |
98 | any reduction of unnecessary string interpolation is a good thing. | |
99 | ||
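As a minimal sketch of that difference, the following creates a file whose name ends in a space and reads it back with the 3-argument form. The scratch directory from C<File::Temp> and the file name are made up for illustration, and a Unix-like filesystem that permits such names is assumed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory, removed when the program exits (illustrative only).
my $dir  = tempdir( CLEANUP => 1 );
my $name = "$dir/ odd name ";    # note the trailing space

# 3-argument form: the mode is a separate argument, so $name is used
# exactly as given, whitespace and all.
open( my $out, ">", $name ) || die "can't create '$name': $!";
print {$out} "hello\n";
close($out) || die "can't close '$name': $!";

open( my $in, "<", $name ) || die "can't open '$name': $!";
my $line = <$in>;
close($in);

print "read back: $line";
```

With the 2-argument form, the trailing space would be trimmed and C<open> would look for a different file.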
100 | =head2 Indirect Filehandles | |
101 | ||
102 | C<open>'s first argument can be a reference to a filehandle. As of | |
103 | perl 5.6.0, if the argument is uninitialized, Perl will automatically | |
104 | create a filehandle and put a reference to it in the first argument, | |
105 | like so: | |
106 | ||
107 | open( my $in, $infile ) or die "Couldn't read $infile: $!"; | |
108 | while ( <$in> ) { | |
109 | # do something with $_ | |
110 | } | |
111 | close $in; | |
112 | ||
113 | Indirect filehandles make namespace management easier. Since filehandles | |
114 | are global to the current package, two subroutines trying to open | |
115 | C<INFILE> will clash. With two functions opening indirect filehandles | |
116 | like C<my $infile>, there's no clash and no need to worry about future | |
117 | conflicts. | |
118 | ||
119 | Another convenient behavior is that an indirect filehandle automatically | |
d7d7fefd | 120 | closes when there are no more references to it: |
1a193132 AL |
121 | |
122 | sub firstline { | |
123 | open( my $in, shift ) && return scalar <$in>; | |
124 | # no close() required | |
125 | } | |
126 | ||
d7d7fefd SU |
127 | Indirect filehandles also make it easy to pass filehandles to and return |
128 | filehandles from subroutines: | |
129 | ||
130 | for my $file ( qw(this.conf that.conf) ) { | |
131 | my $fin = open_or_throw('<', $file); | |
132 | process_conf( $fin ); | |
133 | # no close() needed | |
134 | } | |
135 | ||
136 | use Carp; | |
137 | sub open_or_throw { | |
138 | my ($mode, $filename) = @_; | |
139 | open my $h, $mode, $filename | |
140 | or croak "Could not open '$filename': $!"; | |
141 | return $h; | |
142 | } | |
143 | ||
f8284313 TC |
144 | =head2 Pipe Opens |
145 | ||
146 | In C, when you want to open a file using the standard I/O library, | |
147 | you use the C<fopen> function, but when opening a pipe, you use the | |
148 | C<popen> function. But in the shell, you just use a different redirection | |
149 | character. That's also the case for Perl. The C<open> call | |
150 | remains the same--just its argument differs. | |
151 | ||
f5daac4a | 152 | If the leading character is a pipe symbol, C<open> starts up a new |
1a193132 | 153 | command and opens a write-only filehandle leading into that command. |
f8284313 TC |
154 | This lets you write into that handle and have what you write show up on |
155 | that command's standard input. For example: | |
156 | ||
369c5433 | 157 | open(PRINTER, "| lpr -Plp1") || die "can't run lpr: $!"; |
f8284313 TC |
158 | print PRINTER "stuff\n"; |
159 | close(PRINTER) || die "can't close lpr: $!"; | |
160 | ||
161 | If the trailing character is a pipe, you start up a new command and open a | |
162 | read-only filehandle leading out of that command. This lets whatever that | |
163 | command writes to its standard output show up on your handle for reading. | |
164 | For example: | |
165 | ||
1a193132 | 166 | open(NET, "netstat -i -n |") || die "can't fork netstat: $!"; |
f8284313 TC |
167 | while (<NET>) { } # do something with input |
168 | close(NET) || die "can't close netstat: $!"; | |
169 | ||
369c5433 MJD |
170 | What happens if you try to open a pipe to or from a non-existent |
171 | command? If possible, Perl will detect the failure and set C<$!> as | |
172 | usual. But if the command contains special shell characters, such as | |
173 | C<E<gt>> or C<*>, called 'metacharacters', Perl does not execute the | |
174 | command directly. Instead, Perl runs the shell, which then tries to | |
175 | run the command. This means that it's the shell that gets the error | |
176 | indication. In such a case, the C<open> call will only indicate | |
177 | failure if Perl can't even run the shell. See L<perlfaq8/"How can I | |
178 | capture STDERR from an external command?"> to see how to cope with | |
179 | this. There's also an explanation in L<perlipc>. | |
f8284313 TC |
180 | |
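One way to notice a failure that only the shell saw is to check the exit status after C<close>. The sketch below assumes a POSIX shell is available; the helper name C<run_and_check> is hypothetical and not part of Perl.

```perl
use strict;
use warnings;

# Run a command through a pipe and report how it ended.
# Returns (output, exit_status); exit_status is -1 if open itself failed.
sub run_and_check {
    my ($cmd) = @_;
    open( my $ph, "-|", $cmd ) || return ( undef, -1 );
    my $output = do { local $/; <$ph> };    # slurp everything the command wrote
    close($ph);                             # waits for the child and sets $?
    return ( $output, $? >> 8 );
}

# The ";" is a shell metacharacter, so Perl hands this string to the shell.
my ( $out, $status ) = run_and_check("echo hi; exit 3");
print "output: $out";
print "exit status: $status\n";
```

Here C<open> succeeds because the shell started, and the non-zero status only becomes visible in C<$?> once the handle is closed.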
181 | If you would like to open a bidirectional pipe, the IPC::Open2 | |
13a2d996 SP |
182 | library will handle this for you. Check out |
183 | L<perlipc/"Bidirectional Communication with Another Process"> for details. | |
f8284313 | 184 | |
494bd333 SF |
185 | perl-5.6.x introduced a version of piped open that executes a process |
186 | based on its command line arguments without relying on the shell. (Similar | |
187 | to the C<system(@LIST)> notation.) This is safer and faster than executing | |
188 | a single argument pipe-command, but does not allow special shell | |
189 | constructs. (It is also not supported on Microsoft Windows, Mac OS Classic | |
c3ae9cde | 190 | or RISC OS.) |
494bd333 SF |
191 | |
192 | Here's an example of C<open '-|'>, which prints a random Unix | |
193 | fortune cookie in uppercase: | |
194 | ||
195 | my $collection = shift(@ARGV); | |
196 | open my $fortune, '-|', 'fortune', $collection | |
197 | or die "Could not find fortune - $!"; | |
198 | while (<$fortune>) | |
199 | { | |
200 | print uc($_); | |
201 | } | |
202 | close($fortune); | |
203 | ||
204 | And this C<open '|-'> pipes into lpr: | |
205 | ||
206 | open my $printer, '|-', 'lpr', '-Plp1' | |
207 | or die "can't run lpr: $!"; | |
208 | print {$printer} "stuff\n"; | |
209 | close($printer) | |
210 | or die "can't close lpr: $!"; | |
211 | ||
f8284313 TC |
212 | =head2 The Minus File |
213 | ||
214 | Again following the lead of the standard shell utilities, Perl's | |
215 | C<open> function treats a file whose name is a single minus, "-", in a | |
216 | special way. If you open minus for reading, it really means to access | |
217 | the standard input. If you open minus for writing, it really means to | |
218 | access the standard output. | |
219 | ||
40b7eeef | 220 | If minus can be used as the default input or default output, what happens |
f8284313 | 221 | if you open a pipe into or out of minus? What's the default command it |
40b7eeef | 222 | would run? The same script as you're currently running! This is actually |
13a2d996 SP |
223 | a stealth C<fork> hidden inside an C<open> call. See |
224 | L<perlipc/"Safe Pipe Opens"> for details. | |
f8284313 TC |
225 | |
226 | =head2 Mixing Reads and Writes | |
227 | ||
228 | It is possible to specify both read and write access. All you do is | |
229 | add a "+" symbol in front of the redirection. But as in the shell, | |
230 | using a less-than on a file never creates a new file; it only opens an | |
231 | existing one. On the other hand, using a greater-than always clobbers | |
232 | (truncates to zero length) an existing file, or creates a brand-new one | |
233 | if there isn't an old one. Adding a "+" for read-write doesn't affect | |
234 | whether it only works on existing files or always clobbers existing ones. | |
235 | ||
236 | open(WTMP, "+< /usr/adm/wtmp") | |
237 | || die "can't open /usr/adm/wtmp: $!"; | |
238 | ||
2359510d SD |
239 | open(SCREEN, "+> lkscreen") |
240 | || die "can't open lkscreen: $!"; | |
f8284313 | 241 | |
1b9762da | 242 | open(LOGFILE, "+>> /var/log/applog") |
2359510d | 243 | || die "can't open /var/log/applog: $!"; |
f8284313 TC |
244 | |
245 | The first one won't create a new file, and the second one will always | |
246 | clobber an old one. The third one will create a new file if necessary | |
247 | and not clobber an old one, and it will allow you to read at any point | |
248 | in the file, but all writes will always go to the end. In short, | |
249 | the first case is substantially more common than the second and third | |
250 | cases, which are almost always wrong. (If you know C, the plus in | |
251 | Perl's C<open> is historically derived from the one in C's fopen(3S), | |
252 | which it ultimately calls.) | |
253 | ||
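To illustrate the C<+E<lt>> case, here is a small sketch that overwrites one fixed-width record in place. The temporary file and its 4-byte "records" are invented for the example, and the 3-argument form of C<open> is used.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a scratch file holding three 4-byte records (illustrative data).
my ( $fh, $name ) = tempfile( UNLINK => 1 );
print {$fh} "AAAABBBBCCCC";
close($fh) || die "can't close $name: $!";

# "+<" opens an existing file for update without truncating it.
open( my $rw, "+<", $name ) || die "can't open $name: $!";
seek( $rw, 4, 0 ) || die "can't seek in $name: $!";
print {$rw} "xxxx";                          # overwrite the second record
seek( $rw, 0, 0 ) || die "can't seek in $name: $!";
my $contents = do { local $/; <$rw> };       # read the whole file back
close($rw) || die "can't close $name: $!";

print "$contents\n";    # prints AAAAxxxxCCCC
```

The C<seek> between the write and the read matters: switching direction on a read-write handle without repositioning is not reliable.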
254 | In fact, when it comes to updating a file, unless you're working on | |
255 | a binary file as in the WTMP case above, you probably don't want to | |
256 | use this approach for updating. Instead, Perl's B<-i> flag comes to | |
257 | the rescue. The following command takes all the C, C++, or yacc source | |
258 | or header files and changes all their foo's to bar's, leaving | |
1a193132 | 259 | the old version in the original filename with a ".orig" tacked |
f8284313 TC |
260 | on the end: |
261 | ||
262 | $ perl -i.orig -pe 's/\bfoo\b/bar/g' *.[Cchy] | |
263 | ||
264 | This is a shortcut for some renaming games that are really | |
265 | the best way to update text files. See the second question in | |
266 | L<perlfaq5> for more details. | |
267 | ||
268 | =head2 Filters | |
269 | ||
270 | One of the most common uses for C<open> is one you never | |
271 | even notice. When you process the ARGV filehandle using | |
c47ff5f1 | 272 | C<< <ARGV> >>, Perl actually does an implicit open |
f8284313 TC |
273 | on each file in @ARGV. Thus a program called like this: |
274 | ||
275 | $ myprogram file1 file2 file3 | |
276 | ||
1b9762da | 277 | can have all its files opened and processed one at a time |
f8284313 TC |
278 | using a construct no more complex than: |
279 | ||
280 | while (<>) { | |
281 | # do something with $_ | |
282 | } | |
283 | ||
284 | If @ARGV is empty when the loop first begins, Perl pretends you've opened | |
285 | up minus, that is, the standard input. In fact, $ARGV, the currently | |
c47ff5f1 | 286 | open file during C<< <ARGV> >> processing, is even set to "-" |
f8284313 TC |
287 | in these circumstances. |
288 | ||
289 | You are welcome to pre-process your @ARGV before starting the loop to | |
290 | make sure it's to your liking. One reason to do this might be to remove | |
291 | command options beginning with a minus. While you can always roll the | |
1a193132 | 292 | simple ones by hand, the Getopt modules are good for this: |
f8284313 TC |
293 | |
294 | use Getopt::Std; | |
295 | ||
296 | # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o | |
297 | getopts("vDo:"); | |
298 | ||
299 | # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o} | |
300 | getopts("vDo:", \%args); | |
301 | ||
302 | Or the standard Getopt::Long module to permit named arguments: | |
303 | ||
304 | use Getopt::Long; | |
305 | GetOptions( "verbose" => \$verbose, # --verbose | |
306 | "Debug" => \$debug, # --Debug | |
307 | "output=s" => \$output ); | |
308 | # --output=somestring or --output somestring | |
309 | ||
310 | Another reason for preprocessing arguments is to make an empty | |
311 | argument list default to all files: | |
312 | ||
313 | @ARGV = glob("*") unless @ARGV; | |
314 | ||
315 | You could even filter out all but plain, text files. This is a bit | |
316 | silent, of course, and you might prefer to mention them on the way. | |
317 | ||
318 | @ARGV = grep { -f && -T } @ARGV; | |
319 | ||
320 | If you're using the B<-n> or B<-p> command-line options, you | |
321 | should put changes to @ARGV in a C<BEGIN{}> block. | |
322 | ||
323 | Remember that a normal C<open> has special properties, in that it might | |
324 | call fopen(3S) or it might call popen(3S), depending on what its | |
325 | argument looks like; that's why it's sometimes called "magic open". | |
326 | Here's an example: | |
327 | ||
328 | $pwdinfo = `domainname` =~ /^(\(none\))?$/ | |
329 | ? '< /etc/passwd' | |
330 | : 'ypcat passwd |'; | |
331 | ||
332 | open(PWD, $pwdinfo) | |
333 | or die "can't open $pwdinfo: $!"; | |
334 | ||
335 | This sort of thing also comes into play in filter processing. Because | |
c47ff5f1 | 336 | C<< <ARGV> >> processing employs the normal, shell-style Perl C<open>, |
f8284313 TC |
337 | it respects all the special things we've already seen: |
338 | ||
339 | $ myprogram f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile | |
340 | ||
341 | That program will read from the file F<f1>, the process F<cmd1>, standard | |
342 | input (F<tmpfile> in this case), the F<f2> file, the F<cmd2> command, | |
343 | and finally the F<f3> file. | |
344 | ||
1a193132 AL |
345 | Yes, this also means that if you have files named "-" (and so on) in |
346 | your directory, they won't be processed as literal files by C<open>. | |
347 | You'll need to pass them as "./-", much as you would for the I<rm> program, | |
348 | or you could use C<sysopen> as described below. | |
f8284313 TC |
349 | |
350 | One of the more interesting applications is to change files of a certain | |
351 | name into pipes. For example, to autoprocess gzipped or compressed | |
352 | files by decompressing them with I<gzip>: | |
353 | ||
0c42fe95 | 354 | @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_ } @ARGV; |
f8284313 TC |
355 | |
356 | Or, if you have the I<GET> program installed from LWP, | |
357 | you can fetch URLs before processing them: | |
358 | ||
359 | @ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV; | |
360 | ||
c47ff5f1 | 361 | It's not for nothing that this is called magic C<< <ARGV> >>. |
f8284313 TC |
362 | Pretty nifty, eh? |
363 | ||
364 | =head1 Open E<agrave> la C | |
365 | ||
366 | If you want the convenience of the shell, then Perl's C<open> is | |
367 | definitely the way to go. On the other hand, if you want finer precision | |
1a193132 | 368 | than C's simplistic fopen(3S) provides, you should look to Perl's |
f8284313 TC |
369 | C<sysopen>, which is a direct hook into the open(2) system call. |
370 | That does mean it's a bit more involved, but that's the price of | |
371 | precision. | |
372 | ||
373 | C<sysopen> takes 3 (or 4) arguments. | |
374 | ||
375 | sysopen HANDLE, PATH, FLAGS, [MASK] | |
376 | ||
377 | The HANDLE argument is a filehandle just as with C<open>. The PATH is | |
378 | a literal path, one that doesn't pay attention to any greater-thans or | |
6b0ac556 | 379 | less-thans or pipes or minuses, nor ignore whitespace. If it's there, |
f8284313 TC |
380 | it's part of the path. The FLAGS argument contains one or more values |
381 | derived from the Fcntl module that have been or'd together using the | |
382 | bitwise "|" operator. The final argument, the MASK, is optional; if | |
383 | present, it is combined with the user's current umask for the creation | |
384 | mode of the file. You should usually omit this. | |
385 | ||
386 | Although the traditional values of read-only, write-only, and read-write | |
387 | are 0, 1, and 2 respectively, this is known not to hold true on some | |
388 | systems. Instead, it's best to load in the appropriate constants first | |
389 | from the Fcntl module, which supplies the following standard flags: | |
390 | ||
391 | O_RDONLY Read only | |
392 | O_WRONLY Write only | |
393 | O_RDWR Read and write | |
394 | O_CREAT Create the file if it doesn't exist | |
395 | O_EXCL Fail if the file already exists | |
396 | O_APPEND Append to the file | |
397 | O_TRUNC Truncate the file | |
398 | O_NONBLOCK Non-blocking access | |
399 | ||
ca6e1c26 JH |
400 | Less common flags that are sometimes available on some operating |
401 | systems include C<O_BINARY>, C<O_TEXT>, C<O_SHLOCK>, C<O_EXLOCK>, | |
402 | C<O_DEFER>, C<O_SYNC>, C<O_ASYNC>, C<O_DSYNC>, C<O_RSYNC>, | |
403 | C<O_NOCTTY>, C<O_NDELAY> and C<O_LARGEFILE>. Consult your open(2) | |
404 | manpage or its local equivalent for details. (Note: starting from | |
1a193132 | 405 | Perl release 5.6 the C<O_LARGEFILE> flag, if available, is automatically |
106325ad | 406 | added to the sysopen() flags because large files are the default.) |
f8284313 TC |
407 | |
408 | Here's how to use C<sysopen> to emulate the simple C<open> calls we had | |
409 | before. We'll omit the C<|| die $!> checks for clarity, but make sure | |
410 | you always check the return values in real code. These aren't quite | |
6b0ac556 | 411 | the same, since C<open> will trim leading and trailing whitespace, |
1a193132 | 412 | but you'll get the idea. |
f8284313 TC |
413 | |
414 | To open a file for reading: | |
415 | ||
416 | open(FH, "< $path"); | |
417 | sysopen(FH, $path, O_RDONLY); | |
418 | ||
419 | To open a file for writing, creating a new file if needed or else truncating | |
420 | an old file: | |
421 | ||
422 | open(FH, "> $path"); | |
423 | sysopen(FH, $path, O_WRONLY | O_TRUNC | O_CREAT); | |
424 | ||
425 | To open a file for appending, creating one if necessary: | |
426 | ||
427 | open(FH, ">> $path"); | |
428 | sysopen(FH, $path, O_WRONLY | O_APPEND | O_CREAT); | |
429 | ||
430 | To open a file for update, where the file must already exist: | |
431 | ||
432 | open(FH, "+< $path"); | |
433 | sysopen(FH, $path, O_RDWR); | |
434 | ||
435 | And here are things you can do with C<sysopen> that you cannot do with | |
1a193132 | 436 | a regular C<open>. As you'll see, it's just a matter of controlling the |
f8284313 TC |
437 | flags in the third argument. |
438 | ||
439 | To open a file for writing, creating a new file which must not previously | |
440 | exist: | |
441 | ||
442 | sysopen(FH, $path, O_WRONLY | O_EXCL | O_CREAT); | |
443 | ||
444 | To open a file for appending, where that file must already exist: | |
445 | ||
446 | sysopen(FH, $path, O_WRONLY | O_APPEND); | |
447 | ||
448 | To open a file for update, creating a new file if necessary: | |
449 | ||
450 | sysopen(FH, $path, O_RDWR | O_CREAT); | |
451 | ||
452 | To open a file for update, where that file must not already exist: | |
453 | ||
454 | sysopen(FH, $path, O_RDWR | O_EXCL | O_CREAT); | |
455 | ||
456 | To open a file without blocking, creating one if necessary: | |
457 | ||
458 | sysopen(FH, $path, O_WRONLY | O_NONBLOCK | O_CREAT); | |
459 | ||
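The exclusive-create case can be sketched as follows. The helper C<create_exclusively> and the file name F<app.lock> are hypothetical, and the scratch directory comes from C<File::Temp>.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/app.lock";    # hypothetical file name

# Create the file only if it does not already exist.
# Returns the handle on success, or nothing (with $! set) on failure.
sub create_exclusively {
    my ($file) = @_;
    sysopen( my $fh, $file, O_WRONLY | O_EXCL | O_CREAT ) || return;
    return $fh;
}

my $first  = create_exclusively($path);
my $second = create_exclusively($path);

print "first attempt:  ", ( $first  ? "created" : "failed: $!" ), "\n";
print "second attempt: ", ( $second ? "created" : "failed: $!" ), "\n";
```

The second attempt fails because the file now exists, which is the check-and-create step done atomically by the system rather than by a separate C<-e> test.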
460 | =head2 Permissions E<agrave> la mode | |
461 | ||
462 | If you omit the MASK argument to C<sysopen>, Perl uses the octal value | |
463 | 0666. The normal MASK to use for executables and directories should | |
464 | be 0777, and for anything else, 0666. | |
465 | ||
466 | Why so permissive? Well, it isn't really. The MASK will be modified | |
467 | by your process's current C<umask>. A umask is a number representing | |
468 | I<disabled> permissions bits; that is, bits that will not be turned on | |
e1020413 | 469 | in the created file's permissions field. |
f8284313 TC |
470 | |
471 | For example, if your C<umask> were 027, then the 020 part would | |
472 | disable the group from writing, and the 007 part would disable others | |
473 | from reading, writing, or executing. Under these conditions, passing | |
1a193132 | 474 | C<sysopen> 0666 would create a file with mode 0640, since C<0666 & ~027> |
f8284313 TC |
475 | is 0640. |
476 | ||
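The arithmetic above can be checked directly. This is only a sketch of the bit operation; the helper name C<effective_mode> is made up for the example.

```perl
use strict;
use warnings;

# Permission bits that survive a given umask.
sub effective_mode {
    my ( $mask, $umask ) = @_;
    return $mask & ~$umask;
}

printf "%04o\n", effective_mode( 0666, 027 );    # prints 0640
printf "%04o\n", effective_mode( 0777, 022 );    # prints 0755
```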
477 | You should seldom use the MASK argument to C<sysopen()>. That takes | |
478 | away the user's freedom to choose what permission new files will have. | |
479 | Denying choice is almost always a bad thing. One exception would be for | |
480 | cases where sensitive or private data is being stored, such as with mail | |
481 | folders, cookie files, and internal temporary files. | |
482 | ||
483 | =head1 Obscure Open Tricks | |
484 | ||
485 | =head2 Re-Opening Files (dups) | |
486 | ||
487 | Sometimes you already have a filehandle open, and want to make another | |
488 | handle that's a duplicate of the first one. In the shell, we place an | |
489 | ampersand in front of a file descriptor number when doing redirections. | |
c47ff5f1 | 490 | For example, C<< 2>&1 >> makes descriptor 2 (that's STDERR in Perl) |
f8284313 TC |
491 | be redirected into descriptor 1 (which is usually Perl's STDOUT). |
492 | The same is essentially true in Perl: a filename that begins with an | |
493 | ampersand is treated instead as a file descriptor if a number, or as a | |
494 | filehandle if a string. | |
495 | ||
496 | open(SAVEOUT, ">&SAVEERR") || die "couldn't dup SAVEERR: $!"; | |
497 | open(MHCONTEXT, "<&4") || die "couldn't dup fd4: $!"; | |
498 | ||
499 | That means that if a function is expecting a filename, but you don't | |
500 | want to give it a filename because you already have the file open, you | |
501 | can just pass the filehandle with a leading ampersand. It's best to | |
502 | use a fully qualified handle though, just in case the function happens | |
503 | to be in a different package: | |
504 | ||
505 | somefunction("&main::LOGFILE"); | |
506 | ||
507 | This way if somefunction() is planning on opening its argument, it can | |
508 | just use the already opened handle. This differs from passing a handle, | |
509 | because with a handle, you don't open the file. Here you have something | |
510 | you can pass to open. | |
511 | ||
512 | If you have one of those tricky, newfangled I/O objects that the C++ | |
513 | folks are raving about, then this doesn't work because those aren't | |
514 | proper filehandles in the native Perl sense. You'll have to use fileno() | |
515 | to pull out the proper descriptor number, assuming you can: | |
516 | ||
517 | use IO::Socket; | |
518 | $handle = IO::Socket::INET->new("www.perl.com:80"); | |
519 | $fd = $handle->fileno; | |
520 | somefunction("&$fd"); # not an indirect function call | |
521 | ||
522 | It can be easier (and certainly will be faster) just to use real | |
523 | filehandles though: | |
524 | ||
525 | use IO::Socket; | |
526 | local *REMOTE = IO::Socket::INET->new("www.perl.com:80"); | |
527 | die "can't connect" unless defined(fileno(REMOTE)); | |
528 | somefunction("&main::REMOTE"); | |
529 | ||
530 | If the filehandle or descriptor number is preceded not just with a simple | |
531 | "&" but rather with a "&=" combination, then Perl will not create a | |
532 | completely new descriptor opened to the same place using the dup(2) | |
533 | system call. Instead, it will just make something of an alias to the | |
1b9762da | 534 | existing one using the fdopen(3S) library call. This is slightly more |
f8284313 TC |
535 | parsimonious of systems resources, although this is less a concern |
536 | these days. Here's an example of that: | |
537 | ||
538 | $fd = $ENV{"MHCONTEXTFD"}; | |
539 | open(MHCONTEXT, "<&=$fd") or die "couldn't fdopen $fd: $!"; | |
540 | ||
c47ff5f1 GS |
541 | If you're using magic C<< <ARGV> >>, you could even pass in as a |
542 | command line argument in @ARGV something like C<"<&=$MHCONTEXTFD">, | |
f8284313 TC |
543 | but we've never seen anyone actually do this. |
544 | ||
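A common use of dups is to save STDOUT, redirect it for a while, and then restore it. The following is a minimal sketch using the 3-argument spelling of the same C<E<gt>&> mode; the scratch file from C<File::Temp> is an assumption for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( $tmpfh, $tmpname ) = tempfile( UNLINK => 1 );
close($tmpfh);

# Save the current STDOUT by duplicating its descriptor.
open( my $saved, ">&", \*STDOUT ) || die "can't dup STDOUT: $!";

# Point STDOUT at the scratch file, write to it, then restore the original.
open( STDOUT, ">", $tmpname ) || die "can't redirect STDOUT: $!";
print "captured\n";
close(STDOUT) || die "can't close redirected STDOUT: $!";
open( STDOUT, ">&", $saved ) || die "can't restore STDOUT: $!";

# Read back what was written while STDOUT was redirected.
open( my $in, "<", $tmpname ) || die "can't read $tmpname: $!";
my $captured = <$in>;
close($in);

print "file contains: $captured";
```

Because C<$saved> is a true duplicate, it stays valid after STDOUT is closed and reopened.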
545 | =head2 Dispelling the Dweomer | |
546 | ||
547 | Perl is more of a DWIMmer language than something like Java--where DWIM | |
548 | is an acronym for "do what I mean". But this principle sometimes leads | |
549 | to more hidden magic than one knows what to do with. In this way, Perl | |
550 | is also filled with I<dweomer>, an obscure word meaning an enchantment. | |
551 | Sometimes, Perl's DWIMmer is just too much like dweomer for comfort. | |
552 | ||
553 | If magic C<open> is a bit too magical for you, you don't have to turn | |
554 | to C<sysopen>. To open a file with arbitrary weird characters in | |
555 | it, it's necessary to protect any leading and trailing whitespace. | |
556 | Leading whitespace is protected by inserting a C<"./"> in front of a | |
557 | filename that starts with whitespace. Trailing whitespace is protected | |
1a193132 | 558 | by appending an ASCII NUL byte (C<"\0">) at the end of the string. |
f8284313 TC |
559 | |
560 | $file =~ s#^(\s)#./$1#; | |
561 | open(FH, "< $file\0") || die "can't open $file: $!"; | |
562 | ||
563 | This assumes, of course, that your system considers dot the current | |
564 | working directory, slash the directory separator, and disallows ASCII | |
565 | NULs within a valid filename. Most systems follow these conventions, | |
566 | including all POSIX systems as well as proprietary Microsoft systems. | |
567 | The only vaguely popular system that doesn't work this way is the | |
8e30f651 | 568 | "Classic" Macintosh system, which uses a colon where the rest of us |
f8284313 TC |
569 | use a slash. Maybe C<sysopen> isn't such a bad idea after all. |
570 | ||
c47ff5f1 | 571 | If you want to use C<< <ARGV> >> processing in a totally boring |
f8284313 TC |
572 | and non-magical way, you could do this first: |
573 | ||
574 | # "Sam sat on the ground and put his head in his hands. | |
575 | # 'I wish I had never come here, and I don't want to see | |
576 | # no more magic,' he said, and fell silent." | |
577 | for (@ARGV) { | |
578 | s#^([^./])#./$1#; | |
579 | $_ .= "\0"; | |
580 | } | |
581 | while (<>) { | |
582 | # now process $_ | |
583 | } | |
584 | ||
585 | But be warned that users will not appreciate being unable to use "-" | |
586 | to mean standard input, per the standard convention. | |
587 | ||
588 | =head2 Paths as Opens | |
589 | ||
590 | You've probably noticed how Perl's C<warn> and C<die> functions can | |
591 | produce messages like: | |
592 | ||
1761cee5 | 593 | Some warning at scriptname line 29, <FH> line 7. |
f8284313 TC |
594 | |
595 | That's because you opened a filehandle FH, and had read in seven records | |
1a193132 | 596 | from it. But what was the name of the file, rather than the handle? |
f8284313 | 597 | |
1a193132 | 598 | If you aren't running with C<strict refs>, or if you've turned them off |
f8284313 TC |
599 | temporarily, then all you have to do is this: |
600 | ||
601 | open($path, "< $path") || die "can't open $path: $!"; | |
602 | while (<$path>) { | |
603 | # whatever | |
604 | } | |
605 | ||
606 | Since you're using the pathname of the file as its handle, | |
607 | you'll get warnings more like | |
608 | ||
1761cee5 | 609 | Some warning at scriptname line 29, </etc/motd> line 7. |
f8284313 TC |
610 | |
611 | =head2 Single Argument Open | |
612 | ||
613 | Remember how we said that Perl's open took two arguments? That was a | |
614 | passive prevarication. You see, it can also take just one argument. | |
615 | If and only if the variable is a global variable, not a lexical, you | |
616 | can pass C<open> just one argument, the filehandle, and it will | |
617 | get the path from the global scalar variable of the same name. | |
618 | ||
619 | $FILE = "/etc/motd"; | |
620 | open FILE or die "can't open $FILE: $!"; | |
621 | while (<FILE>) { | |
622 | # whatever | |
623 | } | |
624 | ||
625 | Why is this here? Someone has to cater to the hysterical porpoises. | |
626 | It's something that's been in Perl since the very beginning, if not | |
627 | before. | |
628 | ||
629 | =head2 Playing with STDIN and STDOUT | |
630 | ||
631 | One clever move with STDOUT is to explicitly close it when you're done | |
632 | with the program. | |
633 | ||
634 | END { close(STDOUT) || die "can't close stdout: $!" } | |
635 | ||
636 | If you don't do this, and your program fills up the disk partition due | |
637 | to a command line redirection, it won't report the error or exit with a | |
638 | failure status. | |
639 | ||
640 | You don't have to accept the STDIN and STDOUT you were given. You are | |
641 | welcome to reopen them if you'd like. | |
642 | ||
643 | open(STDIN, "< datafile") | |
644 | || die "can't open datafile: $!"; | |
645 | ||
646 | open(STDOUT, "> output") | |
647 | || die "can't open output: $!"; | |
648 | ||
00dcde61 | 649 | And then these can be accessed directly or passed on to subprocesses. |
f8284313 TC |
650 | This makes it look as though the program were initially invoked |
651 | with those redirections from the command line. | |
652 | ||
653 | It's probably more interesting to connect these to pipes. For example: | |
654 | ||
655 | $pager = $ENV{PAGER} || "(less || more)"; | |
656 | open(STDOUT, "| $pager") | |
657 | || die "can't fork a pager: $!"; | |
658 | ||
659 | This makes it appear as though your program were called with its stdout | |
660 | already piped into your pager. You can also use this kind of thing | |
661 | in conjunction with an implicit fork to yourself. You might do this | |
662 | if you would rather handle the post processing in your own program, | |
663 | just in a different process: | |
664 | ||
665 | head(100); | |
666 | while (<>) { | |
667 | print; | |
668 | } | |
669 | ||
670 | sub head { | |
671 | my $lines = shift || 20; | |
1eb83ea0 | 672 | return if $pid = open(STDOUT, "|-"); # return if parent |
f8284313 TC |
673 | die "cannot fork: $!" unless defined $pid; |
674 | while (<STDIN>) { | |
f8284313 | 675 | last if --$lines < 0; |
1eb83ea0 | 676 | print; |
f8284313 TC |
677 | } |
678 | exit; | |
679 | } | |
680 | ||
681 | This technique can be applied to repeatedly push as many filters on your | |
682 | output stream as you wish. | |
683 | ||
684 | =head1 Other I/O Issues | |
685 | ||
686 | These topics aren't really arguments related to C<open> or C<sysopen>, | |
687 | but they do affect what you do with your open files. | |
688 | ||
689 | =head2 Opening Non-File Files | |
690 | ||
691 | When is a file not a file? Well, you could say when it exists but | |
692 | isn't a plain file. We'll check whether it's a symbolic link first, | |
693 | just in case. | |
694 | ||
695 | if (-l $file || ! -f _) { | |
696 | print "$file is not a plain file\n"; | |
697 | } | |
698 | ||
699 | What other kinds of files are there than, well, files? Directories, | |
700 | symbolic links, named pipes, Unix-domain sockets, and block and character | |
701 | devices. Those are all files, too--just not I<plain> files. This isn't | |
702 | the same issue as being a text file. Not all text files are plain files. | |
1a193132 | 703 | Not all plain files are text files. That's why there are separate C<-f> |
f8284313 TC |
704 | and C<-T> file tests. |
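
As a rough sketch of how the file tests sort out these kinds of files, the helper below names what a path refers to. The name C<file_kind> is made up for this illustration. The symbolic link test comes first because C<-l> does an C<lstat>, while the later tests reuse the C<stat> result through C<_>.

```perl
use strict;
use warnings;

# Hypothetical helper: name the kind of file at $path using file tests.
sub file_kind {
    my ($path) = @_;
    return "symbolic link"    if -l $path;      # lstat, so test this first
    return "missing"          unless -e $path;  # stat; fills the _ buffer
    return "directory"        if -d _;
    return "named pipe"       if -p _;
    return "socket"           if -S _;
    return "block device"     if -b _;
    return "character device" if -c _;
    return "plain file"       if -f _;
    return "unknown";
}

# Report on every path named on the command line.
print "$_: ", file_kind($_), "\n" for @ARGV;
```

Note that C<-T> is left out on purpose: whether a plain file holds text is a separate question from what kind of file it is.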
705 | ||
706 | To open a directory, you should use the C<opendir> function, then | |
707 | process it with C<readdir>, carefully restoring the directory | |
708 | name if necessary: | |
709 | ||
710 | opendir(DIR, $dirname) or die "can't opendir $dirname: $!"; | |
711 | while (defined($file = readdir(DIR))) { | |
712 | # do something with "$dirname/$file" | |
713 | } | |
714 | closedir(DIR); | |
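
The same loop can be wrapped up with a lexical directory handle. This is only a sketch: C<list_dir> is a name invented here, and skipping the C<.> and C<..> entries is an assumption about what most callers want.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted entries of $dirname,
# leaving out the "." and ".." entries that readdir also reports.
sub list_dir {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "can't opendir $dirname: $!";
    my @names = sort grep { $_ ne "." && $_ ne ".." } readdir($dh);
    closedir($dh);
    return @names;
}

print "$_\n" for list_dir(".");
```

The names come back without the directory part, so prepend C<$dirname/> before handing them to C<open> or a file test.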
715 | ||
716 | If you want to process directories recursively, it's better to use the | |
1a193132 AL |
717 | File::Find module. For example, this prints out all files recursively |
718 | and adds a slash to their names if the file is a directory. | |
f8284313 TC |
719 | |
720 | @ARGV = qw(.) unless @ARGV; | |
721 | use File::Find; | |
722 | find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV; | |
723 | ||
724 | This finds all bogus symbolic links beneath a particular directory: | |
725 | ||
726 | find sub { print "$File::Find::name\n" if -l && !-e }, $dir; | |
727 | ||
728 | As you see, with symbolic links, you can just pretend that it is | |
729 | what it points to. Or, if you want to know I<what> it points to, then | |
730 | C<readlink> is called for: | |
731 | ||
732 | if (-l $file) { | |
733 | if (defined($whither = readlink($file))) { | |
734 | print "$file points to $whither\n"; | |
735 | } else { | |
736 | print "$file points nowhere: $!\n"; | |
737 | } | |
738 | } | |
739 | ||
1a193132 AL |
740 | =head2 Opening Named Pipes |
741 | ||
f8284313 TC |
742 | Named pipes are a different matter. You pretend they're regular files, |
743 | but their opens will normally block until there is both a reader and | |
744 | a writer. You can read more about them in L<perlipc/"Named Pipes">. | |
745 | Unix-domain sockets are rather different beasts as well; they're | |
746 | described in L<perlipc/"Unix-Domain TCP Clients and Servers">. | |
747 | ||
1a193132 | 748 | When it comes to opening devices, it can be easy and it can be tricky. |
f8284313 TC |
749 | We'll assume that if you're opening up a block device, you know what |
750 | you're doing. The character devices are more interesting. These are | |
751 | typically used for modems, mice, and some kinds of printers. This is | |
752 | described in L<perlfaq8/"How do I read and write the serial port?">. | |
753 | It's often enough to open them carefully: | |
754 | ||
755 | sysopen(TTYIN, "/dev/ttyS1", O_RDWR | O_NDELAY | O_NOCTTY) | |
756 | # (O_NOCTTY no longer needed on POSIX systems) | |
757 | or die "can't open /dev/ttyS1: $!"; | |
758 | open(TTYOUT, "+>&TTYIN") | |
759 | or die "can't dup TTYIN: $!"; | |
760 | ||
761 | $ofh = select(TTYOUT); $| = 1; select($ofh); | |
762 | ||
763 | print TTYOUT "+++at\015"; | |
764 | $answer = <TTYIN>; | |
765 | ||
1a193132 AL |
766 | With descriptors that you haven't opened using C<sysopen>, such as |
767 | sockets, you can set them to be non-blocking using C<fcntl>: | |
f8284313 TC |
768 | |
769 | use Fcntl; | |
21d1ba01 RGS |
770 | my $old_flags = fcntl($handle, F_GETFL, 0) |
771 | or die "can't get flags: $!"; | |
772 | fcntl($handle, F_SETFL, $old_flags | O_NONBLOCK) | |
f8284313 TC |
773 | or die "can't set non blocking: $!"; |
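
To see the effect, here is a small sketch that assumes a Unix-like system. A pipe stands in for whatever handle you were given; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Fcntl;    # F_GETFL, F_SETFL, O_NONBLOCK

# A pipe stands in for any handle you didn't open with sysopen.
pipe(my $reader, my $writer) or die "can't make a pipe: $!";

my $old_flags = fcntl($reader, F_GETFL, 0)
    or die "can't get flags: $!";
fcntl($reader, F_SETFL, $old_flags | O_NONBLOCK)
    or die "can't set non blocking: $!";

# Nothing has been written yet, so this read returns at once
# with undef instead of waiting for data.
my $got = sysread($reader, my $buf, 1024);
print "no data yet: $!\n" unless defined $got;
```

Without the C<F_SETFL> call, the C<sysread> would sit there until something arrived on the pipe.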
774 | ||
775 | Rather than losing yourself in a morass of twisting, turning C<ioctl>s, | |
776 | all dissimilar, if you're going to manipulate ttys, it's best to | |
777 | make calls out to the stty(1) program if you have it, or else use the | |
778 | portable POSIX interface. To figure this all out, you'll need to read the | |
779 | termios(3) manpage, which describes the POSIX interface to tty devices, | |
780 | and then L<POSIX>, which describes Perl's interface to POSIX. There are | |
781 | also some high-level modules on CPAN that can help you with these games. | |
782 | Check out Term::ReadKey and Term::ReadLine. | |
783 | ||
1a193132 AL |
784 | =head2 Opening Sockets |
785 | ||
f8284313 | 786 | What else can you open? To open a connection using sockets, you won't use |
13a2d996 SP |
787 | one of Perl's two open functions. See |
788 | L<perlipc/"Sockets: Client/Server Communication"> for that. Here's an | |
789 | example. Once you have it, you can use FH as a bidirectional filehandle. | |
f8284313 TC |
790 | |
791 | use IO::Socket; | |
792 | local *FH = IO::Socket::INET->new("www.perl.com:80"); | |
793 | ||
794 | For opening up a URL, the LWP modules from CPAN are just what | |
795 | the doctor ordered. There's no filehandle interface, but | |
796 | it's still easy to get the contents of a document: | |
797 | ||
798 | use LWP::Simple; | |
46c3340e | 799 | $doc = get('http://www.cpan.org/'); |
f8284313 TC |
800 | |
801 | =head2 Binary Files | |
802 | ||
803 | On certain legacy systems with what could charitably be called terminally | |
804 | convoluted (some would say broken) I/O models, a file isn't a file--at | |
805 | least, not with respect to the C standard I/O library. On these old | |
806 | systems whose libraries (but not kernels) distinguish between text and | |
807 | binary streams, to get files to behave properly you'll have to bend over | |
808 | backwards to avoid nasty problems. On such infelicitous systems, sockets | |
809 | and pipes are already opened in binary mode, and there is currently no | |
810 | way to turn that off. With files, you have more options. | |
811 | ||
812 | One option is to use the C<binmode> function on the appropriate | |
813 | handles before doing regular I/O on them: | |
814 | ||
815 | binmode(STDIN); | |
816 | binmode(STDOUT); | |
817 | while (<STDIN>) { print } | |
818 | ||
819 | Passing C<sysopen> a non-standard flag option will also open the file in | |
820 | binary mode on those systems that support it. This is the equivalent of | |
1a193132 | 821 | opening the file normally, then calling C<binmode> on the handle. |
f8284313 TC |
822 | |
823 | sysopen(BINDAT, "records.data", O_RDWR | O_BINARY) | |
824 | || die "can't open records.data: $!"; | |
825 | ||
826 | Now you can use C<read> and C<print> on that handle without worrying | |
1a193132 | 827 | about the non-standard system I/O library breaking your data. It's not |
f8284313 TC |
828 | a pretty picture, but then, legacy systems seldom are. CP/M will be |
829 | with us until the end of days, and after. | |
830 | ||
831 | On systems with exotic I/O systems, it turns out that, astonishingly | |
832 | enough, even unbuffered I/O using C<sysread> and C<syswrite> might do | |
833 | sneaky data mutilation behind your back. | |
834 | ||
835 | while (sysread(WHENCE, $buf, 1024)) { | |
836 | syswrite(WHITHER, $buf, length($buf)); | |
837 | } | |
838 | ||
839 | Depending on the vicissitudes of your runtime system, even these calls | |
840 | may need C<binmode> or C<O_BINARY> first. Systems known to be free of | |
e6f03d26 | 841 | such difficulties include Unix, the Mac OS, Plan 9, and Inferno. |
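
A more defensive version of that copy loop might look like the following sketch. The name C<copy_raw> is invented here; the loop calls C<binmode> on both handles and allows for C<syswrite> writing less than it was asked to.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $from to $to byte for byte.
sub copy_raw {
    my ($from, $to) = @_;
    open(my $in,  "<", $from) or die "can't open $from: $!";
    open(my $out, ">", $to)   or die "can't open $to: $!";
    binmode($in);     # harmless where text and binary are the same
    binmode($out);

    my $buf;
    while (1) {
        my $got = sysread($in, $buf, 1024);
        die "can't read $from: $!" unless defined $got;
        last if $got == 0;                 # end of file
        my $off = 0;
        while ($off < $got) {              # syswrite may write less
            my $put = syswrite($out, $buf, $got - $off, $off);
            die "can't write $to: $!" unless defined $put;
            $off += $put;
        }
    }
    close($out) or die "can't close $to: $!";
    close($in);
    return;
}
```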
f8284313 TC |
842 | |
843 | =head2 File Locking | |
844 | ||
845 | In a multitasking environment, you may need to be careful not to collide | |
1a193132 | 846 | with other processes that want to do I/O on the same files as you | |
f8284313 TC |
847 | are working on. You'll often need shared or exclusive locks |
848 | on files for reading and writing respectively. You might just | |
849 | pretend that only exclusive locks exist. | |
850 | ||
851 | Never use the existence of a file C<-e $file> as a locking indication, | |
852 | because there is a race condition between the test for the existence of | |
1a193132 AL |
853 | the file and its creation. It's possible for another process to create |
854 | a file in the slice of time between your existence check and your attempt | |
855 | to create the file. Atomicity is critical. | |
f8284313 TC |
856 | |
857 | Perl's most portable locking interface is via the C<flock> function, | |
1a193132 AL |
858 | whose simplicity is emulated on systems that don't directly support it |
859 | such as SysV or Windows. The underlying semantics may affect how | |
f8284313 TC |
860 | it all works, so you should learn how C<flock> is implemented on your |
861 | system's port of Perl. | |
862 | ||
863 | File locking I<does not> lock out another process that would like to | |
864 | do I/O. A file lock only locks out others trying to get a lock, not | |
865 | processes trying to do I/O. Because locks are advisory, if one process | |
866 | uses locking and another doesn't, all bets are off. | |
867 | ||
868 | By default, the C<flock> call will block until a lock is granted. | |
869 | A request for a shared lock will be granted as soon as there is no | |
d1be9408 | 870 | exclusive locker. A request for an exclusive lock will be granted as |
f8284313 TC |
871 | soon as there is no locker of any kind. Locks are on file descriptors, |
872 | not file names. You can't lock a file until you open it, and you can't | |
873 | hold on to a lock once the file has been closed. | |
874 | ||
875 | Here's how to get a blocking shared lock on a file, typically used | |
876 | for reading: | |
877 | ||
878 | use 5.004; | |
879 | use Fcntl qw(:DEFAULT :flock); | |
880 | open(FH, "< filename") or die "can't open filename: $!"; | |
881 | flock(FH, LOCK_SH) or die "can't lock filename: $!"; | |
882 | # now read from FH | |
883 | ||
884 | You can get a non-blocking lock by using C<LOCK_NB>. | |
885 | ||
886 | flock(FH, LOCK_SH | LOCK_NB) | |
887 | or die "can't lock filename: $!"; | |
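
Here is a sketch of what a refused non-blocking request looks like. It assumes a system whose C<flock> treats two separate opens of one file as separate lockers, even within a single process; where C<flock> is emulated with C<fcntl> locks, the second request may well be granted instead. The scratch file comes from File::Temp.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempfile);

my ($first, $path) = tempfile(UNLINK => 1);   # scratch file to lock
flock($first, LOCK_EX) or die "can't lock $path: $!";

# A second, independent open of the same file.
open(my $second, "<", $path) or die "can't open $path: $!";
my $got_lock = flock($second, LOCK_SH | LOCK_NB);
print $got_lock ? "got the lock\n" : "would block: $!\n";
```

When the request is refused, C<flock> returns false right away rather than waiting for the exclusive locker to finish.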
888 | ||
889 | This can be useful for producing more user-friendly behaviour by warning | |
890 | if you're going to be blocking: | |
891 | ||
892 | use 5.004; | |
893 | use Fcntl qw(:DEFAULT :flock); | |
894 | open(FH, "< filename") or die "can't open filename: $!"; | |
895 | unless (flock(FH, LOCK_SH | LOCK_NB)) { | |
896 | $| = 1; | |
897 | print "Waiting for lock..."; | |
898 | flock(FH, LOCK_SH) or die "can't lock filename: $!"; | |
899 | print "got it.\n"; | |
900 | } | |
901 | # now read from FH | |
902 | ||
903 | To get an exclusive lock, typically used for writing, you have to be | |
904 | careful. We C<sysopen> the file so it can be locked before it gets | |
905 | emptied. You can get a nonblocking version using C<LOCK_EX | LOCK_NB>. | |
906 | ||
907 | use 5.004; | |
908 | use Fcntl qw(:DEFAULT :flock); | |
909 | sysopen(FH, "filename", O_WRONLY | O_CREAT) | |
910 | or die "can't open filename: $!"; | |
911 | flock(FH, LOCK_EX) | |
912 | or die "can't lock filename: $!"; | |
913 | truncate(FH, 0) | |
914 | or die "can't truncate filename: $!"; | |
915 | # now write to FH | |
916 | ||
917 | Finally, due to the uncounted millions who cannot be dissuaded from | |
918 | wasting cycles on useless vanity devices called hit counters, here's | |
919 | how to increment a number in a file safely: | |
920 | ||
921 | use Fcntl qw(:DEFAULT :flock); | |
922 | ||
923 | sysopen(FH, "numfile", O_RDWR | O_CREAT) | |
924 | or die "can't open numfile: $!"; | |
925 | # autoflush FH | |
926 | $ofh = select(FH); $| = 1; select ($ofh); | |
927 | flock(FH, LOCK_EX) | |
928 | or die "can't write-lock numfile: $!"; | |
929 | ||
930 | $num = <FH> || 0; | |
931 | seek(FH, 0, 0) | |
932 | or die "can't rewind numfile: $!"; | |
933 | print FH $num+1, "\n" | |
934 | or die "can't write numfile: $!"; | |
935 | ||
936 | truncate(FH, tell(FH)) | |
937 | or die "can't truncate numfile: $!"; | |
938 | close(FH) | |
939 | or die "can't close numfile: $!"; | |
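
If you'd rather have that as a reusable routine, one possible shape is sketched below. The name C<bump_counter> is made up, and a lexical filehandle is used so that the routine can be called on more than one file.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);

# Hypothetical helper: add one to the number kept in $path and
# return the new value.
sub bump_counter {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT)
        or die "can't open $path: $!";
    flock($fh, LOCK_EX) or die "can't write-lock $path: $!";

    my $num = <$fh> || 0;        # an empty file counts as zero
    chomp $num;
    $num++;

    seek($fh, 0, SEEK_SET)   or die "can't rewind $path: $!";
    print $fh "$num\n"       or die "can't write $path: $!";
    truncate($fh, tell($fh)) or die "can't truncate $path: $!";
    close($fh) or die "can't close $path: $!";   # flushes, then unlocks
    return $num;
}
```

The lock is held from before the read until the C<close>, so two processes calling this at once can't both read the same old value.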
940 | ||
ae258fbb JH |
941 | =head2 IO Layers |
942 | ||
943 | In Perl 5.8.0 a new I/O framework called "PerlIO" was introduced. | |
944 | This is a new "plumbing" for all the I/O happening in Perl; for the | |
1a193132 AL |
945 | most part everything will work just as it did, but PerlIO also brought |
946 | in some new features such as the ability to think of I/O as "layers". | |
ae258fbb JH |
947 | One I/O layer may in addition to just moving the data also do |
948 | transformations on the data. Such transformations may include | |
949 | compression and decompression, encryption and decryption, and transforming | |
950 | between various character encodings. | |
951 | ||
952 | Full discussion about the features of PerlIO is out of scope for this | |
953 | tutorial, but here is how to recognize the layers being used: | |
954 | ||
955 | =over 4 | |
956 | ||
957 | =item * | |
958 | ||
1a193132 | 959 | The three-(or more)-argument form of C<open> is being used and the |
ae258fbb JH |
960 | second argument contains something else in addition to the usual |
961 | C<< '<' >>, C<< '>' >>, C<< '>>' >>, C<< '|' >> and their variants, | |
962 | for example: | |
963 | ||
740d4bb2 | 964 | open(my $fh, "<:crlf", $fn); |
ae258fbb JH |
965 | |
966 | =item * | |
967 | ||
1a193132 | 968 | The two-argument form of C<binmode> is being used, for example |
ae258fbb JH |
969 | |
970 | binmode($fh, ":encoding(utf16)"); | |
971 | ||
972 | =back | |
973 | ||
80fea0d2 | 974 | For more detailed discussion about PerlIO see L<PerlIO>; |
ae258fbb JH |
975 | for more detailed discussion about Unicode and I/O see L<perluniintro>. |
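
As a small illustration of a layer doing a transformation, the sketch below writes a single character through an encoding layer and then reads the file back with and without decoding. The scratch file comes from File::Temp, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Write one character, U+00E9, through an encoding layer.
open(my $out, ">:encoding(UTF-8)", $path) or die "can't open $path: $!";
print $out "\x{e9}";
close($out) or die "can't close $path: $!";

# Read it back twice: as raw bytes, then as decoded characters.
open(my $raw, "<:raw", $path) or die "can't open $path: $!";
my $bytes = do { local $/; <$raw> };
close($raw);

open(my $in, "<:encoding(UTF-8)", $path) or die "can't open $path: $!";
my $chars = do { local $/; <$in> };
close($in);

printf "%d bytes on disk, %d character when decoded\n",
    length($bytes), length($chars);
```

The one character takes two bytes in UTF-8, so the raw read sees two bytes where the decoding read sees one character.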
976 | ||
f8284313 TC |
977 | =head1 SEE ALSO |
978 | ||
1a193132 AL |
979 | The C<open> and C<sysopen> functions in perlfunc(1); |
980 | the system open(2), dup(2), fopen(3), and fdopen(3) manpages; | |
f8284313 TC |
981 | the POSIX documentation. |
982 | ||
983 | =head1 AUTHOR and COPYRIGHT | |
984 | ||
985 | Copyright 1998 Tom Christiansen. | |
986 | ||
5a7beb56 JH |
987 | This documentation is free; you can redistribute it and/or modify it |
988 | under the same terms as Perl itself. | |
f8284313 TC |
989 | |
990 | Irrespective of its distribution, all code examples in these files are | |
991 | hereby placed into the public domain. You are permitted and | |
992 | encouraged to use this code in your own programs for fun or for profit | |
993 | as you see fit. A simple comment in the code giving credit would be | |
994 | courteous but is not required. | |
995 | ||
996 | =head1 HISTORY | |
997 | ||
998 | First release: Sat Jan 9 08:09:11 MST 1999 |