| 1 | =head1 NAME |
| 2 | |
| 3 | perlopentut - tutorial on opening things in Perl |
| 4 | |
| 5 | =head1 DESCRIPTION |
| 6 | |
| 7 | Perl has two simple, built-in ways to open files: the shell way for |
| 8 | convenience, and the C way for precision. The choice is yours. |
| 9 | |
| 10 | =head1 Open E<agrave> la shell |
| 11 | |
| 12 | Perl's C<open> function was designed to mimic the way command-line |
| 13 | redirection in the shell works. Here are some basic examples |
| 14 | from the shell: |
| 15 | |
| 16 | $ myprogram file1 file2 file3 |
| 17 | $ myprogram < inputfile |
| 18 | $ myprogram > outputfile |
| 19 | $ myprogram >> outputfile |
| 20 | $ myprogram | otherprogram |
| 21 | $ otherprogram | myprogram |
| 22 | |
| 23 | And here are some more advanced examples: |
| 24 | |
| 25 | $ otherprogram | myprogram f1 - f2 |
| 26 | $ otherprogram 2>&1 | myprogram - |
| 27 | $ myprogram <&3 |
| 28 | $ myprogram >&4 |
| 29 | |
| 30 | Programmers accustomed to constructs like those above can take comfort |
| 31 | in learning that Perl directly supports these familiar constructs using |
| 32 | virtually the same syntax as the shell. |
| 33 | |
| 34 | =head2 Simple Opens |
| 35 | |
| 36 | The C<open> function takes two arguments: the first is a filehandle, |
| 37 | and the second is a single string comprising both what to open and how |
| 38 | to open it. C<open> returns true when it works, and when it fails, |
| 39 | returns a false value and sets the special variable $! to reflect |
| 40 | the system error. If the filehandle was previously opened, it will |
| 41 | be implicitly closed first. |
| 42 | |
| 43 | For example: |
| 44 | |
| 45 | open(INFO, "datafile") || die("can't open datafile: $!"); |
| 46 | open(INFO, "< datafile") || die("can't open datafile: $!"); |
| 47 | open(RESULTS,"> runstats") || die("can't open runstats: $!"); |
| 48 | open(LOG, ">> logfile ") || die("can't open logfile: $!"); |
| 49 | |
| 50 | If you prefer the low-punctuation version, you could write that this way: |
| 51 | |
| 52 | open INFO, "< datafile" or die "can't open datafile: $!"; |
| 53 | open RESULTS,"> runstats" or die "can't open runstats: $!"; |
| 54 | open LOG, ">> logfile " or die "can't open logfile: $!"; |
| 55 | |
| 56 | A few things to notice. First, the leading less-than is optional. |
| 57 | If omitted, Perl assumes that you want to open the file for reading. |
| 58 | |
| 59 | The other important thing to notice is that, just as in the shell, |
| 60 | any white space before or after the filename is ignored. This is good, |
| 61 | because you wouldn't want these to do different things: |
| 62 | |
| 63 | open INFO, "<datafile" |
| 64 | open INFO, "< datafile" |
| 65 | open INFO, "<  datafile" |
| 66 | |
| 67 | Ignoring surrounding whitespace also helps for when you read a filename in |
| 68 | from a different file, and forget to trim it before opening: |
| 69 | |
| 70 | $filename = <INFO>; # oops, \n still there |
| 71 | open(EXTRA, "< $filename") || die "can't open $filename: $!"; |
| 72 | |
| 73 | This is not a bug, but a feature. Because C<open> mimics the shell in |
| 74 | its style of using redirection arrows to specify how to open the file, it |
| 75 | also does so with respect to extra white space around the filename itself |
| 76 | as well. For accessing files with naughty names, see L<"Dispelling |
| 77 | the Dweomer">. |
| 78 | |
| 79 | =head2 Pipe Opens |
| 80 | |
| 81 | In C, when you want to open a file using the standard I/O library, |
| 82 | you use the C<fopen> function, but when opening a pipe, you use the |
| 83 | C<popen> function. But in the shell, you just use a different redirection |
| 84 | character. That's also the case for Perl. The C<open> call |
| 85 | remains the same--just its argument differs. |
| 86 | |
| 87 | If the leading character is a pipe symbol, C<open> starts up a new |
| 88 | command and opens a write-only filehandle leading into that command. |
| 89 | This lets you write into that handle and have what you write show up on |
| 90 | that command's standard input. For example: |
| 91 | |
| 92 | open(PRINTER, "| lpr -Plp1") || die "cannot fork: $!"; |
| 93 | print PRINTER "stuff\n"; |
| 94 | close(PRINTER) || die "can't close lpr: $!"; |
| 95 | |
| 96 | If the trailing character is a pipe, you start up a new command and open a |
| 97 | read-only filehandle leading out of that command. This lets whatever that |
| 98 | command writes to its standard output show up on your handle for reading. |
| 99 | For example: |
| 100 | |
| 101 | open(NET, "netstat -i -n |") || die "cannot fork: $!"; |
| 102 | while (<NET>) { } # do something with input |
| 103 | close(NET) || die "can't close netstat: $!"; |
| 104 | |
| 105 | What happens if you try to open a pipe to or from a non-existent command? |
| 106 | In most systems, such an C<open> will not return an error. That's |
| 107 | because in the traditional C<fork>/C<exec> model, running the other |
| 108 | program happens only in the forked child process, which means that |
| 109 | the failed C<exec> can't be reflected in the return value of C<open>. |
| 110 | Only a failed C<fork> shows up there. See L<perlfaq8/"Why doesn't open() |
| 111 | return an error when a pipe open fails?"> to see how to cope with this. |
| 112 | There's also an explanation in L<perlipc>. |
| 113 | |
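Here is a minimal sketch of one way to catch such a failure, assuming a Unix-like system with a Bourne-style shell. Depending on the Perl version and on whether a shell is involved, the problem may surface from C<open> itself or only later from C<close>, so the sketch checks both. The command name C<no-such-command-xyz> is hypothetical, and the redirection is there only to make Perl run the command through the shell.

```perl
use strict;
use warnings;

# Hypothetical command name; the redirection forces Perl to run it via the shell.
my $cmd = "no-such-command-xyz 2>/dev/null";

# open() reports only a failure to start the child (the shell, in this case).
my $pid = open(my $fh, "$cmd |");
defined $pid or die "cannot fork: $!";

my @lines = <$fh>;    # nothing arrives, because the command never ran

# close() on a pipe waits for the child and returns false if it
# exited with a nonzero status; the status itself lands in $?.
my $closed_ok = close($fh);
my $exit_code = $? >> 8;

print "command failed with exit code $exit_code\n" unless $closed_ok;
```

Checking C<close> this way costs nothing on the success path, which is why the examples above already C<die> when it fails.
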
| 114 | If you would like to open a bidirectional pipe, the IPC::Open2 |
| 115 | library will handle this for you. Check out L<perlipc/"Bidirectional |
| 116 | Communication with Another Process">. |
| 117 | |
| 118 | =head2 The Minus File |
| 119 | |
| 120 | Again following the lead of the standard shell utilities, Perl's |
| 121 | C<open> function treats a file whose name is a single minus, "-", in a |
| 122 | special way. If you open minus for reading, it really means to access |
| 123 | the standard input. If you open minus for writing, it really means to |
| 124 | access the standard output. |
| 125 | |
| 126 | If minus can be used as the default input or default output, what happens |
| 127 | if you open a pipe into or out of minus? What's the default command it |
| 128 | would run? The same script as you're currently running! This is actually |
| 129 | a stealth C<fork> hidden inside an C<open> call. See L<perlipc/"Safe Pipe |
| 130 | Opens"> for details. |
| 131 | |
| 132 | =head2 Mixing Reads and Writes |
| 133 | |
| 134 | It is possible to specify both read and write access. All you do is |
| 135 | add a "+" symbol in front of the redirection. But as in the shell, |
| 136 | using a less-than on a file never creates a new file; it only opens an |
| 137 | existing one. On the other hand, using a greater-than always clobbers |
| 138 | (truncates to zero length) an existing file, or creates a brand-new one |
| 139 | if there isn't an old one. Adding a "+" for read-write doesn't affect |
| 140 | whether it only works on existing files or always clobbers existing ones. |
| 141 | |
| 142 | open(WTMP, "+< /usr/adm/wtmp") |
| 143 | || die "can't open /usr/adm/wtmp: $!"; |
| 144 | |
| 145 | open(SCREEN, "+> /tmp/lkscreen") |
| 146 | || die "can't open /tmp/lkscreen: $!"; |
| 147 | |
| 148 | open(LOGFILE, "+>> /tmp/applog") |
| 149 | || die "can't open /tmp/applog: $!"; |
| 150 | |
| 151 | The first one won't create a new file, and the second one will always |
| 152 | clobber an old one. The third one will create a new file if necessary |
| 153 | and not clobber an old one, and it will allow you to read at any point |
| 154 | in the file, but all writes will always go to the end. In short, |
| 155 | the first case is substantially more common than the second and third |
| 156 | cases, which are almost always wrong. (If you know C, the plus in |
| 157 | Perl's C<open> is historically derived from the one in C's fopen(3S), |
| 158 | which it ultimately calls.) |
| 159 | |
| 160 | In fact, when it comes to updating a file, unless you're working on |
| 161 | a binary file as in the WTMP case above, you probably don't want to |
| 162 | use this approach for updating. Instead, Perl's B<-i> flag comes to |
| 163 | the rescue. The following command takes all the C, C++, or yacc source |
| 164 | or header files and changes all their foo's to bar's, leaving |
| 165 | the old version in the original file name with a ".orig" tacked |
| 166 | on the end: |
| 167 | |
| 168 | $ perl -i.orig -pe 's/\bfoo\b/bar/g' *.[Cchy] |
| 169 | |
| 170 | This is a short cut for some renaming games that are really |
| 171 | the best way to update textfiles. See the second question in |
| 172 | L<perlfaq5> for more details. |
| 173 | |
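Done by hand, the renaming game might look roughly like the following. This is only a sketch: the file name F<notes.txt> and the handle names are hypothetical, and a small sample file is created first so that the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical file names used only for this illustration.
my $old = "notes.txt";
my $new = "$old.tmp";
my $bak = "$old.orig";

# Create a small sample file so the sketch can run on its own.
open(SAMPLE, "> $old")  || die "can't create $old: $!";
print SAMPLE "foo and food\n";
close(SAMPLE)           || die "can't close $old: $!";

# Read the old file, write the edited text to a new one.
open(OLD, "< $old")     || die "can't open $old: $!";
open(NEW, "> $new")     || die "can't open $new: $!";
while (<OLD>) {
    s/\bfoo\b/bar/g;    # the same edit as the one-liner above
    print NEW $_ or die "can't write $new: $!";
}
close(OLD)              || die "can't close $old: $!";
close(NEW)              || die "can't close $new: $!";

# Keep the original as a backup, then move the new file into place.
rename($old, $bak)      || die "can't rename $old to $bak: $!";
rename($new, $old)      || die "can't rename $new to $old: $!";
```

Because the edited text goes to a separate file, a failure partway through leaves the original untouched.
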
| 174 | =head2 Filters |
| 175 | |
| 176 | One of the most common uses for C<open> is one you never |
| 177 | even notice. When you process the ARGV filehandle using |
| 178 | C<E<lt>ARGVE<gt>>, Perl actually does an implicit open |
| 179 | on each file in @ARGV. Thus a program called like this: |
| 180 | |
| 181 | $ myprogram file1 file2 file3 |
| 182 | |
| 183 | can have all its files opened and processed one at a time |
| 184 | using a construct no more complex than: |
| 185 | |
| 186 | while (<>) { |
| 187 | # do something with $_ |
| 188 | } |
| 189 | |
| 190 | If @ARGV is empty when the loop first begins, Perl pretends you've opened |
| 191 | up minus, that is, the standard input. In fact, $ARGV, the currently |
| 192 | open file during C<E<lt>ARGVE<gt>> processing, is even set to "-" |
| 193 | in these circumstances. |
| 194 | |
| 195 | You are welcome to pre-process your @ARGV before starting the loop to |
| 196 | make sure it's to your liking. One reason to do this might be to remove |
| 197 | command options beginning with a minus. While you can always roll the |
| 198 | simple ones by hand, the Getopt modules are good for this. |
| 199 | |
| 200 | use Getopt::Std; |
| 201 | |
| 202 | # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o |
| 203 | getopts("vDo:"); |
| 204 | |
| 205 | # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o} |
| 206 | getopts("vDo:", \%args); |
| 207 | |
| 208 | Or the standard Getopt::Long module to permit named arguments: |
| 209 | |
| 210 | use Getopt::Long; |
| 211 | GetOptions( "verbose" => \$verbose, # --verbose |
| 212 | "Debug" => \$debug, # --Debug |
| 213 | "output=s" => \$output ); |
| 214 | # --output=somestring or --output somestring |
| 215 | |
| 216 | Another reason for preprocessing arguments is to make an empty |
| 217 | argument list default to all files: |
| 218 | |
| 219 | @ARGV = glob("*") unless @ARGV; |
| 220 | |
| 221 | You could even filter out all but plain, text files. This is a bit |
| 222 | silent, of course, and you might prefer to mention them on the way. |
| 223 | |
| 224 | @ARGV = grep { -f && -T } @ARGV; |
| 225 | |
| 226 | If you're using the B<-n> or B<-p> command-line options, you |
| 227 | should put changes to @ARGV in a C<BEGIN{}> block. |
| 228 | |
| 229 | Remember that a normal C<open> has special properties, in that it might |
| 230 | call fopen(3S) or it might call popen(3S), depending on what its |
| 231 | argument looks like; that's why it's sometimes called "magic open". |
| 232 | Here's an example: |
| 233 | |
| 234 | $pwdinfo = `domainname` =~ /^(\(none\))?$/ |
| 235 | ? '< /etc/passwd' |
| 236 | : 'ypcat passwd |'; |
| 237 | |
| 238 | open(PWD, $pwdinfo) |
| 239 | or die "can't open $pwdinfo: $!"; |
| 240 | |
| 241 | This sort of thing also comes into play in filter processing. Because |
| 242 | C<E<lt>ARGVE<gt>> processing employs the normal, shell-style Perl C<open>, |
| 243 | it respects all the special things we've already seen: |
| 244 | |
| 245 | $ myprogram f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile |
| 246 | |
| 247 | That program will read from the file F<f1>, the process F<cmd1>, standard |
| 248 | input (F<tmpfile> in this case), the F<f2> file, the F<cmd2> command, |
| 249 | and finally the F<f3> file. |
| 250 | |
| 251 | Yes, this also means that if you have a file named "-" (and so on) in |
| 252 | your directory, it won't be processed as a literal file by C<open>. |
| 253 | You'll need to pass it as "./-" much as you would for the I<rm> program. |
| 254 | Or you could use C<sysopen> as described below. |
| 255 | |
| 256 | One of the more interesting applications is to change files of a certain |
| 257 | name into pipes. For example, to autoprocess gzipped or compressed |
| 258 | files by decompressing them with I<gzip>: |
| 259 | |
| 260 | @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_ } @ARGV; |
| 261 | |
| 262 | Or, if you have the I<GET> program installed from LWP, |
| 263 | you can fetch URLs before processing them: |
| 264 | |
| 265 | @ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV; |
| 266 | |
| 267 | It's not for nothing that this is called magic C<E<lt>ARGVE<gt>>. |
| 268 | Pretty nifty, eh? |
| 269 | |
| 270 | =head1 Open E<agrave> la C |
| 271 | |
| 272 | If you want the convenience of the shell, then Perl's C<open> is |
| 273 | definitely the way to go. On the other hand, if you want finer precision |
| 274 | than C's simplistic fopen(3S) provides, then you should look to Perl's |
| 275 | C<sysopen>, which is a direct hook into the open(2) system call. |
| 276 | That does mean it's a bit more involved, but that's the price of |
| 277 | precision. |
| 278 | |
| 279 | C<sysopen> takes 3 (or 4) arguments. |
| 280 | |
| 281 | sysopen HANDLE, PATH, FLAGS, [MASK] |
| 282 | |
| 283 | The HANDLE argument is a filehandle just as with C<open>. The PATH is |
| 284 | a literal path, one that doesn't pay attention to any greater-thans or |
| 285 | less-thans or pipes or minuses, nor ignore white space. If it's there, |
| 286 | it's part of the path. The FLAGS argument contains one or more values |
| 287 | derived from the Fcntl module that have been or'd together using the |
| 288 | bitwise "|" operator. The final argument, the MASK, is optional; if |
| 289 | present, it is combined with the user's current umask for the creation |
| 290 | mode of the file. You should usually omit this. |
| 291 | |
| 292 | Although the traditional values of read-only, write-only, and read-write |
| 293 | are 0, 1, and 2 respectively, this is known not to hold true on some |
| 294 | systems. Instead, it's best to load in the appropriate constants first |
| 295 | from the Fcntl module, which supplies the following standard flags: |
| 296 | |
| 297 | O_RDONLY Read only |
| 298 | O_WRONLY Write only |
| 299 | O_RDWR Read and write |
| 300 | O_CREAT Create the file if it doesn't exist |
| 301 | O_EXCL Fail if the file already exists |
| 302 | O_APPEND Append to the file |
| 303 | O_TRUNC Truncate the file |
| 304 | O_NONBLOCK Non-blocking access |
| 305 | |
| 306 | Less common flags that are sometimes available on some operating systems |
| 307 | include C<O_BINARY>, C<O_TEXT>, C<O_SHLOCK>, C<O_EXLOCK>, C<O_DEFER>, |
| 308 | C<O_SYNC>, C<O_ASYNC>, C<O_DSYNC>, C<O_RSYNC>, C<O_NOCTTY>, C<O_NDELAY> |
| 309 | and C<O_LARGEFILE>. Consult your open(2) manpage or its local equivalent |
| 310 | for details. |
| 311 | |
| 312 | Here's how to use C<sysopen> to emulate the simple C<open> calls we had |
| 313 | before. We'll omit the C<|| die $!> checks for clarity, but make sure |
| 314 | you always check the return values in real code. These aren't quite |
| 315 | the same, since C<open> will trim leading and trailing white space, |
| 316 | but you'll get the idea: |
| 317 | |
| 318 | To open a file for reading: |
| 319 | |
| 320 | open(FH, "< $path"); |
| 321 | sysopen(FH, $path, O_RDONLY); |
| 322 | |
| 323 | To open a file for writing, creating a new file if needed or else truncating |
| 324 | an old file: |
| 325 | |
| 326 | open(FH, "> $path"); |
| 327 | sysopen(FH, $path, O_WRONLY | O_TRUNC | O_CREAT); |
| 328 | |
| 329 | To open a file for appending, creating one if necessary: |
| 330 | |
| 331 | open(FH, ">> $path"); |
| 332 | sysopen(FH, $path, O_WRONLY | O_APPEND | O_CREAT); |
| 333 | |
| 334 | To open a file for update, where the file must already exist: |
| 335 | |
| 336 | open(FH, "+< $path"); |
| 337 | sysopen(FH, $path, O_RDWR); |
| 338 | |
| 339 | And here are things you can do with C<sysopen> that you cannot do with |
| 340 | a regular C<open>. As you see, it's just a matter of controlling the |
| 341 | flags in the third argument. |
| 342 | |
| 343 | To open a file for writing, creating a new file which must not previously |
| 344 | exist: |
| 345 | |
| 346 | sysopen(FH, $path, O_WRONLY | O_EXCL | O_CREAT); |
| 347 | |
| 348 | To open a file for appending, where that file must already exist: |
| 349 | |
| 350 | sysopen(FH, $path, O_WRONLY | O_APPEND); |
| 351 | |
| 352 | To open a file for update, creating a new file if necessary: |
| 353 | |
| 354 | sysopen(FH, $path, O_RDWR | O_CREAT); |
| 355 | |
| 356 | To open a file for update, where that file must not already exist: |
| 357 | |
| 358 | sysopen(FH, $path, O_RDWR | O_EXCL | O_CREAT); |
| 359 | |
| 360 | To open a file without blocking, creating one if necessary: |
| 361 | |
| 362 | sysopen(FH, $path, O_WRONLY | O_NONBLOCK | O_CREAT); |
| 363 | |
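Put together with the C<Fcntl> import and the return-value checks omitted above, an exclusive create might be sketched like this. The path F<demo.lock> and the handle names are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use Fcntl;    # exports O_WRONLY, O_EXCL, O_CREAT and friends

# Hypothetical path, used only for this illustration.
my $path = "demo.lock";
unlink $path;    # start clean so the first attempt can succeed

sysopen(FIRST, $path, O_WRONLY | O_EXCL | O_CREAT)
    || die "can't create $path: $!";

# A second exclusive create of the same path has to fail,
# because the file now exists.
my $second_ok = sysopen(SECOND, $path, O_WRONLY | O_EXCL | O_CREAT);
print "second attempt refused: $!\n" unless $second_ok;

close(FIRST)  || die "can't close $path: $!";
unlink($path) || die "can't remove $path: $!";
```
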
| 364 | =head2 Permissions E<agrave> la mode |
| 365 | |
| 366 | If you omit the MASK argument to C<sysopen>, Perl uses the octal value |
| 367 | 0666. The normal MASK to use for executables and directories should |
| 368 | be 0777, and for anything else, 0666. |
| 369 | |
| 370 | Why so permissive? Well, it isn't really. The MASK will be modified |
| 371 | by your process's current C<umask>. A umask is a number representing |
| 372 | I<disabled> permissions bits; that is, bits that will not be turned on |
| 373 | in the created files' permissions field. |
| 374 | |
| 375 | For example, if your C<umask> were 027, then the 020 part would |
| 376 | disable the group from writing, and the 007 part would disable others |
| 377 | from reading, writing, or executing. Under these conditions, passing |
| 378 | C<sysopen> 0666 would create a file with mode 0640, since C<0666 &~ 027> |
| 379 | is 0640. |
| 380 | |
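The arithmetic can be checked directly in Perl. The variable names below are made up for the illustration; the two octal values are the ones from the paragraph above.

```perl
use strict;
use warnings;

my $mask  = 0666;    # what sysopen uses when MASK is omitted
my $umask = 027;     # the example umask from the text

# Clear every permission bit that the umask disables.
my $mode = $mask & ~$umask;

printf "%04o\n", $mode;    # prints 0640
```
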
| 381 | You should seldom use the MASK argument to C<sysopen()>. That takes |
| 382 | away the user's freedom to choose what permission new files will have. |
| 383 | Denying choice is almost always a bad thing. One exception would be for |
| 384 | cases where sensitive or private data is being stored, such as with mail |
| 385 | folders, cookie files, and internal temporary files. |
| 386 | |
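For that exceptional case, a sketch along these lines creates a file that only its owner can read or write, assuming a Unix-style permission model. The file name F<private-demo.dat> and the handle name are hypothetical.

```perl
use strict;
use warnings;
use Fcntl;    # exports the O_* constants

# Hypothetical file name, used only for this illustration.
my $path = "private-demo.dat";
unlink $path;    # start clean so the exclusive create can succeed

# A MASK of 0600 grants nothing to group or others; the umask
# can only take further bits away, never add any.
sysopen(SECRET, $path, O_WRONLY | O_EXCL | O_CREAT, 0600)
    || die "can't create $path: $!";
close(SECRET) || die "can't close $path: $!";

# Look at the permission bits the file actually ended up with.
my $mode = (stat($path))[2] & 07777;
printf "%s has mode %04o\n", $path, $mode;

unlink($path) || die "can't remove $path: $!";
```
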
| 387 | =head1 Obscure Open Tricks |
| 388 | |
| 389 | =head2 Re-Opening Files (dups) |
| 390 | |
| 391 | Sometimes you already have a filehandle open, and want to make another |
| 392 | handle that's a duplicate of the first one. In the shell, we place an |
| 393 | ampersand in front of a file descriptor number when doing redirections. |
| 394 | For example, C<2E<gt>&1> makes descriptor 2 (that's STDERR in Perl) |
| 395 | be redirected into descriptor 1 (which is usually Perl's STDOUT). |
| 396 | The same is essentially true in Perl: a filename that begins with an |
| 397 | ampersand is treated instead as a file descriptor if a number, or as a |
| 398 | filehandle if a string. |
| 399 | |
| 400 | open(SAVEOUT, ">&SAVEERR") || die "couldn't dup SAVEERR: $!"; |
| 401 | open(MHCONTEXT, "<&4") || die "couldn't dup fd4: $!"; |
| 402 | |
| 403 | That means that if a function is expecting a filename, but you don't |
| 404 | want to give it a filename because you already have the file open, you |
| 405 | can just pass the filehandle with a leading ampersand. It's best to |
| 406 | use a fully qualified handle though, just in case the function happens |
| 407 | to be in a different package: |
| 408 | |
| 409 | somefunction("&main::LOGFILE"); |
| 410 | |
| 411 | This way if somefunction() is planning on opening its argument, it can |
| 412 | just use the already opened handle. This differs from passing a handle, |
| 413 | because with a handle, you don't open the file. Here you have something |
| 414 | you can pass to open. |
| 415 | |
| 416 | If you have one of those tricky, newfangled I/O objects that the C++ |
| 417 | folks are raving about, then this doesn't work because those aren't a |
| 418 | proper filehandle in the native Perl sense. You'll have to use fileno() |
| 419 | to pull out the proper descriptor number, assuming you can: |
| 420 | |
| 421 | use IO::Socket; |
| 422 | $handle = IO::Socket::INET->new("www.perl.com:80"); |
| 423 | $fd = $handle->fileno; |
| 424 | somefunction("&$fd"); # not an indirect function call |
| 425 | |
| 426 | It can be easier (and certainly will be faster) just to use real |
| 427 | filehandles though: |
| 428 | |
| 429 | use IO::Socket; |
| 430 | local *REMOTE = IO::Socket::INET->new("www.perl.com:80"); |
| 431 | die "can't connect" unless defined(fileno(REMOTE)); |
| 432 | somefunction("&main::REMOTE"); |
| 433 | |
| 434 | If the filehandle or descriptor number is preceded not just with a simple |
| 435 | "&" but rather with a "&=" combination, then Perl will not create a |
| 436 | completely new descriptor opened to the same place using the dup(2) |
| 437 | system call. Instead, it will just make something of an alias to the |
| 438 | existing one using the fdopen(3S) library call. This is slightly more |
| 439 | parsimonious of system resources, although this is less a concern |
| 440 | these days. Here's an example of that: |
| 441 | |
| 442 | $fd = $ENV{"MHCONTEXTFD"}; |
| 443 | open(MHCONTEXT, "<&=$fd") or die "couldn't fdopen $fd: $!"; |
| 444 | |
| 445 | If you're using magic C<E<lt>ARGVE<gt>>, you could even pass in as a |
| 446 | command line argument in @ARGV something like C<"E<lt>&=$MHCONTEXTFD">, |
| 447 | but we've never seen anyone actually do this. |
| 448 | |
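A common use of the dup syntax is to set STDOUT aside, point it somewhere else for a while, and then put it back. The following is a minimal sketch of that idea; the file name F<dup-demo.out> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical output file, used only for this illustration.
my $capture = "dup-demo.out";

# Keep a duplicate of the current STDOUT so it can be restored.
open(SAVEOUT, ">&STDOUT")  || die "couldn't dup STDOUT: $!";

# Reopening STDOUT implicitly closes the old one first.
open(STDOUT, "> $capture") || die "can't redirect STDOUT: $!";
print "this line goes to the file\n";
close(STDOUT)              || die "can't close STDOUT: $!";

# Dup the saved handle back onto STDOUT, then drop the spare copy.
open(STDOUT, ">&SAVEOUT")  || die "couldn't restore STDOUT: $!";
close(SAVEOUT)             || die "can't close SAVEOUT: $!";
print "this line goes to the original STDOUT\n";
```
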
| 449 | =head2 Dispelling the Dweomer |
| 450 | |
| 451 | Perl is more of a DWIMmer language than something like Java--where DWIM |
| 452 | is an acronym for "do what I mean". But this principle sometimes leads |
| 453 | to more hidden magic than one knows what to do with. In this way, Perl |
| 454 | is also filled with I<dweomer>, an obscure word meaning an enchantment. |
| 455 | Sometimes, Perl's DWIMmer is just too much like dweomer for comfort. |
| 456 | |
| 457 | If magic C<open> is a bit too magical for you, you don't have to turn |
| 458 | to C<sysopen>. To open a file with arbitrary weird characters in |
| 459 | it, it's necessary to protect any leading and trailing whitespace. |
| 460 | Leading whitespace is protected by inserting a C<"./"> in front of a |
| 461 | filename that starts with whitespace. Trailing whitespace is protected |
| 462 | by appending an ASCII NUL byte (C<"\0">) at the end of the string. |
| 463 | |
| 464 | $file =~ s#^(\s)#./$1#; |
| 465 | open(FH, "< $file\0") || die "can't open $file: $!"; |
| 466 | |
| 467 | This assumes, of course, that your system considers dot the current |
| 468 | working directory, slash the directory separator, and disallows ASCII |
| 469 | NULs within a valid filename. Most systems follow these conventions, |
| 470 | including all POSIX systems as well as proprietary Microsoft systems. |
| 471 | The only vaguely popular system that doesn't work this way is the |
| 472 | proprietary Macintosh system, which uses a colon where the rest of us |
| 473 | use a slash. Maybe C<sysopen> isn't such a bad idea after all. |
| 474 | |
| 475 | If you want to use C<E<lt>ARGVE<gt>> processing in a totally boring |
| 476 | and non-magical way, you could do this first: |
| 477 | |
| 478 | # "Sam sat on the ground and put his head in his hands. |
| 479 | # 'I wish I had never come here, and I don't want to see |
| 480 | # no more magic,' he said, and fell silent." |
| 481 | for (@ARGV) { |
| 482 | s#^([^./])#./$1#; |
| 483 | $_ .= "\0"; |
| 484 | } |
| 485 | while (<>) { |
| 486 | # now process $_ |
| 487 | } |
| 488 | |
| 489 | But be warned that users will not appreciate being unable to use "-" |
| 490 | to mean standard input, per the standard convention. |
| 491 | |
| 492 | =head2 Paths as Opens |
| 493 | |
| 494 | You've probably noticed how Perl's C<warn> and C<die> functions can |
| 495 | produce messages like: |
| 496 | |
| 497 | Some warning at scriptname line 29, <FH> line 7. |
| 498 | |
| 499 | That's because you opened a filehandle FH, and had read in seven records |
| 500 | from it. But what was the name of the file, not the handle? |
| 501 | |
| 502 | If you aren't running with C<strict refs>, or if you've turned them off |
| 503 | temporarily, then all you have to do is this: |
| 504 | |
| 505 | open($path, "< $path") || die "can't open $path: $!"; |
| 506 | while (<$path>) { |
| 507 | # whatever |
| 508 | } |
| 509 | |
| 510 | Since you're using the pathname of the file as its handle, |
| 511 | you'll get warnings more like |
| 512 | |
| 513 | Some warning at scriptname line 29, </etc/motd> line 7. |
| 514 | |
| 515 | =head2 Single Argument Open |
| 516 | |
| 517 | Remember how we said that Perl's open took two arguments? That was a |
| 518 | passive prevarication. You see, it can also take just one argument. |
| 519 | If and only if the variable is a global variable, not a lexical, you |
| 520 | can pass C<open> just one argument, the filehandle, and it will |
| 521 | get the path from the global scalar variable of the same name. |
| 522 | |
| 523 | $FILE = "/etc/motd"; |
| 524 | open FILE or die "can't open $FILE: $!"; |
| 525 | while (<FILE>) { |
| 526 | # whatever |
| 527 | } |
| 528 | |
| 529 | Why is this here? Someone has to cater to the hysterical porpoises. |
| 530 | It's something that's been in Perl since the very beginning, if not |
| 531 | before. |
| 532 | |
| 533 | =head2 Playing with STDIN and STDOUT |
| 534 | |
| 535 | One clever move with STDOUT is to explicitly close it when you're done |
| 536 | with the program. |
| 537 | |
| 538 | END { close(STDOUT) || die "can't close stdout: $!" } |
| 539 | |
| 540 | If you don't do this, and your program fills up the disk partition due |
| 541 | to a command line redirection, it won't report the error or exit with a |
| 542 | failure status. |
| 543 | |
| 544 | You don't have to accept the STDIN and STDOUT you were given. You are |
| 545 | welcome to reopen them if you'd like. |
| 546 | |
| 547 | open(STDIN, "< datafile") |
| 548 | || die "can't open datafile: $!"; |
| 549 | |
| 550 | open(STDOUT, "> output") |
| 551 | || die "can't open output: $!"; |
| 552 | |
| 553 | And then these can be read directly or passed on to subprocesses. |
| 554 | This makes it look as though the program were initially invoked |
| 555 | with those redirections from the command line. |
| 556 | |
| 557 | It's probably more interesting to connect these to pipes. For example: |
| 558 | |
| 559 | $pager = $ENV{PAGER} || "(less || more)"; |
| 560 | open(STDOUT, "| $pager") |
| 561 | || die "can't fork a pager: $!"; |
| 562 | |
| 563 | This makes it appear as though your program were called with its stdout |
| 564 | already piped into your pager. You can also use this kind of thing |
| 565 | in conjunction with an implicit fork to yourself. You might do this |
| 566 | if you would rather handle the post processing in your own program, |
| 567 | just in a different process: |
| 568 | |
| 569 | head(100); |
| 570 | while (<>) { |
| 571 | print; |
| 572 | } |
| 573 | |
| 574 | sub head { |
| 575 | my $lines = shift || 20; |
| 576 | return if $pid = open(STDOUT, "|-"); # return if parent |
| 577 | die "cannot fork: $!" unless defined $pid; |
| 578 | while (<STDIN>) { |
| 579 | last if --$lines < 0; |
| 580 | print; |
| 581 | } |
| 582 | exit; |
| 583 | } |
| 584 | |
| 585 | This technique can be applied to repeatedly push as many filters on your |
| 586 | output stream as you wish. |
| 587 | |
| 588 | =head1 Other I/O Issues |
| 589 | |
| 590 | These topics aren't really arguments related to C<open> or C<sysopen>, |
| 591 | but they do affect what you do with your open files. |
| 592 | |
| 593 | =head2 Opening Non-File Files |
| 594 | |
| 595 | When is a file not a file? Well, you could say when it exists but |
| 596 | isn't a plain file. We'll check whether it's a symbolic link first, |
| 597 | just in case. |
| 598 | |
| 599 | if (-l $file || ! -f _) { |
| 600 | print "$file is not a plain file\n"; |
| 601 | } |
| 602 | |
| 603 | What other kinds of files are there than, well, files? Directories, |
| 604 | symbolic links, named pipes, Unix-domain sockets, and block and character |
| 605 | devices. Those are all files, too--just not I<plain> files. This isn't |
| 606 | the same issue as being a text file. Not all text files are plain files. |
| 607 | Not all plain files are textfiles. That's why there are separate C<-f> |
| 608 | and C<-T> file tests. |
| 609 | |
| 610 | To open a directory, you should use the C<opendir> function, then |
| 611 | process it with C<readdir>, carefully restoring the directory |
| 612 | name if necessary: |
| 613 | |
| 614 | opendir(DIR, $dirname) or die "can't opendir $dirname: $!"; |
| 615 | while (defined($file = readdir(DIR))) { |
| 616 | # do something with "$dirname/$file" |
| 617 | } |
| 618 | closedir(DIR); |
| 619 | |
| 620 | If you want to process directories recursively, it's better to use the |
| 621 | File::Find module. For example, this prints out all files recursively, |
| 622 | and adds a slash to their names if the file is a directory. |
| 623 | |
| 624 | @ARGV = qw(.) unless @ARGV; |
| 625 | use File::Find; |
| 626 | find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV; |
| 627 | |
| 628 | This finds all bogus symbolic links beneath a particular directory: |
| 629 | |
| 630 | find sub { print "$File::Find::name\n" if -l && !-e }, $dir; |
| 631 | |
| 632 | As you see, with symbolic links, you can just pretend that it is |
| 633 | what it points to. Or, if you want to know I<what> it points to, then |
| 634 | C<readlink> is called for: |
| 635 | |
| 636 | if (-l $file) { |
| 637 | if (defined($whither = readlink($file))) { |
| 638 | print "$file points to $whither\n"; |
| 639 | } else { |
| 640 | print "$file points nowhere: $!\n"; |
| 641 | } |
| 642 | } |
| 643 | |
| 644 | Named pipes are a different matter. You pretend they're regular files, |
| 645 | but their opens will normally block until there is both a reader and |
| 646 | a writer. You can read more about them in L<perlipc/"Named Pipes">. |
| 647 | Unix-domain sockets are rather different beasts as well; they're |
| 648 | described in L<perlipc/"Unix-Domain TCP Clients and Servers">. |
| 649 | |
| 650 | When it comes to opening devices, it can be easy and it can be tricky. |
| 651 | We'll assume that if you're opening up a block device, you know what |
| 652 | you're doing. The character devices are more interesting. These are |
| 653 | typically used for modems, mice, and some kinds of printers. This is |
| 654 | described in L<perlfaq8/"How do I read and write the serial port?">. |
| 655 | It's often enough to open them carefully: |
| 656 | |
| 657 | sysopen(TTYIN, "/dev/ttyS1", O_RDWR | O_NDELAY | O_NOCTTY) |
| 658 | # (O_NOCTTY no longer needed on POSIX systems) |
| 659 | or die "can't open /dev/ttyS1: $!"; |
| 660 | open(TTYOUT, "+>&TTYIN") |
| 661 | or die "can't dup TTYIN: $!"; |
| 662 | |
| 663 | $ofh = select(TTYOUT); $| = 1; select($ofh); |
| 664 | |
| 665 | print TTYOUT "+++at\015"; |
| 666 | $answer = <TTYIN>; |
| 667 | |
| 668 | With descriptors that you haven't opened using C<sysopen>, such as a |
| 669 | socket, you can set them to be non-blocking using C<fcntl>: |
| 670 | |
| 671 | use Fcntl; |
| 672 | fcntl(Connection, F_SETFL, O_NONBLOCK) |
| 673 | or die "can't set non blocking: $!"; |
| 674 | |
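Setting the flags outright, as above, replaces whatever status flags the descriptor already had. A more careful variant, sketched here on the assumption that your system supports C<F_GETFL>, reads the current flags first and adds C<O_NONBLOCK> to them. A C<socketpair> stands in for a real connection, and the variable names are made up for the illustration.

```perl
use strict;
use warnings;
use Fcntl;     # F_GETFL, F_SETFL, O_NONBLOCK
use Socket;    # AF_UNIX, SOCK_STREAM, PF_UNSPEC

# A socketpair stands in for a real network connection here.
socketpair(my $here, my $there, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "can't socketpair: $!";

# Fetch the current status flags, then add O_NONBLOCK to them.
my $flags = fcntl($here, F_GETFL, 0)
    or die "can't get flags: $!";
fcntl($here, F_SETFL, $flags | O_NONBLOCK)
    or die "can't set non blocking: $!";

# Read the flags back to confirm that the change took effect.
my $now = fcntl($here, F_GETFL, 0)
    or die "can't get flags: $!";
print "non-blocking is on\n" if $now & O_NONBLOCK;
```
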
| 675 | Rather than losing yourself in a morass of twisting, turning C<ioctl>s, |
| 676 | all dissimilar, if you're going to manipulate ttys, it's best to |
| 677 | make calls out to the stty(1) program if you have it, or else use the |
| 678 | portable POSIX interface. To figure this all out, you'll need to read the |
| 679 | termios(3) manpage, which describes the POSIX interface to tty devices, |
| 680 | and then L<POSIX>, which describes Perl's interface to POSIX. There are |
| 681 | also some high-level modules on CPAN that can help you with these games. |
| 682 | Check out Term::ReadKey and Term::ReadLine. |
| 683 | |
| 684 | What else can you open? To open a connection using sockets, you won't use |
| 685 | one of Perl's two open functions. See L<perlipc/"Sockets: Client/Server |
| 686 | Communication"> for that. Here's an example. Once you have it, |
| 687 | you can use FH as a bidirectional filehandle. |
| 688 | |
| 689 | use IO::Socket; |
| 690 | local *FH = IO::Socket::INET->new("www.perl.com:80"); |
| 691 | |
| 692 | For opening up a URL, the LWP modules from CPAN are just what |
| 693 | the doctor ordered. There's no filehandle interface, but |
| 694 | it's still easy to get the contents of a document: |
| 695 | |
| 696 | use LWP::Simple; |
| 697 | $doc = get('http://www.sn.no/libwww-perl/'); |
| 698 | |
| 699 | =head2 Binary Files |
| 700 | |
| 701 | On certain legacy systems with what could charitably be called terminally |
| 702 | convoluted (some would say broken) I/O models, a file isn't a file--at |
| 703 | least, not with respect to the C standard I/O library. On these old |
| 704 | systems whose libraries (but not kernels) distinguish between text and |
| 705 | binary streams, to get files to behave properly you'll have to bend over |
| 706 | backwards to avoid nasty problems. On such infelicitous systems, sockets |
| 707 | and pipes are already opened in binary mode, and there is currently no |
| 708 | way to turn that off. With files, you have more options. |
| 709 | |
| 710 | One option is to use the C<binmode> function on the appropriate |
| 711 | handles before doing regular I/O on them: |
| 712 | |
| 713 | binmode(STDIN); |
| 714 | binmode(STDOUT); |
| 715 | while (<STDIN>) { print } |
| 716 | |
| 717 | Passing C<sysopen> a non-standard flag option will also open the file in |
| 718 | binary mode on those systems that support it. This is the equivalent of |
| 719 | opening the file normally, then calling C<binmode> on the handle. |
| 720 | |
| 721 | sysopen(BINDAT, "records.data", O_RDWR | O_BINARY) |
| 722 | || die "can't open records.data: $!"; |
| 723 | |
| 724 | Now you can use C<read> and C<print> on that handle without worrying |
| 725 | about the system's non-standard I/O library breaking your data. It's not |
| 726 | a pretty picture, but then, legacy systems seldom are. CP/M will be |
| 727 | with us until the end of days, and after. |
| 728 | |
| 729 | On systems with exotic I/O systems, it turns out that, astonishingly |
| 730 | enough, even unbuffered I/O using C<sysread> and C<syswrite> might do |
| 731 | sneaky data mutilation behind your back. |
| 732 | |
| 733 | while (sysread(WHENCE, $buf, 1024)) { |
| 734 | syswrite(WHITHER, $buf, length($buf)); |
| 735 | } |
| 736 | |
| 737 | Depending on the vicissitudes of your runtime system, even these calls |
| 738 | may need C<binmode> or C<O_BINARY> first. Systems known to be free of |
| 739 | such difficulties include Unix, the Mac OS, Plan9, and Inferno. |
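
Putting those pieces together, here is a hedged sketch of a byte-for-byte
copy that calls C<binmode> on both handles before using C<sysread> and
C<syswrite>. The file names, the sample payload, and the use of
File::Temp, lexical filehandles, and three-argument C<open> are all
assumptions made for illustration; they need a newer Perl than the
bareword examples in this section.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/src.bin", "$dir/dst.bin");

# Bytes that text-mode translation tends to disturb: CR LF and Ctrl-Z.
my $payload = "line one\r\nline two\x1a\x00\xff";

open(my $out, ">", $src) or die "can't create $src: $!";
binmode($out);
print {$out} $payload    or die "can't write $src: $!";
close($out)              or die "can't close $src: $!";

open(my $whence,  "<", $src) or die "can't open $src: $!";
open(my $whither, ">", $dst) or die "can't open $dst: $!";
binmode($whence);     # a no-op on Unix, essential on text/binary systems
binmode($whither);

while (my $n = sysread($whence, my $buf, 1024)) {
    my $off = 0;
    while ($off < $n) {    # syswrite may write less than requested
        my $w = syswrite($whither, $buf, $n - $off, $off);
        defined $w or die "can't write $dst: $!";
        $off += $w;
    }
}
close($whither) or die "can't close $dst: $!";
close($whence)  or die "can't close $src: $!";

printf "%d bytes in, %d bytes out\n", -s $src, -s $dst;
```

On systems that draw no text/binary distinction the C<binmode> calls cost
nothing, so leaving them in keeps the code portable.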
| 740 | |
| 741 | =head2 File Locking |
| 742 | |
| 743 | In a multitasking environment, you may need to be careful not to collide |
| 744 | with other processes that want to do I/O on the same files you |
| 745 | are working on. You'll often need shared or exclusive locks |
| 746 | on files for reading and writing respectively. You might just |
| 747 | pretend that only exclusive locks exist. |
| 748 | |
| 749 | Never use the existence of a file C<-e $file> as a locking indication, |
| 750 | because there is a race condition between the test for the existence of |
| 751 | the file and its creation. Atomicity is critical. |
| 752 | |
| 753 | Perl's most portable locking interface is via the C<flock> function, |
| 754 | whose simplicity is emulated on systems that don't directly support it, |
| 755 | such as SysV or WindowsNT. The underlying semantics may affect how |
| 756 | it all works, so you should learn how C<flock> is implemented on your |
| 757 | system's port of Perl. |
| 758 | |
| 759 | File locking I<does not> lock out another process that would like to |
| 760 | do I/O. A file lock only locks out others trying to get a lock, not |
| 761 | processes trying to do I/O. Because locks are advisory, if one process |
| 762 | uses locking and another doesn't, all bets are off. |
| 763 | |
| 764 | By default, the C<flock> call will block until a lock is granted. |
| 765 | A request for a shared lock will be granted as soon as there is no |
| 766 | exclusive locker. A request for an exclusive lock will be granted as |
| 767 | soon as there is no locker of any kind. Locks are on file descriptors, |
| 768 | not file names. You can't lock a file until you open it, and you can't |
| 769 | hold on to a lock once the file has been closed. |
| 770 | |
| 771 | Here's how to get a blocking shared lock on a file, typically used |
| 772 | for reading: |
| 773 | |
| 774 | use 5.004; |
| 775 | use Fcntl qw(:DEFAULT :flock); |
| 776 | open(FH, "< filename") or die "can't open filename: $!"; |
| 777 | flock(FH, LOCK_SH) or die "can't lock filename: $!"; |
| 778 | # now read from FH |
| 779 | |
| 780 | You can get a non-blocking lock by using C<LOCK_NB>. |
| 781 | |
| 782 | flock(FH, LOCK_SH | LOCK_NB) |
| 783 | or die "can't lock filename: $!"; |
| 784 | |
| 785 | This can be useful for producing more user-friendly behaviour by warning |
| 786 | if you're going to be blocking: |
| 787 | |
| 788 | use 5.004; |
| 789 | use Fcntl qw(:DEFAULT :flock); |
| 790 | open(FH, "< filename") or die "can't open filename: $!"; |
| 791 | unless (flock(FH, LOCK_SH | LOCK_NB)) { |
| 792 | $| = 1; |
| 793 | print "Waiting for lock..."; |
| 794 | flock(FH, LOCK_SH) or die "can't lock filename: $!"; |
| 795 | print "got it.\n"; |
| 796 | } |
| 797 | # now read from FH |
| 798 | |
| 799 | To get an exclusive lock, typically used for writing, you have to be |
| 800 | careful. We C<sysopen> the file so it can be locked before it gets |
| 801 | emptied. You can get a nonblocking version using C<LOCK_EX | LOCK_NB>. |
| 802 | |
| 803 | use 5.004; |
| 804 | use Fcntl qw(:DEFAULT :flock); |
| 805 | sysopen(FH, "filename", O_WRONLY | O_CREAT) |
| 806 | or die "can't open filename: $!"; |
| 807 | flock(FH, LOCK_EX) |
| 808 | or die "can't lock filename: $!"; |
| 809 | truncate(FH, 0) |
| 810 | or die "can't truncate filename: $!"; |
| 811 | # now write to FH |
| 812 | |
| 813 | Finally, due to the uncounted millions who cannot be dissuaded from |
| 814 | wasting cycles on useless vanity devices called hit counters, here's |
| 815 | how to increment a number in a file safely: |
| 816 | |
| 817 | use Fcntl qw(:DEFAULT :flock); |
| 818 | |
| 819 | sysopen(FH, "numfile", O_RDWR | O_CREAT) |
| 820 | or die "can't open numfile: $!"; |
| 821 | # autoflush FH |
| 822 | $ofh = select(FH); $| = 1; select ($ofh); |
| 823 | flock(FH, LOCK_EX) |
| 824 | or die "can't write-lock numfile: $!"; |
| 825 | |
| 826 | $num = <FH> || 0; |
| 827 | seek(FH, 0, 0) |
| 828 | or die "can't rewind numfile : $!"; |
| 829 | print FH $num+1, "\n" |
| 830 | or die "can't write numfile: $!"; |
| 831 | |
| 832 | truncate(FH, tell(FH)) |
| 833 | or die "can't truncate numfile: $!"; |
| 834 | close(FH) |
| 835 | or die "can't close numfile: $!"; |
| 836 | |
| 837 | =head1 SEE ALSO |
| 838 | |
| 839 | The C<open> and C<sysopen> functions in perlfunc(1); |
| 840 | the standard open(2), dup(2), fopen(3), and fdopen(3) manpages; |
| 841 | the POSIX documentation. |
| 842 | |
| 843 | =head1 AUTHOR and COPYRIGHT |
| 844 | |
| 845 | Copyright 1998 Tom Christiansen. |
| 846 | |
| 847 | When included as part of the Standard Version of Perl, or as part of |
| 848 | its complete documentation whether printed or otherwise, this work may |
| 849 | be distributed only under the terms of Perl's Artistic License. Any |
| 850 | distribution of this file or derivatives thereof outside of that |
| 851 | package require that special arrangements be made with copyright |
| 852 | holder. |
| 853 | |
| 854 | Irrespective of its distribution, all code examples in these files are |
| 855 | hereby placed into the public domain. You are permitted and |
| 856 | encouraged to use this code in your own programs for fun or for profit |
| 857 | as you see fit. A simple comment in the code giving credit would be |
| 858 | courteous but is not required. |
| 859 | |
| 860 | =head1 HISTORY |
| 861 | |
| 862 | First release: Sat Jan 9 08:09:11 MST 1999 |