Commit | Line | Data |
---|---|---|
b25a8b16 TC |
1 | =encoding utf8 |
2 | ||
f8284313 TC |
3 | =head1 NAME |
4 | ||
b25a8b16 | 5 | perlopentut - simple recipes for opening files and pipes in Perl |
f8284313 TC |
6 | |
7 | =head1 DESCRIPTION | |
8 | ||
b25a8b16 TC |
9 | Whenever you do I/O on a file in Perl, you do so through what in Perl is |
10 | called a B<filehandle>. A filehandle is an internal name for an external | |
11 | file. It is the job of the C<open> function to make the association | |
12 | between the internal name and the external name, and it is the job | |
375c68c1 | 13 | of the C<close> function to break that association. |
f8284313 | 14 | |
b25a8b16 TC |
15 | For your convenience, Perl sets up a few special filehandles that are |
16 | already open when you run. These include C<STDIN>, C<STDOUT>, C<STDERR>, | |
17 | and C<ARGV>. Since those are pre-opened, you can use them right away | |
18 | without having to go to the trouble of opening them yourself: | |
f8284313 | 19 | |
b25a8b16 | 20 | print STDERR "This is a debugging message.\n"; |
f8284313 | 21 | |
b25a8b16 TC |
22 | print STDOUT "Please enter something: "; |
23 | $response = <STDIN> // die "how come no input?"; | |
24 | print STDOUT "Thank you!\n"; | |
f8284313 | 25 | |
b25a8b16 | 26 | while (<ARGV>) { ... } |
f8284313 | 27 | |
b25a8b16 | 28 | As you see from those examples, C<STDOUT> and C<STDERR> are output |
375c68c1 | 29 | handles, and C<STDIN> and C<ARGV> are input handles. They are |
b25a8b16 TC |
30 | in all capital letters because they are reserved to Perl, much |
31 | like the C<@ARGV> array and the C<%ENV> hash are. Their external | |
32 | associations were set up by your shell. | |
f8284313 | 33 | |
375c68c1 JK |
34 | You will need to open every other filehandle on your own. Although there |
35 | are many variants, the most common way to call Perl's open() function | |
b25a8b16 | 36 | is with three arguments and one return value: |
f8284313 | 37 | |
b25a8b16 | 38 | C< I<OK> = open(I<HANDLE>, I<MODE>, I<PATHNAME>)> |
f8284313 | 39 | |
b25a8b16 | 40 | Where: |
f8284313 | 41 | |
b25a8b16 | 42 | =over |
f8284313 | 43 | |
b25a8b16 | 44 | =item I<OK> |
f8284313 | 45 | |
b25a8b16 TC |
46 | will be some defined value if the open succeeds, but |
47 | C<undef> if it fails; | |
f8284313 | 48 | |
b25a8b16 | 49 | =item I<HANDLE> |
1a193132 | 50 | |
b25a8b16 TC |
51 | should be an undefined scalar variable to be filled in by the |
52 | C<open> function if it succeeds; | |
1a193132 | 53 | |
b25a8b16 | 54 | =item I<MODE> |
1a193132 | 55 | |
b25a8b16 | 56 | is the access mode and the encoding format to open the file with; |
f8284313 | 57 | |
b25a8b16 | 58 | =item I<PATHNAME> |
f8284313 | 59 | |
b25a8b16 | 60 | is the external name of the file you want opened. |
f8284313 | 61 | |
b25a8b16 | 62 | =back |
f8284313 | 63 | |
b25a8b16 TC |
64 | Most of the complexity of the C<open> function lies in the many |
65 | possible values that the I<MODE> parameter can take on. | |
1a193132 | 66 | |
b25a8b16 TC |
67 | One last thing before we show you how to open files: opening |
68 | files does not (usually) automatically lock them in Perl. See | |
1b59a132 | 69 | L<perlfaq5> for how to lock. |
1a193132 | 70 | |
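As a minimal sketch of what locking can look like, the example below takes an advisory lock with Perl's built-in C<flock> after an append-mode open. The scratch file from C<File::Temp> and the variable names are assumptions made only so the sketch can run on its own; advisory locks only help when every cooperating program asks for them.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);            # exports LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB
use File::Temp qw(tempfile);

# Hypothetical scratch file so the sketch can run anywhere.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh) || die "$0: can't close $filename: $!";

open(my $handle, ">>", $filename)
    || die "$0: can't open $filename for appending: $!";

# Ask for an exclusive advisory lock; this waits until it is granted.
flock($handle, LOCK_EX)
    || die "$0: can't lock $filename: $!";

my $line = "one line written while holding the lock\n";
print {$handle} $line
    or die "$0: can't write to $filename: $!";

# Closing the handle flushes the data and releases the lock.
close($handle) || die "$0: can't close $filename: $!";
```
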
b25a8b16 | 71 | =head1 Opening Text Files |
1a193132 | 72 | |
b25a8b16 | 73 | =head2 Opening Text Files for Reading |
1a193132 | 74 | |
b25a8b16 TC |
75 | If you want to read from a text file, first open it in |
76 | read-only mode like this: | |
1a193132 | 77 | |
b25a8b16 TC |
78 | my $filename = "/some/path/to/a/textfile/goes/here"; |
79 | my $encoding = ":encoding(UTF-8)"; | |
80 | my $handle = undef; # this will be filled in on success | |
1a193132 | 81 | |
b25a8b16 | 82 | open($handle, "< $encoding", $filename) |
d49b925c | 83 | || die "$0: can't open $filename for reading: $!"; |
1a193132 | 84 | |
b25a8b16 TC |
85 | As with the shell, in Perl the C<< "<" >> is used to open the file in |
86 | read-only mode. If it succeeds, Perl allocates a brand new filehandle for | |
87 | you and fills in your previously undefined C<$handle> argument with a | |
88 | reference to that handle. | |
1a193132 | 89 | |
b25a8b16 TC |
90 | Now you may use functions like C<readline>, C<read>, C<getc>, and |
91 | C<sysread> on that handle. Probably the most common input function | |
92 | is the one that looks like an operator: | |
1a193132 | 93 | |
b25a8b16 TC |
94 | $line = readline($handle); |
95 | $line = <$handle>; # same thing | |
d7d7fefd | 96 | |
b25a8b16 TC |
97 | Because the C<readline> function returns C<undef> at end of file or |
98 | upon error, you will sometimes see it used this way: | |
d7d7fefd | 99 | |
b25a8b16 TC |
100 | $line = <$handle>; |
101 | if (defined $line) { | |
102 | # do something with $line | |
d7d7fefd | 103 | } |
b25a8b16 TC |
104 | else { |
105 | # $line is not valid, so skip it | |
494bd333 | 106 | } |
f8284313 | 107 | |
b25a8b16 | 108 | You can also just quickly C<die> on an undefined value this way: |
f8284313 | 109 | |
b25a8b16 | 110 | $line = <$handle> // die "no input found"; |
f8284313 | 111 | |
375c68c1 JK |
112 | However, if hitting EOF is an expected and normal event, you do not want to |
113 | exit simply because you have run out of input. Instead, you probably just want | |
114 | to exit an input loop. You can then test to see if an actual error has caused | |
115 | the loop to terminate, and act accordingly: | |
f8284313 | 116 | |
b25a8b16 TC |
117 | while (<$handle>) { |
118 | # do something with data in $_ | |
119 | } | |
120 | if ($!) { | |
121 | die "unexpected error while reading from $filename: $!"; | |
122 | } | |
f8284313 | 123 | |
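The C<$!> test above is only meaningful if nothing else has set C<$!> along the way. As a hedged alternative, the sketch below checks C<eof> before each read, so that an undefined result from C<readline> can only mean an error. The sample file and the C<$count> variable are assumptions made for the sake of a runnable example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file so the sketch is self-contained.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\ngamma\n"
    or die "$0: can't write to $filename: $!";
close($tmp_fh) || die "$0: can't close $filename: $!";

open(my $handle, "< :encoding(UTF-8)", $filename)
    || die "$0: can't open $filename for reading: $!";

my $count = 0;
while (!eof($handle)) {
    # We are not at end of file, so undef here signals a read error.
    defined(my $line = readline($handle))
        || die "$0: error while reading from $filename: $!";
    $count++;                    # do something with $line here
}

close($handle) || die "$0: can't close $filename: $!";
```
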
b25a8b16 TC |
124 | B<A Note on Encodings>: Having to specify the text encoding every time |
125 | might seem a bit of a bother. To set up a default encoding for C<open> so | |
126 | that you don't have to supply it each time, you can use the C<open> pragma: | |
f8284313 | 127 | |
b25a8b16 | 128 | use open qw< :encoding(UTF-8) >; |
f8284313 | 129 | |
b25a8b16 TC |
130 | Once you've done that, you can safely omit the encoding part of the |
131 | open mode: | |
f8284313 | 132 | |
b25a8b16 | 133 | open($handle, "<", $filename) |
d49b925c | 134 | || die "$0: can't open $filename for reading: $!"; |
f8284313 | 135 | |
b25a8b16 TC |
136 | But never use the bare C<< "<" >> without having set up a default encoding |
137 | first. Otherwise, Perl cannot know which of the many, many, many possible | |
138 | flavors of text file you have, and Perl will have no idea how to correctly | |
139 | map the data in your file into actual characters it can work with. Other | |
140 | common encoding formats include C<"ASCII">, C<"ISO-8859-1">, | |
141 | C<"ISO-8859-15">, C<"Windows-1252">, C<"MacRoman">, and even C<"UTF-16LE">. | |
142 | See L<perlunitut> for more about encodings. | |
f8284313 | 143 | |
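To see why the encoding matters, here is a small sketch that reads the same two bytes twice: once through a UTF-8 decoding layer and once as raw bytes. The scratch file and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file holding the UTF-8 bytes for U+00E9.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
binmode($tmp_fh) || die "cannot binmode $filename";
print {$tmp_fh} "\xC3\xA9"
    or die "$0: can't write to $filename: $!";
close($tmp_fh) || die "$0: can't close $filename: $!";

# Read it as text: the two bytes are decoded into one character.
open(my $text_fh, "< :encoding(UTF-8)", $filename)
    || die "$0: can't open $filename for reading: $!";
my $chars = <$text_fh>;
close($text_fh) || die "$0: can't close $filename: $!";

# Read it as raw bytes: no decoding happens.
open(my $raw_fh, "< :raw", $filename)
    || die "$0: can't open $filename for reading: $!";
my $bytes = <$raw_fh>;
close($raw_fh) || die "$0: can't close $filename: $!";

my $char_count = length($chars);
my $byte_count = length($bytes);
```

With the decoding layer, C<$chars> holds a single character; without it, C<$bytes> holds the two undecoded bytes.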
b25a8b16 | 144 | =head2 Opening Text Files for Writing |
f8284313 | 145 | |
375c68c1 JK |
146 | When you want to write to a file, you first have to decide what to do about |
147 | any existing contents of that file. You have two basic choices here: to | |
148 | preserve or to clobber. | |
f8284313 | 149 | |
375c68c1 JK |
150 | If you want to preserve any existing contents, then you want to open the file |
151 | in append mode. As in the shell, in Perl you use C<<< ">>" >>> to open an | |
152 | existing file in append mode. C<<< ">>" >>> creates the file if it does not | |
b25a8b16 | 153 | already exist. |
f8284313 | 154 | |
b25a8b16 TC |
155 | my $handle = undef; |
156 | my $filename = "/some/path/to/a/textfile/goes/here"; | |
157 | my $encoding = ":encoding(UTF-8)"; | |
f8284313 | 158 | |
b25a8b16 | 159 | open($handle, ">> $encoding", $filename) |
d49b925c | 160 | || die "$0: can't open $filename for appending: $!"; |
f8284313 | 161 | |
b25a8b16 TC |
162 | Now you can write to that filehandle using any of C<print>, C<printf>, |
163 | C<say>, C<write>, or C<syswrite>. | |
f8284313 | 164 | |
375c68c1 JK |
165 | As noted above, if the file does not already exist, then the append-mode open |
166 | will create it for you. But if the file does already exist, its contents are | |
167 | safe from harm because you will be adding your new text past the end of the | |
168 | old text. | |
f8284313 | 169 | |
b25a8b16 TC |
170 | On the other hand, sometimes you want to clobber whatever might already be |
171 | there. To empty out a file before you start writing to it, you can open it | |
172 | in write-only mode: | |
f8284313 | 173 | |
b25a8b16 TC |
174 | my $handle = undef; |
175 | my $filename = "/some/path/to/a/textfile/goes/here"; | |
176 | my $encoding = ":encoding(UTF-8)"; | |
f8284313 | 177 | |
b25a8b16 | 178 | open($handle, "> $encoding", $filename) |
d49b925c | 179 | || die "$0: can't open $filename in write-only mode: $!";
f8284313 | 180 | |
b25a8b16 TC |
181 | Here again Perl works just like the shell in that the C<< ">" >> clobbers |
182 | an existing file. | |
f8284313 | 183 | |
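The difference between preserving and clobbering can be sketched side by side. The helper subroutines C<write_line> and C<slurp_lines> and the scratch file are hypothetical, introduced only to keep the example short and runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $path in the given mode and write one line.
sub write_line {
    my ($path, $mode, $text) = @_;
    open(my $handle, "$mode :encoding(UTF-8)", $path)
        || die "$0: can't open $path with mode $mode: $!";
    print {$handle} "$text\n"
        or die "$0: can't write to $path: $!";
    close($handle) || die "$0: can't close $path: $!";
    return;
}

# Hypothetical helper: return the file's lines without their newlines.
sub slurp_lines {
    my ($path) = @_;
    open(my $handle, "< :encoding(UTF-8)", $path)
        || die "$0: can't open $path for reading: $!";
    my @lines = <$handle>;
    close($handle) || die "$0: can't close $path: $!";
    chomp(@lines);
    return @lines;
}

my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh) || die "$0: can't close $filename: $!";

write_line($filename, ">>", "first");
write_line($filename, ">>", "second");
my @after_append = slurp_lines($filename);    # both lines are still there

write_line($filename, ">", "third");
my @after_clobber = slurp_lines($filename);   # only the newest line is left
```
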
b25a8b16 TC |
184 | As with the append mode, when you open a file in write-only mode, |
185 | you can now write to that filehandle using any of C<print>, C<printf>, | |
186 | C<say>, C<write>, or C<syswrite>. | |
f8284313 | 187 | |
b25a8b16 TC |
188 | What about read-write mode? You should probably pretend it doesn't exist, |
189 | because opening text files in read-write mode is unlikely to do what you | |
1b59a132 | 190 | would like. See L<perlfaq5> for details. |
f8284313 | 191 | |
b25a8b16 | 192 | =head1 Opening Binary Files |
f8284313 | 193 | |
b25a8b16 TC |
194 | If the file to be opened contains binary data instead of text characters, |
195 | then the I<MODE> argument to C<open> is a little different. Instead of | |
196 | specifying the encoding, you tell Perl that your data are in raw bytes. | |
f8284313 | 197 | |
b25a8b16 TC |
198 | my $filename = "/some/path/to/a/binary/file/goes/here"; |
199 | my $encoding = ":raw :bytes"; | |
200 | my $handle = undef; # this will be filled in on success | |
f8284313 | 201 | |
b25a8b16 TC |
202 | And then open as before, choosing C<<< "<" >>>, C<<< ">>" >>>, or |
203 | C<<< ">" >>> as needed: | |
f8284313 | 204 | |
b25a8b16 | 205 | open($handle, "< $encoding", $filename) |
d49b925c | 206 | || die "$0: can't open $filename for reading: $!"; |
f8284313 | 207 | |
b25a8b16 | 208 | open($handle, ">> $encoding", $filename) |
d49b925c | 209 | || die "$0: can't open $filename for appending: $!"; |
f8284313 | 210 | |
b25a8b16 | 211 | open($handle, "> $encoding", $filename) |
d49b925c | 212 | || die "$0: can't open $filename in write-only mode: $!";
f8284313 | 213 | |
b25a8b16 | 214 | Alternatively, you can change to binary mode on an existing handle this way:
f8284313 | 215 | |
b25a8b16 | 216 | binmode($handle) || die "cannot binmode handle"; |
f8284313 | 217 | |
b25a8b16 | 218 | This is especially handy for the handles that Perl has already opened for you. |
f8284313 | 219 | |
b25a8b16 TC |
220 | binmode(STDIN) || die "cannot binmode STDIN"; |
221 | binmode(STDOUT) || die "cannot binmode STDOUT"; | |
f8284313 | 222 | |
b25a8b16 TC |
223 | You can also pass C<binmode> an explicit encoding to change it on the fly. |
224 | This isn't exactly "binary" mode, but we still use C<binmode> to do it: | |
f8284313 | 225 | |
c29b2abd KW |
226 | binmode(STDIN, ":encoding(MacRoman)") || die "cannot binmode STDIN"; |
227 | binmode(STDOUT, ":encoding(UTF-8)") || die "cannot binmode STDOUT"; | |
f8284313 | 228 | |
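The same call works on any handle that is already open, not just the standard ones. As a minimal sketch, assuming a scratch file from C<File::Temp>, the following pushes a UTF-8 layer onto an open handle and then writes one non-ASCII character through it:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() hands back a handle that is already open for writing.
my ($handle, $filename) = tempfile(UNLINK => 1);

# Change the encoding on the fly, after the open.
binmode($handle, ":encoding(UTF-8)")
    || die "cannot binmode $filename";

# One character (U+00E9) plus a newline goes in ...
print {$handle} "\x{E9}\n"
    or die "$0: can't write to $filename: $!";
close($handle) || die "$0: can't close $filename: $!";

# ... and the character is stored as two bytes on disk.
my $size = -s $filename;
```
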
b25a8b16 TC |
229 | Once you have your binary file properly opened in the right mode, you can |
230 | use all the same Perl I/O functions as you used on text files. However, | |
231 | you may wish to use the fixed-size C<read> instead of the variable-sized | |
232 | C<readline> for your input. | |
f8284313 | 233 | |
b25a8b16 | 234 | Here's an example of how to copy a binary file: |
f8284313 | 235 | |
b25a8b16 TC |
236 | my $BUFSIZ = 64 * (2 ** 10); |
237 | my $name_in = "/some/input/file"; | |
238 | my $name_out = "/some/output/file"; | |
f8284313 | 239 | |
b25a8b16 | 240 | my($in_fh, $out_fh, $buffer); |
f8284313 | 241 | |
c29b2abd KW |
242 | open($in_fh, "<", $name_in) |
243 | || die "$0: cannot open $name_in for reading: $!"; | |
244 | open($out_fh, ">", $name_out) | |
245 | || die "$0: cannot open $name_out for writing: $!"; | |
f8284313 | 246 | |
b25a8b16 TC |
247 | for my $fh ($in_fh, $out_fh) { |
248 | binmode($fh) || die "binmode failed"; | |
249 | } | |
f8284313 | 250 | |
b25a8b16 TC |
251 | while (read($in_fh, $buffer, $BUFSIZ)) { |
252 | unless (print $out_fh $buffer) { | |
253 | die "couldn't write to $name_out: $!"; | |
254 | } | |
255 | } | |
f8284313 | 256 | |
b25a8b16 TC |
257 | close($in_fh) || die "couldn't close $name_in: $!"; |
258 | close($out_fh) || die "couldn't close $name_out: $!"; | |
f8284313 | 259 | |
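The loop above stops both at end of file and on a read error, because C<read> returns 0 in the first case and C<undef> in the second. If you want to tell the two apart, a sketch along the following lines may help. The C<copy_binary> subroutine and the scratch files are hypothetical names introduced for this example; it also opens with C<:raw> directly rather than calling C<binmode> afterwards.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy $name_in to $name_out as raw bytes.
sub copy_binary {
    my ($name_in, $name_out) = @_;
    my $BUFSIZ = 64 * (2 ** 10);

    open(my $in_fh, "< :raw", $name_in)
        || die "$0: cannot open $name_in for reading: $!";
    open(my $out_fh, "> :raw", $name_out)
        || die "$0: cannot open $name_out for writing: $!";

    my $buffer;
    while (1) {
        my $got = read($in_fh, $buffer, $BUFSIZ);
        defined($got) || die "$0: error reading from $name_in: $!";
        last if $got == 0;       # 0 means end of file, not an error
        print {$out_fh} $buffer
            or die "couldn't write to $name_out: $!";
    }

    close($in_fh)  || die "couldn't close $name_in: $!";
    close($out_fh) || die "couldn't close $name_out: $!";
    return;
}

# Build a small binary input containing every byte value once.
my ($src_fh, $src_name) = tempfile(UNLINK => 1);
binmode($src_fh) || die "binmode failed";
print {$src_fh} pack("C*", 0 .. 255)
    or die "couldn't write to $src_name: $!";
close($src_fh) || die "couldn't close $src_name: $!";

my ($dst_fh, $dst_name) = tempfile(UNLINK => 1);
close($dst_fh) || die "couldn't close $dst_name: $!";

copy_binary($src_name, $dst_name);
```
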
b25a8b16 | 260 | =head1 Opening Pipes |
f8284313 | 261 | |
b25a8b16 | 262 | To be announced. |
ae258fbb | 263 | |
b25a8b16 | 264 | =head1 Low-level File Opens via sysopen |
ae258fbb | 265 | |
b25a8b16 | 266 | To be announced. Or deleted. |
ae258fbb | 267 | |
b25a8b16 | 268 | =head1 SEE ALSO |
f8284313 | 269 | |
b25a8b16 | 270 | To be announced. |
f8284313 TC |
271 | |
272 | =head1 AUTHOR and COPYRIGHT | |
273 | ||
a1fc4cc4 | 274 | Copyright 2013 Tom Christiansen. |
f8284313 | 275 | |
a1fc4cc4 RS |
276 | This documentation is free; you can redistribute it and/or modify it under |
277 | the same terms as Perl itself. | |
f8284313 | 278 |