Commit | Line | Data |
---|---|---|
b25a8b16 TC |
1 | =encoding utf8 |
2 | ||
f8284313 TC |
3 | =head1 NAME |
4 | ||
b25a8b16 | 5 | perlopentut - simple recipes for opening files and pipes in Perl |
f8284313 TC |
6 | |
7 | =head1 DESCRIPTION | |
8 | ||
b25a8b16 TC |
9 | Whenever you do I/O on a file in Perl, you do so through what in Perl is |
10 | called a B<filehandle>. A filehandle is an internal name for an external | |
11 | file. It is the job of the C<open> function to make the association | |
12 | between the internal name and the external name, and it is the job | |
13 | of the C<close> function to break that association. | |
f8284313 | 14 | |
b25a8b16 TC |
15 | For your convenience, Perl sets up a few special filehandles that are |
16 | already open when you run. These include C<STDIN>, C<STDOUT>, C<STDERR>, | |
17 | and C<ARGV>. Since those are pre-opened, you can use them right away | |
18 | without having to go to the trouble of opening them yourself: | |
f8284313 | 19 | |
b25a8b16 | 20 | print STDERR "This is a debugging message.\n"; |
f8284313 | 21 | |
b25a8b16 TC |
22 | print STDOUT "Please enter something: "; |
23 | $response = <STDIN> // die "how come no input?"; | |
24 | print STDOUT "Thank you!\n"; | |
f8284313 | 25 | |
b25a8b16 | 26 | while (<ARGV>) { ... } |
f8284313 | 27 | |
b25a8b16 TC |
28 | As you see from those examples, C<STDOUT> and C<STDERR> are output |
29 | handles, and C<STDIN> and C<ARGV> are input handles. Those are | |
30 | in all capital letters because they are reserved to Perl, much | |
31 | like the C<@ARGV> array and the C<%ENV> hash are. Their external | |
32 | associations were set up by your shell. | |
f8284313 | 33 | |
b25a8b16 TC |
34 | For everything else, you will need to open it on your own. Although there |
35 | are many other variants, the most common way to call Perl's open() function | |
36 | is with three arguments and one return value: | |
f8284313 | 37 | |
b25a8b16 | 38 | C< I<OK> = open(I<HANDLE>, I<MODE>, I<PATHNAME>)> |
f8284313 | 39 | |
b25a8b16 | 40 | Where: |
f8284313 | 41 | |
b25a8b16 | 42 | =over |
f8284313 | 43 | |
b25a8b16 | 44 | =item I<OK> |
f8284313 | 45 | |
b25a8b16 TC |
46 | will be some defined value if the open succeeds, but |
47 | C<undef> if it fails; | |
f8284313 | 48 | |
b25a8b16 | 49 | =item I<HANDLE> |
1a193132 | 50 | |
b25a8b16 TC |
51 | should be an undefined scalar variable to be filled in by the |
52 | C<open> function if it succeeds; | |
1a193132 | 53 | |
b25a8b16 | 54 | =item I<MODE> |
1a193132 | 55 | |
b25a8b16 | 56 | is the access mode and the encoding format to open the file with; |
f8284313 | 57 | |
b25a8b16 | 58 | =item I<PATHNAME> |
f8284313 | 59 | |
b25a8b16 | 60 | is the external name of the file you want opened. |
f8284313 | 61 | |
b25a8b16 | 62 | =back |
f8284313 | 63 | |
b25a8b16 TC |
64 | Most of the complexity of the C<open> function lies in the many |
65 | possible values that the I<MODE> parameter can take on. | |
1a193132 | 66 | |
b25a8b16 TC |
67 | One last thing before we show you how to open files: opening |
68 | files does not (usually) automatically lock them in Perl. See | |
69 | L<perlfaq5> for how to lock. | |
1a193132 | 70 | |
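As a minimal sketch of what locking can look like (this document defers the details to the FAQ, so treat this as an illustration rather than the recommended recipe), the following uses the core C<flock> function with constants from the core C<Fcntl> module. The helper name C<append_locked> is hypothetical. The lock is advisory: it only coordinates processes that also call C<flock>.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);   # imports LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB

# Hypothetical helper: append one line to a file while holding an
# exclusive advisory lock. Cooperating processes that also call
# flock() will wait; processes that ignore locks are not stopped.
sub append_locked {
    my ($filename, $text) = @_;

    open(my $handle, ">> :encoding(UTF-8)", $filename)
        || die "$0: can't open $filename for appending: $!\n";

    flock($handle, LOCK_EX)
        || die "$0: can't lock $filename: $!\n";

    print {$handle} $text, "\n"
        or die "$0: can't write to $filename: $!\n";

    # close() flushes the buffer and releases the lock.
    close($handle)
        || die "$0: can't close $filename: $!\n";

    return 1;
}
```

Calling C<append_locked($path, "some text")> from several cooperating processes should keep their lines from interleaving, assuming every writer uses the same helper.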
b25a8b16 | 71 | =head1 Opening Text Files |
1a193132 | 72 | |
b25a8b16 | 73 | =head2 Opening Text Files for Reading |
1a193132 | 74 | |
b25a8b16 TC |
75 | If you want to read from a text file, first open it in |
76 | read-only mode like this: | |
1a193132 | 77 | |
b25a8b16 TC |
78 | my $filename = "/some/path/to/a/textfile/goes/here"; |
79 | my $encoding = ":encoding(UTF-8)"; | |
80 | my $handle = undef; # this will be filled in on success | |
1a193132 | 81 | |
b25a8b16 TC |
82 | open($handle, "< $encoding", $filename) |
83 | || die "$0: can't open $filename for reading: $!\n"; | |
1a193132 | 84 | |
b25a8b16 TC |
85 | As with the shell, in Perl the C<< "<" >> is used to open the file in |
86 | read-only mode. If it succeeds, Perl allocates a brand new filehandle for | |
87 | you and fills in your previously undefined C<$handle> argument with a | |
88 | reference to that handle. | |
1a193132 | 89 | |
b25a8b16 TC |
90 | Now you may use functions like C<readline>, C<read>, C<getc>, and |
91 | C<sysread> on that handle. Probably the most common input function | |
92 | is the one that looks like an operator: | |
1a193132 | 93 | |
b25a8b16 TC |
94 | $line = readline($handle); |
95 | $line = <$handle>; # same thing | |
d7d7fefd | 96 | |
b25a8b16 TC |
97 | Because the C<readline> function returns C<undef> at end of file or |
98 | upon error, you will sometimes see it used this way: | |
d7d7fefd | 99 | |
b25a8b16 TC |
100 | $line = <$handle>; |
101 | if (defined $line) { | |
102 | # do something with $line | |
d7d7fefd | 103 | } |
b25a8b16 TC |
104 | else { |
105 | # $line is not valid, so skip it | |
494bd333 | 106 | } |
f8284313 | 107 | |
b25a8b16 | 108 | You can also just quickly C<die> on an undefined value this way: |
f8284313 | 109 | |
b25a8b16 | 110 | $line = <$handle> // die "no input found"; |
f8284313 | 111 | |
b25a8b16 TC |
112 | However, if hitting EOF is an expected and normal event, you |
113 | would not want to exit just because you ran out of input. Instead, | |
114 | you probably just want to exit an input loop. Immediately | |
115 | afterwards you can then test to see if there was an actual | |
116 | error that caused the loop to terminate, and act accordingly: | |
f8284313 | 117 | |
b25a8b16 TC |
118 | while (<$handle>) { |
119 | # do something with data in $_ | |
120 | } | |
121 | if ($!) { | |
122 | die "unexpected error while reading from $filename: $!"; | |
123 | } | |
f8284313 | 124 | |
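Putting the reading steps together, here is a small sketch that opens a UTF-8 text file, loops until EOF, and then checks for a read error. The helper name C<read_lines> is hypothetical. Instead of testing C<$!>, which can still hold a stale value from an earlier call, this sketch swaps in the handle's error flag via C<< $handle->error >> from the core C<IO::Handle> module.

```perl
use strict;
use warnings;
use IO::Handle;   # provides the error() method on lexical handles

# Hypothetical helper: return every line of a UTF-8 text file,
# with trailing newlines removed.
sub read_lines {
    my ($filename) = @_;

    open(my $handle, "< :encoding(UTF-8)", $filename)
        || die "$0: can't open $filename for reading: $!\n";

    my @lines;
    # Perl implicitly wraps this assignment in defined(), so a final
    # line containing just "0" is still processed.
    while (my $line = <$handle>) {
        chomp $line;
        push @lines, $line;
    }

    # Reaching EOF is normal; a read error is not.
    die "unexpected error while reading from $filename\n"
        if $handle->error;

    close($handle) || die "$0: can't close $filename: $!\n";
    return @lines;
}
```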
b25a8b16 TC |
125 | B<A Note on Encodings>: Having to specify the text encoding every time |
126 | might seem a bit of a bother. To set up a default encoding for C<open> so | |
127 | that you don't have to supply it each time, you can use the C<open> pragma: | |
f8284313 | 128 | |
b25a8b16 | 129 | use open qw< :encoding(UTF-8) >; |
f8284313 | 130 | |
b25a8b16 TC |
131 | Once you've done that, you can safely omit the encoding part of the |
132 | open mode: | |
f8284313 | 133 | |
b25a8b16 TC |
134 | open($handle, "<", $filename) |
135 | || die "$0: can't open $filename for reading: $!\n"; | |
f8284313 | 136 | |
b25a8b16 TC |
137 | But never use the bare C<< "<" >> without having set up a default encoding |
138 | first. Otherwise, Perl cannot know which of the many, many, many possible | |
139 | flavors of text file you have, and Perl will have no idea how to correctly | |
140 | map the data in your file into actual characters it can work with. Other | |
141 | common encoding formats include C<"ASCII">, C<"ISO-8859-1">, | |
142 | C<"ISO-8859-15">, C<"Windows-1252">, C<"MacRoman">, and even C<"UTF-16LE">. | |
143 | See L<perlunitut> for more about encodings. | |
f8284313 | 144 | |
b25a8b16 | 145 | =head2 Opening Text Files for Writing |
f8284313 | 146 | |
b25a8b16 TC |
147 | If you want to write to a file instead, you first have to decide |
148 | what to do about any existing contents. You have two basic choices here: | |
149 | to preserve or to clobber. | |
f8284313 | 150 | |
b25a8b16 TC |
151 | If you want to preserve any existing contents, then you want to open the |
152 | file in append mode. As in the shell, in Perl you use C<<< ">>" >>> to | |
153 | open an existing file in append mode, or to create the file if it does not | |
154 | already exist. | |
f8284313 | 155 | |
b25a8b16 TC |
156 | my $handle = undef; |
157 | my $filename = "/some/path/to/a/textfile/goes/here"; | |
158 | my $encoding = ":encoding(UTF-8)"; | |
f8284313 | 159 | |
b25a8b16 TC |
160 | open($handle, ">> $encoding", $filename) |
161 | || die "$0: can't open $filename for appending: $!\n"; | |
f8284313 | 162 | |
b25a8b16 TC |
163 | Now you can write to that filehandle using any of C<print>, C<printf>, |
164 | C<say>, C<write>, or C<syswrite>. | |
f8284313 | 165 | |
b25a8b16 TC |
166 | The file does not have to exist just to open it in append mode. If the |
167 | file did not previously exist, then the append-mode open creates it for | |
168 | you. But if the file does previously exist, its contents are safe from | |
169 | harm because you will be adding your new text past the end of the old text. | |
f8284313 | 170 | |
b25a8b16 TC |
171 | On the other hand, sometimes you want to clobber whatever might already be |
172 | there. To empty out a file before you start writing to it, you can open it | |
173 | in write-only mode: | |
f8284313 | 174 | |
b25a8b16 TC |
175 | my $handle = undef; |
176 | my $filename = "/some/path/to/a/textfile/goes/here"; | |
177 | my $encoding = ":encoding(UTF-8)"; | |
f8284313 | 178 | |
b25a8b16 TC |
179 | open($handle, "> $encoding", $filename) |
180 | || die "$0: can't open $filename in write-only mode: $!\n"; | |
f8284313 | 181 | |
b25a8b16 TC |
182 | Here again Perl works just like the shell in that the C<< ">" >> clobbers |
183 | an existing file. | |
f8284313 | 184 | |
b25a8b16 TC |
185 | As with the append mode, when you open a file in write-only mode, |
186 | you can now write to that filehandle using any of C<print>, C<printf>, | |
187 | C<say>, C<write>, or C<syswrite>. | |
f8284313 | 188 | |
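The difference between preserving and clobbering can be checked with a short experiment. This is only a sketch; the helper names C<write_line> and C<slurp> are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: write one line to $filename using the given
# mode, which should be ">" (clobber) or ">>" (append).
sub write_line {
    my ($mode, $filename, $text) = @_;

    open(my $handle, "$mode :encoding(UTF-8)", $filename)
        || die "$0: can't open $filename with mode $mode: $!\n";
    print {$handle} "$text\n"
        or die "$0: can't write to $filename: $!\n";
    close($handle)
        || die "$0: can't close $filename: $!\n";
    return;
}

# Hypothetical helper: return the whole file as a single string.
sub slurp {
    my ($filename) = @_;

    open(my $handle, "< :encoding(UTF-8)", $filename)
        || die "$0: can't open $filename for reading: $!\n";
    local $/;                     # undef record separator: read it all
    my $contents = <$handle>;
    close($handle) || die "$0: can't close $filename: $!\n";
    return $contents // "";
}
```

After C<write_line(">", $file, "one")> followed by C<<< write_line(">>", $file, "two") >>>, C<slurp($file)> returns both lines. A further C<write_line(">", $file, "three")> leaves only the line "three" in the file.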
b25a8b16 TC |
189 | What about read-write mode? You should probably pretend it doesn't exist, |
190 | because opening text files in read-write mode is unlikely to do what you | |
191 | would like. See L<perlfaq5> for details. | |
f8284313 | 192 | |
b25a8b16 | 193 | =head1 Opening Binary Files |
f8284313 | 194 | |
b25a8b16 TC |
195 | If the file to be opened contains binary data instead of text characters, |
196 | then the C<MODE> argument to C<open> is a little different. Instead of | |
197 | specifying the encoding, you tell Perl that your data are in raw bytes. | |
f8284313 | 198 | |
b25a8b16 TC |
199 | my $filename = "/some/path/to/a/binary/file/goes/here"; |
200 | my $encoding = ":raw :bytes"; | |
201 | my $handle = undef; # this will be filled in on success | |
f8284313 | 202 | |
b25a8b16 TC |
203 | And then open as before, choosing C<<< "<" >>>, C<<< ">>" >>>, or |
204 | C<<< ">" >>> as needed: | |
f8284313 | 205 | |
b25a8b16 TC |
206 | open($handle, "< $encoding", $filename) |
207 | || die "$0: can't open $filename for reading: $!\n"; | |
f8284313 | 208 | |
b25a8b16 TC |
209 | open($handle, ">> $encoding", $filename) |
210 | || die "$0: can't open $filename for appending: $!\n"; | |
f8284313 | 211 | |
b25a8b16 TC |
212 | open($handle, "> $encoding", $filename) |
213 | || die "$0: can't open $filename in write-only mode: $!\n"; | |
f8284313 | 214 | |
b25a8b16 | 215 | Alternatively, you can change to binary mode on an existing handle this way: |
f8284313 | 216 | |
b25a8b16 | 217 | binmode($handle) || die "cannot binmode handle"; |
f8284313 | 218 | |
b25a8b16 | 219 | This is especially handy for the handles that Perl has already opened for you. |
f8284313 | 220 | |
b25a8b16 TC |
221 | binmode(STDIN) || die "cannot binmode STDIN"; |
222 | binmode(STDOUT) || die "cannot binmode STDOUT"; | |
f8284313 | 223 | |
b25a8b16 TC |
224 | You can also pass C<binmode> an explicit encoding to change it on the fly. |
225 | This isn't exactly "binary" mode, but we still use C<binmode> to do it: | |
f8284313 | 226 | |
b25a8b16 TC |
227 | binmode(STDIN, ":encoding(MacRoman)") || die "cannot binmode STDIN"; |
228 | binmode(STDOUT, ":encoding(UTF-8)") || die "cannot binmode STDOUT"; | |
f8284313 | 229 | |
b25a8b16 TC |
230 | Once you have your binary file properly opened in the right mode, you can |
231 | use all the same Perl I/O functions as you used on text files. However, | |
232 | you may wish to use the fixed-size C<read> instead of the variable-sized | |
233 | C<readline> for your input. | |
f8284313 | 234 | |
b25a8b16 | 235 | Here's an example of how to copy a binary file: |
f8284313 | 236 | |
b25a8b16 TC |
237 | my $BUFSIZ = 64 * (2 ** 10); |
238 | my $name_in = "/some/input/file"; | |
239 | my $name_out = "/some/output/file"; | |
f8284313 | 240 | |
b25a8b16 | 241 | my($in_fh, $out_fh, $buffer); |
f8284313 | 242 | |
b25a8b16 TC |
243 | open($in_fh, "<", $name_in) || die "$0: cannot open $name_in for reading: $!"; |
244 | open($out_fh, ">", $name_out) || die "$0: cannot open $name_out for writing: $!"; | |
f8284313 | 245 | |
b25a8b16 TC |
246 | for my $fh ($in_fh, $out_fh) { |
247 | binmode($fh) || die "binmode failed"; | |
248 | } | |
f8284313 | 249 | |
b25a8b16 TC |
250 | while (read($in_fh, $buffer, $BUFSIZ)) { |
251 | unless (print $out_fh $buffer) { | |
252 | die "couldn't write to $name_out: $!"; | |
253 | } | |
254 | } | |
f8284313 | 255 | |
b25a8b16 TC |
256 | close($in_fh) || die "couldn't close $name_in: $!"; |
257 | close($out_fh) || die "couldn't close $name_out: $!"; | |
f8284313 | 258 | |
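One possible refinement of the copy loop, offered as a sketch rather than a replacement: C<read> returns C<undef> on error and 0 at end of file, so the two cases can be told apart. The helper name C<copy_binary> is hypothetical, and the C<:raw> layer is given at open time instead of calling C<binmode> afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $name_in to $name_out as raw bytes and
# return the number of bytes copied.
sub copy_binary {
    my ($name_in, $name_out) = @_;
    my $BUFSIZ = 64 * (2 ** 10);
    my $total  = 0;

    open(my $in_fh,  "< :raw", $name_in)
        || die "$0: cannot open $name_in for reading: $!";
    open(my $out_fh, "> :raw", $name_out)
        || die "$0: cannot open $name_out for writing: $!";

    while (1) {
        my $count = read($in_fh, my $buffer, $BUFSIZ);
        die "couldn't read from $name_in: $!" unless defined $count;
        last if $count == 0;                 # end of file
        print {$out_fh} $buffer
            or die "couldn't write to $name_out: $!";
        $total += $count;
    }

    close($in_fh)  || die "couldn't close $name_in: $!";
    close($out_fh) || die "couldn't close $name_out: $!";
    return $total;
}
```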
b25a8b16 | 259 | =head1 Opening Pipes |
f8284313 | 260 | |
b25a8b16 | 261 | To be announced. |
ae258fbb | 262 | |
b25a8b16 | 263 | =head1 Low-level File Opens via sysopen |
ae258fbb | 264 | |
b25a8b16 | 265 | To be announced. Or deleted. |
ae258fbb | 266 | |
b25a8b16 | 267 | =head1 SEE ALSO |
f8284313 | 268 | |
b25a8b16 | 269 | To be announced. |
f8284313 TC |
270 | |
271 | =head1 AUTHOR and COPYRIGHT | |
272 | ||
b25a8b16 | 273 | To be announced. |
f8284313 | 274 | |
b25a8b16 | 275 | =head1 HISTORY |
f8284313 | 276 | |
b25a8b16 | 277 | To be announced. |
f8284313 | 278 | |
f8284313 | 279 |