package PerlIO;

our $VERSION = '1.12';

# Map layer name to package that defines it
our %alias;

# Called as "use PerlIO 'foo'" when an undefined layer is requested:
# resolve each layer name to a package and try to load it.
sub import
{
 my $class = shift;
 while (@_)
  {
   my $layer = shift;
   if (exists $alias{$layer})
    {
     # An alias entry names the defining package directly
     $layer = $alias{$layer};
    }
   else
    {
     # Otherwise the layer lives in PerlIO::<layer name>
     $layer = "${class}::$layer";
    }
   # Foo::Bar -> Foo/Bar.pm; a failed load is reported as a warning only
   eval { require $layer =~ s{::}{/}gr . '.pm' };
   warn $@ if $@;
  }
}

# Layer flag bit that marks a layer as carrying :utf8 (PERLIO_F_UTF8)
sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

  # support platform-native and CRLF text files
  open(my $fh, "<:crlf", "my.txt") or die "open failed: $!";

  # append UTF-8 encoded text
  open(my $fh, ">>:encoding(UTF-8)", "some.log")
    or die "open failed: $!";

  # portably open a binary file for reading
  open(my $fh, "<", "his.jpg") or die "open failed: $!";
  binmode($fh) or die "binmode failed: $!";

  Shell:
    PERLIO=:perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification then C code performs the equivalent of:

  use PerlIO 'foo';

The Perl code in PerlIO.pm then attempts to locate a layer by doing:

  require PerlIO::foo;

If C<$PerlIO::alias{foo}> exists, the package it names is loaded instead
of C<PerlIO::foo>. A failure to load the package is reported as a warning
rather than as a fatal error.

Otherwise the C<PerlIO> package is a placeholder for additional
PerlIO-related functions.
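To make the lookup concrete, the following sketch reproduces the name-to-file mapping performed by C<import>. The helper C<layer_file> is hypothetical and not part of PerlIO; it mirrors the C<%alias> check and the C<s{::}{/}gr> conversion used in C<import>, and assumes Perl 5.14 or newer for the C</r> modifier.

```perl
  use strict;
  use warnings;

  # Hypothetical helper (not part of PerlIO): mirrors the name resolution
  # that PerlIO::import performs before calling require.
  sub layer_file {
      my ($class, $layer, $alias) = @_;
      # An alias entry names the package; otherwise use "${class}::$layer".
      my $package = exists $alias->{$layer} ? $alias->{$layer}
                                            : "${class}::$layer";
      # Same conversion as import: Foo::Bar -> Foo/Bar.pm
      return $package =~ s{::}{/}gr . '.pm';
  }

  my %alias = (myenc => 'My::Layer::Enc');    # hypothetical alias entry

  print layer_file('PerlIO', 'foo',   \%alias), "\n";    # PerlIO/foo.pm
  print layer_file('PerlIO', 'myenc', \%alias), "\n";    # My/Layer/Enc.pm
```

Nothing is loaded by this sketch; the real C<import> hands the resulting path to C<require> inside an C<eval> and only warns if loading fails.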

=head2 Layers

Generally speaking, PerlIO layers (previously sometimes referred to as
"disciplines") are an ordered stack applied to a filehandle (specified as
a space- or colon-separated list, conventionally written with a leading
colon). Each layer performs some operation on any input or output, except
when bypassed such as with C<sysread> or C<syswrite>. Read operations go
through the stack in the order they are set (left to right), and write
operations in the reverse order.

There are also layers which actually just set flags on lower layers, or
layers that modify the current stack but don't persist on the stack
themselves; these are referred to as pseudo-layers.

When opening a handle, it will be opened with any layers specified
explicitly in the open() call (or the platform defaults, if specified as
a colon with no following layers).

If layers are not explicitly specified, the handle will be opened with the
layers specified by the L<${^OPEN}|perlvar/"${^OPEN}"> variable (usually
set by using the L<open> pragma for a lexical scope, or the C<-C>
command-line switch or C<PERL_UNICODE> environment variable for the main
program scope).
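As a minimal sketch of the lexical case, assuming a scratch file created with the core L<File::Temp> module, the block below enables C<:encoding(UTF-8)> through the L<open> pragma and compares the resulting stack with one opened outside that scope. Exact layer names in the output may differ between platforms and Perl versions.

```perl
  use strict;
  use warnings;
  use File::Temp qw(tempfile);

  # A scratch file to open; its contents do not matter here.
  my (undef, $path) = tempfile(UNLINK => 1);

  my @layers;
  {
      # Lexically sets ${^OPEN}, so opens in this block that give no
      # layers of their own pick up :encoding(UTF-8).
      use open ':encoding(UTF-8)';
      open(my $fh, '<', $path) or die "open failed: $!";
      @layers = PerlIO::get_layers($fh);
      close($fh);
  }

  # Outside the block the pragma no longer applies.
  open(my $plain, '<', $path) or die "open failed: $!";
  my @plain_layers = PerlIO::get_layers($plain);
  close($plain);

  print "with use open: @layers\n";
  print "without:       @plain_layers\n";
```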

If layers are not specified in the open() call or C<${^OPEN}> variable,
the handle will be opened with the default layer stack configured for that
architecture; see L</"Defaults and how to override them">.

Some layers will automatically insert required lower level layers if not
present; for example C<:perlio> will insert C<:unix> below itself for low
level IO, and C<:encoding> will insert the platform defaults for buffered
IO.

The C<binmode> function can be called on an opened handle to push
additional layers onto the stack, which may also modify the existing
layers. C<binmode> called with no layers will remove or unset any
existing layers which transform the byte stream, making the handle
suitable for binary data.
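The following sketch, assuming a scratch file from L<File::Temp>, pushes an encoding layer with C<binmode> and then calls C<binmode> with no layers; C<PerlIO::get_layers> (described under L</"Querying the layers of filehandles">) records the stack at each step.

```perl
  use strict;
  use warnings;
  use File::Temp qw(tempfile);

  my ($fh) = tempfile(UNLINK => 1);    # scratch read/write handle
  my @start = PerlIO::get_layers($fh);

  # Push an encoding layer on top of the existing stack.
  binmode($fh, ':encoding(UTF-8)') or die "binmode failed: $!";
  my @encoded = PerlIO::get_layers($fh);

  # No layers given: remove or unset everything that transforms bytes.
  binmode($fh) or die "binmode failed: $!";
  my @binary = PerlIO::get_layers($fh);

  print "start:   @start\n";
  print "encoded: @encoded\n";
  print "binary:  @binary\n";
```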

The following layers are currently defined:

=over 4

=item :unix

Lowest level layer which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).
It is used even on non-Unix architectures, and most other layers operate on
top of it.

=item :stdio

Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.
This layer implements both low level IO and buffering, but is rarely used
on modern architectures.

=item :perlio

A from-scratch implementation of buffering for PerlIO. Provides fast
access to the buffer for C<sv_gets>, which implements Perl's readline/E<lt>E<gt>,
and in general attempts to minimize data copying.

C<:perlio> will insert a C<:unix> layer below itself to do low level IO.
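A quick way to observe this, assuming default settings (no C<PERLIO> override) and a scratch file from L<File::Temp>, is to open the same file once with only C<:perlio> and once with only C<:unix> and compare the stacks:

```perl
  use strict;
  use warnings;
  use File::Temp qw(tempfile);

  my (undef, $path) = tempfile(UNLINK => 1);

  # Only :perlio requested; a :unix layer ends up underneath it.
  open(my $buffered, '<:perlio', $path) or die "open failed: $!";
  my @buffered = PerlIO::get_layers($buffered);

  # Only :unix requested; nothing is added on top, so reads are unbuffered.
  open(my $direct, '<:unix', $path) or die "open failed: $!";
  my @unbuffered = PerlIO::get_layers($direct);

  print "<:perlio -> @buffered\n";
  print "<:unix   -> @unbuffered\n";
```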

=item :crlf

A layer that implements DOS/Windows-like CRLF line endings. On read it
converts pairs of CR,LF to a single "\n" newline character. On write it
converts each "\n" to a CR,LF pair. Note that this layer will silently
refuse to be pushed on top of itself.

It currently does I<not> mimic MS-DOS in treating Control-Z as an
end-of-file marker.

On DOS/Windows-like architectures where this layer is part of the defaults,
it also acts like the C<:perlio> layer, and removing the CRLF translation
(such as with C<:raw>) will only unset the CRLF translation flag. Since
Perl 5.14, you can also apply another C<:crlf> layer later, such as when
the CRLF translation must occur after an encoding layer. On other
architectures, it is a mundane CRLF translation layer and can be added and
removed normally.

  # translate CRLF after encoding on Perl 5.14 or newer
  binmode $fh, ":raw:encoding(UTF-16LE):crlf"
    or die "binmode failed: $!";
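To check the translation at the byte level, the sketch below (assuming a scratch file from L<File::Temp>) writes text through C<:raw:crlf>, then reads the file back once with C<:raw> and once through C<:crlf>. Starting from C<:raw> should give the same bytes on Unix-like and DOS-like platforms.

```perl
  use strict;
  use warnings;
  use File::Temp qw(tempfile);

  my (undef, $path) = tempfile(UNLINK => 1);
  my $written = "one\ntwo\n";

  # Start from a binary base, then let :crlf turn each "\n" into CR,LF.
  open(my $out, '>:raw:crlf', $path) or die "open failed: $!";
  print {$out} $written;
  close($out) or die "close failed: $!";

  # The bytes on disk, without any translation.
  open(my $in, '<:raw', $path) or die "open failed: $!";
  my $bytes = do { local $/; <$in> };
  close($in);

  # Reading through :crlf maps each CR,LF back to "\n".
  open(my $text_in, '<:raw:crlf', $path) or die "open failed: $!";
  my $text = do { local $/; <$text_in> };
  close($text_in);

  printf "%d characters written, %d bytes on disk\n",
      length $written, length $bytes;    # 8 characters written, 10 bytes on disk
  print $text eq $written ? "round trip ok\n" : "round trip changed the text\n";
```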

=item :utf8

Pseudo-layer that declares that the stream accepts Perl's I<internal>
upgraded encoding of characters, which is approximately UTF-8 on ASCII
machines, but UTF-EBCDIC on EBCDIC machines. This allows any character
Perl can represent to be read from or written to the stream.

This layer (which actually sets a flag on the preceding layer, and is
implicitly set by any C<:encoding> layer) does not translate or validate
byte sequences. It instead indicates that the byte stream will have been
arranged by other layers to be provided in Perl's internal upgraded
encoding, which Perl code (and correctly written XS code) will interpret
as decoded Unicode characters.

B<CAUTION>: Do not use this layer to translate from UTF-8 bytes, as
invalid UTF-8 or binary data will result in malformed Perl strings. It is
unlikely to produce invalid UTF-8 when used for output, though it will
instead produce UTF-EBCDIC on EBCDIC systems. The C<:encoding(UTF-8)>
layer (hyphen is significant) is preferred as it will ensure translation
between valid UTF-8 bytes and valid Unicode characters.
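The difference between bytes and decoded characters can be seen by reading the same data two ways. This sketch (assuming a scratch file from L<File::Temp>) stores the two UTF-8 bytes of U+00E9 and reads them back with C<:raw> and with the recommended C<:encoding(UTF-8)>:

```perl
  use strict;
  use warnings;
  use File::Temp qw(tempfile);

  my (undef, $path) = tempfile(UNLINK => 1);

  # The UTF-8 encoding of U+00E9 (LATIN SMALL LETTER E WITH ACUTE),
  # written as two plain bytes.
  open(my $out, '>:raw', $path) or die "open failed: $!";
  print {$out} "\xC3\xA9";
  close($out) or die "close failed: $!";

  # :raw hands back the bytes unchanged.
  open(my $raw, '<:raw', $path) or die "open failed: $!";
  my $bytes = do { local $/; <$raw> };
  close($raw);

  # :encoding(UTF-8) validates and decodes them into one character.
  open(my $dec, '<:encoding(UTF-8)', $path) or die "open failed: $!";
  my $chars = do { local $/; <$dec> };
  close($dec);

  printf "raw: %d bytes, decoded: %d character(s), U+%04X\n",
      length $bytes, length $chars, ord $chars;
```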

=item :bytes

This is the inverse of the C<:utf8> pseudo-layer. It turns off the flag
on the layer below so that data read from it is considered to
be Perl's internal downgraded encoding, thus interpreted as the native
single-byte encoding of Latin-1 or EBCDIC. Likewise on output Perl will
warn if a "wide" character (a codepoint not in the range 0..255) is
written to such a stream.

This is very dangerous to push on a handle using an C<:encoding> layer,
as such a layer assumes it is working with Perl's internal upgraded
encoding, so you will likely get a mangled result. Instead use C<:raw> or
C<:pop> to remove encoding layers.

=item :raw

The C<:raw> pseudo-layer is I<defined> as being identical to calling
C<binmode($fh)> - the stream is made suitable for passing binary data,
i.e. each byte is passed as-is. The stream will still be buffered
(but this was not always true before Perl 5.14).

In Perl 5.6 and some books the C<:raw> layer is documented as the inverse
of the C<:crlf> layer. That is no longer the case - other layers which
would alter the binary nature of the stream are also disabled. If you
want UNIX line endings on a platform that normally does CRLF translation,
but still want UTF-8 or encoding defaults, the appropriate thing to do is
to add C<:perlio> to the PERLIO environment variable, or open the handle
explicitly with that layer, to replace the platform default of C<:crlf>.

The implementation of C<:raw> is as a pseudo-layer which when "pushed"
pops itself and then any layers which would modify the binary data stream.
(Undoing C<:utf8> and C<:crlf> may be implemented by clearing flags
rather than popping layers but that is an implementation detail.)

As a consequence of the fact that C<:raw> normally pops layers,
it usually only makes sense to have it as the only or first element in
a layer specification. When used as the first element it provides
a known base on which to build, e.g.

  open(my $fh,">:raw:encoding(UTF-8)",...)
    or die "open failed: $!";

will construct a "binary" stream regardless of the platform defaults,
but then enable UTF-8 translation.
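A small sketch of that idiom, assuming a scratch file from L<File::Temp>: a character outside Latin-1 is written through C<:raw:encoding(UTF-8)> and the file is then read back as raw bytes.

```perl
  use strict;
  use warnings;
  use File::Temp qw(tempfile);

  my (undef, $path) = tempfile(UNLINK => 1);

  # Binary base first, then UTF-8 encoding on top of it.
  open(my $fh, '>:raw:encoding(UTF-8)', $path) or die "open failed: $!";
  print {$fh} "\x{263A}\n";    # U+263A WHITE SMILING FACE, then "\n"
  close($fh) or die "close failed: $!";

  open(my $in, '<:raw', $path) or die "open failed: $!";
  my $bytes = do { local $/; <$in> };
  close($in);

  # Three UTF-8 bytes for U+263A followed by a bare LF (no CRLF added).
  print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";    # E2 98 BA 0A
```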

=item :pop

A pseudo-layer that removes the top-most layer. Gives Perl code a
way to manipulate the layer stack. Note that C<:pop> only works on
real layers and will not undo the effects of pseudo-layers or flags
like C<:utf8>. An example of a possible use might be:

  open(my $fh,...) or die "open failed: $!";
  ...
  binmode($fh,":encoding(...)") or die "binmode failed: $!";
  # next chunk is encoded
  ...
  binmode($fh,":pop") or die "binmode failed: $!";
  # back to un-encoded

A more elegant (and safer) interface is needed.
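Assuming a handle on a scratch file from L<File::Temp>, the sketch below pushes an encoding layer and removes it again with C<:pop>, recording the stack with C<PerlIO::get_layers> at each step. In a typical setup the final stack should match the starting one, since the C<:utf8> flag set by C<:encoding> sits on the encoding layer that is popped.

```perl
  use strict;
  use warnings;
  use File::Temp qw(tempfile);

  my ($fh) = tempfile(UNLINK => 1);    # scratch read/write handle
  my @before = PerlIO::get_layers($fh);

  # Push an encoding layer (which also carries the utf8 flag).
  binmode($fh, ':encoding(UTF-8)') or die "binmode failed: $!";
  my @pushed = PerlIO::get_layers($fh);

  # :pop removes the top-most real layer: here, the encoding layer.
  binmode($fh, ':pop') or die "binmode failed: $!";
  my @popped = PerlIO::get_layers($fh);

  print "before: @before\n";
  print "pushed: @pushed\n";
  print "popped: @popped\n";
```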

=back

=head2 Custom Layers

It is possible to write custom layers in addition to the above built-in
ones, both in C/XS and Perl, as a module named C<< PerlIO::<layer name> >>.
Some custom layers come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> to transparently do character set and encoding
transformations, for example from Shift-JIS to Unicode. Note that an
C<:encoding> layer also enables C<:utf8>. See L<PerlIO::encoding> for more
information.

=item :mmap

A layer which implements "reading" of files by using C<mmap()> to
make a (whole) file appear in the process's address space, and then
using that as PerlIO's "buffer". This I<may> be faster in certain
circumstances for large files, and may result in less physical memory
use when multiple processes are reading the same file.

Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write
needs extra house-keeping (to extend the file) which negates any advantage.

The C<:mmap> layer will not exist if the platform does not support C<mmap()>.
See L<PerlIO::mmap> for more information.

=item :via

C<:via(MODULE)> allows a transformation to be applied by an arbitrary Perl
module, for example compression / decompression, encryption / decryption.
See L<PerlIO::via> for more information.
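As a rough illustration of the Perl side of C<:via>, the package C<PerlIO::via::Upper> below is hypothetical (it does not ship with Perl) and upper-cases everything written through it. It relies on the C<PUSHED> and C<WRITE> methods as described in L<PerlIO::via>; a real layer would usually also provide C<FILL> for reading and handle errors more carefully. A scratch file from L<File::Temp> is assumed.

```perl
  use strict;
  use warnings;
  use File::Temp qw(tempfile);

  # Hypothetical example layer, not part of the Perl distribution.
  package PerlIO::via::Upper;

  # Called when the layer is pushed; returns the layer object.
  sub PUSHED {
      my ($class, $mode, $fh) = @_;
      return bless {}, $class;
  }

  # Called with each chunk of output; $fh is the layer below.
  # Returns the number of bytes consumed, or -1 on failure.
  sub WRITE {
      my ($self, $buf, $fh) = @_;
      print {$fh} uc $buf or return -1;
      return length $buf;
  }

  package main;

  my (undef, $path) = tempfile(UNLINK => 1);

  open(my $out, '>:via(PerlIO::via::Upper)', $path) or die "open failed: $!";
  print {$out} "hello, layers\n";
  close($out) or die "close failed: $!";

  open(my $in, '<', $path) or die "open failed: $!";
  my $content = do { local $/; <$in> };
  close($in);
  print $content;    # HELLO, LAYERS
```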

=item :scalar

A layer implementing "in memory" files using scalar variables,
automatically used in place of the platform defaults for IO when opening
such a handle. As such, the scalar is expected to act like a file, only
containing or storing bytes. See L<PerlIO::scalar> for more information.
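A minimal sketch of in-memory handles: lines are read from one scalar and a short report is written into another, with no file involved.

```perl
  use strict;
  use warnings;

  my $data = "first line\nsecond line\n";

  # Opening a reference to a scalar selects the :scalar layer in place
  # of the platform defaults.
  open(my $in, '<', \$data) or die "open failed: $!";
  my @layers = PerlIO::get_layers($in);
  my @lines  = <$in>;
  close($in);

  # Writing to a scalar reference stores the bytes in that scalar.
  my $buffer = '';
  open(my $out, '>', \$buffer) or die "open failed: $!";
  print {$out} "read ", scalar(@lines), " lines\n";
  close($out);

  print "layers: @layers\n";
  print $buffer;    # read 2 lines
```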

=back

=head2 Alternatives to raw

To get a binary stream an alternative method is to use:

  open(my $fh,"<","whatever") or die "open failed: $!";
  binmode($fh) or die "binmode failed: $!";

This has the advantage of being backward compatible with older versions
of Perl that did not use PerlIO or where C<:raw> was buggy (as it was
before Perl 5.14).

To get an unbuffered stream specify an unbuffered layer (e.g. C<:unix>)
in the open call:

  open(my $fh,"<:unix",$path) or die "open failed: $!";

=head2 Defaults and how to override them

If the platform is MS-DOS-like and normally does CRLF to "\n"
translation for text files then the default layers are:

  :unix:crlf

Otherwise if C<Configure> found out how to do "fast" IO using the system's
stdio (not common on modern architectures), then the default layers are:

  :stdio

Otherwise the default layers are:

  :unix:perlio

Note that the "default stack" depends on the operating system and on the
Perl version, and both the compile-time and runtime configurations of
Perl. The default can be overridden by setting the environment variable
PERLIO to a space- or colon-separated list of layers; however, this cannot
be used to set layers that require loading modules like C<:encoding>.

This can be used to see the effects of (and bugs in) the various layers,
e.g.:

  cd .../perl/t
  PERLIO=:stdio  ./perl harness
  PERLIO=:perlio ./perl harness

For the various values of PERLIO see L<perlrun/PERLIO>.

The following table summarizes the default layers on UNIX-like and
DOS-like platforms and depending on the setting of C<$ENV{PERLIO}>:

  PERLIO     UNIX-like                   DOS-like
  ------     ---------                   --------
  unset / "" :unix:perlio / :stdio [1]   :unix:crlf
  :stdio     :stdio                      :stdio
  :perlio    :unix:perlio                :unix:perlio

  # [1] ":stdio" if Configure found out how to do "fast stdio" (depends
  # on the stdio implementation) and in Perl 5.8, else ":unix:perlio"
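One way to see the effect of C<PERLIO> is to start a child C<perl> with different settings and have it report the stack of a freshly opened file. The sketch below assumes a Unix-like system (it uses the list form of a piped C<open> with C<$^X>) and Perl 5.12 or newer for C<delete local>; the expected stacks follow the C<:stdio> and C<:perlio> rows of the table above.

```perl
  use strict;
  use warnings;

  # Child program: open a file with no explicit layers, print its stack.
  my $probe = 'open(my $fh, "<", $^X) or die "open: $!";'
            . ' print join(",", PerlIO::get_layers($fh));';

  my %stack_for;
  for my $setting (':perlio', ':stdio') {
      local $ENV{PERLIO} = $setting;     # read by the child at startup
      delete local $ENV{PERL_UNICODE};   # keep -C style defaults out of it
      # List-form pipe open: assumes a Unix-like system.
      open(my $pipe, '-|', $^X, '-e', $probe) or die "cannot run $^X: $!";
      my $layers = <$pipe>;
      close($pipe);
      $stack_for{$setting} = $layers;
      print "PERLIO=$setting -> $layers\n";
  }
```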

=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

  my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them, and without colons.

By default the layers from the input side of the filehandle are
returned; to get the output side, use the optional C<output> argument:

  my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle but
for example with sockets there may be differences.)

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that. This is not
accidental or unintentional. The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

B<Implementation details follow; please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<:utf8>) are not real
layers but instead flags on real layers; to get all of these returned
separately, use the optional C<details> argument:

  my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.
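For illustration, the sketch below (assuming a scratch file from L<File::Temp>) pushes C<:encoding(UTF-8)> and walks the C<details> list three elements at a time. It loads C<PerlIO> explicitly so that the C<PerlIO::F_UTF8> constant defined in this module is available for testing the flags.

```perl
  use strict;
  use warnings;
  use PerlIO ();                  # provides the PerlIO::F_UTF8 constant
  use File::Temp qw(tempfile);

  my ($fh) = tempfile(UNLINK => 1);
  binmode($fh, ':encoding(UTF-8)') or die "binmode failed: $!";

  my @details = PerlIO::get_layers($fh, details => 1);

  # Walk the flat list in groups of three: name, arguments, flags.
  for (my $i = 0; $i + 2 < @details; $i += 3) {
      my ($name, $args, $flags) = @details[$i .. $i + 2];
      printf "%-10s args=%-15s utf8=%s\n",
          $name,
          defined $args ? $args : '(none)',
          ($flags & PerlIO::F_UTF8) ? 'yes' : 'no';
  }
```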

B<You may open your eyes now.>

=head1 AUTHOR

Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut