Commit | Line | Data |
---|---|---|
1141d9f8 NIS |
1 | package PerlIO; |
2 | ||
78269f09 | 3 | our $VERSION = '1.09'; |
8de1277c | 4 | |
1141d9f8 | 5 | # Map layer name to package that defines it |
c1a61b17 | 6 | our %alias; |
1141d9f8 NIS |
7 | |
8 | sub import | |
9 | { | |
10 | my $class = shift; | |
11 | while (@_) | |
12 | { | |
13 | my $layer = shift; | |
14 | if (exists $alias{$layer}) | |
15 | { | |
16 | $layer = $alias{$layer} | |
17 | } | |
18 | else | |
19 | { | |
20 | $layer = "${class}::$layer"; | |
21 | } | |
c7996136 | 22 | eval { require $layer =~ s{::}{/}gr . '.pm' }; |
1141d9f8 NIS |
23 | warn $@ if $@; |
24 | } | |
25 | } | |
26 | ||
39f7a870 JH |
27 | sub F_UTF8 () { 0x8000 } |
28 | ||
1141d9f8 NIS |
29 | 1; |
30 | __END__ | |
b3d30bf7 NIS |
31 | |
32 | =head1 NAME | |
33 | ||
7d3b96bb | 34 | PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space |
b3d30bf7 NIS |
35 | |
36 | =head1 SYNOPSIS | |
37 | ||
555bd962 BG |
38 | open($fh, "<:crlf", "my.txt"); # support platform-native and |
39 | # CRLF text files | |
1cbfc93d | 40 | |
555bd962 | 41 | open($fh, "<", "his.jpg"); # portably open a binary file for reading |
1cbfc93d | 42 | binmode($fh); |
7d3b96bb NIS |
43 | |
44 | Shell: | |
45 | PERLIO=perlio perl .... | |
b3d30bf7 NIS |
46 | |
47 | =head1 DESCRIPTION | |
48 | ||
ec28694c JH |
49 | When an undefined layer 'foo' is encountered in an C<open> or |
50 | C<binmode> layer specification then C code performs the equivalent of: | |
b3d30bf7 NIS |
51 | |
52 | use PerlIO 'foo'; | |
53 | ||
54 | The perl code in PerlIO.pm then attempts to locate a layer by doing | |
55 | ||
56 | require PerlIO::foo; | |
57 | ||
47bfe92f JH |
58 | Otherwise the C<PerlIO> package is a place holder for additional |
59 | PerlIO related functions. | |
b3d30bf7 | 60 | |
7d3b96bb | 61 | The following layers are currently defined: |
b3d30bf7 | 62 | |
7d3b96bb NIS |
63 | =over 4 |
64 | ||
3d897973 | 65 | =item :unix |
7d3b96bb | 66 | |
3d897973 IT |
67 | Lowest level layer which provides basic PerlIO operations in terms of |
68 | UNIX/POSIX numeric file descriptor calls | |
69 | (open(), read(), write(), lseek(), close()). | |
7d3b96bb | 70 | |
3d897973 | 71 | =item :stdio |
7d3b96bb | 72 | |
47bfe92f JH |
73 | Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note |
74 | that as this is "real" stdio it will ignore any layers beneath it and | |
9ec269cb | 75 | go straight to the operating system via the C library as usual. |
7d3b96bb | 76 | |
3d897973 | 77 | =item :perlio |
7d3b96bb | 78 | |
3d897973 IT |
79 | A from-scratch implementation of buffering for PerlIO. Provides fast |
80 | access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt> | |
81 | and in general attempts to minimize data copying. | |
7d3b96bb | 82 | |
3d897973 | 83 | C<:perlio> will insert a C<:unix> layer below itself to do low level IO. |
7d3b96bb | 84 | |
3d897973 | 85 | =item :crlf |
7d3b96bb | 86 | |
3d897973 IT |
87 | A layer that implements DOS/Windows like CRLF line endings. On read |
88 | converts pairs of CR,LF to a single "\n" newline character. On write | |
8dcd593c LT |
89 | converts each "\n" to a CR,LF pair. Note that this layer will silently |
90 | refuse to be pushed on top of itself. | |
3d897973 IT |
91 | |
92 | It currently does I<not> mimic MS-DOS in treating Control-Z | |
93 | as an end-of-file marker. | |
94 | ||
3d897973 IT |
95 | Based on the C<:perlio> layer. |
96 | ||
3d897973 | 97 | =item :utf8 |
7d3b96bb | 98 | |
2575c402 | 99 | Declares that the stream accepts perl's I<internal> encoding of |
47bfe92f JH |
100 | characters. (Which really is UTF-8 on ASCII machines, but is |
101 | UTF-EBCDIC on EBCDIC machines.) This allows any character perl can | |
102 | represent to be read from or written to the stream. The UTF-X encoding | |
103 | is chosen to render simple text parts (i.e. non-accented letters, | |
104 | digits and common punctuation) human readable in the encoded file. | |
105 | ||
78269f09 KW |
106 | (B<CAUTION>: This layer does not validate byte sequences. For reading input, |
107 | you should use C<:encoding(utf8)> instead of bare C<:utf8>.) | |
108 | ||
47bfe92f JH |
109 | Here is how to write your native data out using UTF-8 (or UTF-EBCDIC) |
110 | and then read it back in. | |
111 | ||
112 | open(F, ">:utf8", "data.utf"); | |
113 | print F $out; | |
114 | close(F); | |
115 | ||
116 | open(F, "<:utf8", "data.utf"); | |
117 | $in = <F>; | |
118 | close(F); | |
7d3b96bb | 119 | |
740d4bb2 | 120 | |
3d897973 | 121 | =item :bytes |
c1a61b17 | 122 | |
9ec269cb | 123 | This is the inverse of the C<:utf8> layer. It turns off the flag |
c1a61b17 | 124 | on the layer below so that data read from it is considered to |
9ec269cb | 125 | be "octets" i.e. characters in the range 0..255 only. Likewise |
c1a61b17 NIS |
126 | on output perl will warn if a "wide" character is written |
127 | to such a stream. | |
128 | ||
3d897973 | 129 | =item :raw |
7d3b96bb | 130 | |
0226bbdb | 131 | The C<:raw> layer is I<defined> as being identical to calling |
9ec269cb | 132 | C<binmode($fh)> - the stream is made suitable for passing binary data, |
18aba96f | 133 | i.e. each byte is passed as-is. The stream will still be |
3d897973 IT |
134 | buffered. |
135 | ||
136 | In Perl 5.6 and some books the C<:raw> layer (previously sometimes also | |
137 | referred to as a "discipline") is documented as the inverse of the | |
138 | C<:crlf> layer. That is no longer the case - other layers which would | |
9ec269cb | 139 | alter the binary nature of the stream are also disabled. If you want UNIX |
3d897973 | 140 | line endings on a platform that normally does CRLF translation, but still |
9ec269cb SL |
141 | want UTF-8 or encoding defaults, the appropriate thing to do is to add |
142 | C<:perlio> to the PERLIO environment variable. | |
1cbfc93d | 143 | |
0226bbdb NIS |
144 | The implementation of C<:raw> is as a pseudo-layer which when "pushed" |
145 | pops itself and then any layers which do not declare themselves as suitable | |
146 | for binary data. (Undoing :utf8 and :crlf is implemented by clearing | |
39f7a870 | 147 | flags rather than popping layers but that is an implementation detail.) |
01e6739c | 148 | |
9ec269cb | 149 | As a consequence of the fact that C<:raw> normally pops layers, |
39f7a870 JH |
150 | it usually only makes sense to have it as the only or first element in |
151 | a layer specification. When used as the first element it provides | |
0226bbdb | 152 | a known base on which to build e.g. |
7d3b96bb | 153 | |
0226bbdb | 154 | open($fh,"<:raw:utf8",...) |
7d3b96bb | 155 | |
0226bbdb | 156 | will construct a "binary" stream, but then enable UTF-8 translation. |
b3d30bf7 | 157 | |
3d897973 | 158 | =item :pop |
4ec2216f | 159 | |
8a7bc862 RS |
160 | A pseudo layer that removes the top-most layer. Gives perl code a |
161 | way to manipulate the layer stack. Note that C<:pop> only works on | |
162 | real layers and will not undo the effects of pseudo layers like | |
163 | C<:utf8>. An example of a possible use might be: | |
4ec2216f NIS |
164 | |
165 | open($fh,...) | |
166 | ... | |
167 | binmode($fh,":encoding(...)"); # next chunk is encoded | |
168 | ... | |
3c4b39be | 169 | binmode($fh,":pop"); # back to un-encoded |
4ec2216f NIS |
170 | |
171 | A more elegant (and safer) interface is needed. | |
172 | ||
3d897973 IT |
173 | =item :win32 |
174 | ||
9ec269cb SL |
175 | On Win32 platforms this I<experimental> layer uses the native "handle" IO |
176 | rather than the unix-like numeric file descriptor layer. Known to be | |
3d897973 IT |
177 | buggy as of perl 5.8.2. |
178 | ||
7d3b96bb NIS |
179 | =back |
180 | ||
39f7a870 JH |
181 | =head2 Custom Layers |
182 | ||
183 | It is possible to write custom layers in addition to the above builtin | |
184 | ones, both in C/XS and Perl. Two such layers (and one example written | |
185 | in Perl using the latter) come with the Perl distribution. | |
186 | ||
187 | =over 4 | |
188 | ||
189 | =item :encoding | |
190 | ||
191 | Use C<:encoding(ENCODING)> either in open() or binmode() to install | |
9ec269cb | 192 | a layer that transparently does character set and encoding transformations, |
e76300d6 JH |
193 | for example from Shift-JIS to Unicode. Note that under C<stdio> |
194 | an C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding> | |
195 | for more information. | |
39f7a870 | 196 | |
307764ab LT |
197 | =item :mmap |
198 | ||
199 | A layer which implements "reading" of files by using C<mmap()> to | |
200 | make a (whole) file appear in the process's address space, and then | |
201 | using that as PerlIO's "buffer". This I<may> be faster in certain | |
202 | circumstances for large files, and may result in less physical memory | |
203 | use when multiple processes are reading the same file. | |
204 | ||
205 | Files which are not C<mmap()>-able revert to behaving like the C<:perlio> | |
206 | layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write | |
207 | needs extra house-keeping (to extend the file) which negates any advantage. | |
208 | ||
209 | The C<:mmap> layer will not exist if the platform does not support C<mmap()>. | |
210 | ||
39f7a870 JH |
211 | =item :via |
212 | ||
213 | Use C<:via(MODULE)> either in open() or binmode() to install a layer | |
214 | that applies an arbitrary transformation (for example compression / | |
215 | decompression, encryption / decryption) to the filehandle. | |
216 | See L<PerlIO::via> for more information. | |
217 | ||
218 | =back | |
219 | ||
01e6739c NIS |
220 | =head2 Alternatives to raw |
221 | ||
0226bbdb | 222 | To get a binary stream an alternate method is to use: |
01e6739c | 223 | |
0226bbdb | 224 | open($fh,"whatever"); |
01e6739c NIS |
225 | binmode($fh); |
226 | ||
9ec269cb | 227 | This has the advantage of being backward compatible with how such things have |
01e6739c | 228 | had to be coded on some platforms for years. |
01e6739c | 229 | |
9ec269cb | 230 | To get an unbuffered stream specify an unbuffered layer (e.g. C<:unix>) |
0226bbdb | 231 | in the open call: |
01e6739c NIS |
232 | |
233 | open($fh,"<:unix",$path) | |
234 | ||
7d3b96bb NIS |
235 | =head2 Defaults and how to override them |
236 | ||
ec28694c JH |
237 | If the platform is MS-DOS like and normally does CRLF to "\n" |
238 | translation for text files then the default layers are: | |
7d3b96bb NIS |
239 | |
240 | unix crlf | |
241 | ||
47bfe92f JH |
242 | (The low level "unix" layer may be replaced by a platform specific low |
243 | level layer.) | |
7d3b96bb | 244 | |
9ec269cb | 245 | Otherwise if C<Configure> found out how to do "fast" IO using the system's |
046e4a6a | 246 | stdio, then the default layers are: |
7d3b96bb NIS |
247 | |
248 | unix stdio | |
249 | ||
250 | Otherwise the default layers are: | |
251 | ||
252 | unix perlio | |
253 | ||
254 | These defaults may change once perlio has been better tested and tuned. | |
255 | ||
47bfe92f | 256 | The default can be overridden by setting the environment variable |
39f7a870 JH |
257 | PERLIO to a space separated list of layers (C<unix> or platform low |
258 | level layer is always pushed first). | |
47bfe92f | 259 | |
7d3b96bb NIS |
260 | This can be used to see the effect of/bugs in the various layers e.g. |
261 | ||
262 | cd .../perl/t | |
263 | PERLIO=stdio ./perl harness | |
264 | PERLIO=perlio ./perl harness | |
265 | ||
9ec269cb | 266 | For the various values of PERLIO see L<perlrun/PERLIO>. |
3b0db4f9 | 267 | |
4c11337c | 268 | =head2 Querying the layers of filehandles |
39f7a870 JH |
269 | |
270 | The following returns the B<names> of the PerlIO layers on a filehandle. | |
271 | ||
9d569fce | 272 | my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH". |
39f7a870 JH |
273 | |
274 | The layers are returned in the order an open() or binmode() call would | |
f0fd62e2 | 275 | use them. Note that the "default stack" depends on the operating |
cc83745d JH |
276 | system and on the Perl version, and both the compile-time and |
277 | runtime configurations of Perl. | |
79d9a4d7 | 278 | |
79d9a4d7 | 279 | The following table summarizes the default layers on UNIX-like and |
9ec269cb | 280 | DOS-like platforms and depending on the setting of C<$ENV{PERLIO}>: |
79d9a4d7 | 281 | |
f0fd62e2 | 282 | PERLIO     UNIX-like                DOS-like |
a7845df8 | 283 | ------     ---------                -------- |
f0fd62e2 JH |
284 | unset / "" unix perlio / stdio [1]  unix crlf |
285 | stdio      unix perlio / stdio [1]  stdio | |
286 | perlio     unix perlio              unix perlio | |
39f7a870 | 287 | |
f0fd62e2 JH |
288 | # [1] "stdio" if Configure found out how to do "fast stdio" (depends |
289 | # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio" | |
046e4a6a | 290 | |
9ec269cb SL |
291 | By default the layers from the input side of the filehandle are |
292 | returned; to get the output side, use the optional C<output> argument: | |
39f7a870 | 293 | |
2ae85e59 | 294 | my @layers = PerlIO::get_layers($fh, output => 1); |
39f7a870 JH |
295 | |
296 | (Usually the layers are identical on either side of a filehandle but | |
2ae85e59 JH |
297 | for example with sockets there may be differences, or if you have |
298 | been using the C<open> pragma.) | |
39f7a870 | 299 | |
92a3e63c JH |
300 | There is no set_layers(), nor does get_layers() return a tied array |
301 | mirroring the stack, or anything fancy like that. This is not | |
302 | accidental or unintentional. The PerlIO layer stack is a bit more | |
303 | complicated than just a stack (see for example the behaviour of C<:raw>). | |
304 | You are supposed to use open() and binmode() to manipulate the stack. | |
305 | ||
39f7a870 JH |
306 | B<Implementation details follow, please close your eyes.> |
307 | ||
9ec269cb | 308 | The arguments to layers are by default returned in parentheses after |
39f7a870 | 309 | the name of the layer, and certain layers (like C<utf8>) are not real |
9ec269cb SL |
310 | layers but instead flags on real layers; to get all of these returned |
311 | separately, use the optional C<details> argument: | |
39f7a870 | 312 | |
2ae85e59 | 313 | my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1); |
39f7a870 JH |
314 | |
315 | The result will be up to three times the number of layers: | |
316 | the first element will be a name, the second element the arguments | |
317 | (unspecified arguments will be C<undef>), the third element the flags, | |
318 | the fourth element a name again, and so forth. | |
319 | ||
320 | B<You may open your eyes now.> | |
321 | ||
7d3b96bb NIS |
322 | =head1 AUTHOR |
323 | ||
324 | Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt> | |
325 | ||
326 | =head1 SEE ALSO | |
327 | ||
39f7a870 JH |
328 | L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>, |
329 | L<Encode> | |
7d3b96bb NIS |
330 | |
331 | =cut |
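
The POD above describes building a layer stack with open() and binmode() and inspecting it with C<PerlIO::get_layers>. The following is a minimal sketch of that workflow, not part of PerlIO.pm itself. The scratch file (created with the core File::Temp module) and the sample string are illustrative assumptions, and the exact layer names printed depend on the platform and on how perl was built, so no expected output is shown.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration; File::Temp picks the name
# and removes the file when the program exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";

# Write one line through an encoding layer stacked on a :raw base.
open(my $out, '>:raw:encoding(UTF-8)', $path) or die "open: $!";
print {$out} "caf\x{e9}\n";
close $out or die "close: $!";

# Reopen as a plain binary stream and record its layers.
open(my $in, '<:raw', $path) or die "open: $!";
my @before = PerlIO::get_layers($in);

# Push a decoding layer with binmode() and look at the stack again.
binmode($in, ':encoding(UTF-8)') or die "binmode: $!";
my @after = PerlIO::get_layers($in);

# Read the line back; it is decoded into characters by the new layer.
my $line = <$in>;
close $in or die "close: $!";

print "before: @before\n";
print "after:  @after\n";
printf "read %d characters\n", length $line;
```

Starting from C<:raw> gives a known base, as the C<:raw> section recommends, so the only difference between the two lists should be the layers added by the binmode() call.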