package PerlIO;

our $VERSION = '1.12';

# Map layer name to package that defines it
our %alias;

sub import
{
 my $class = shift;
 while (@_)
  {
   my $layer = shift;
   if (exists $alias{$layer})
    {
     $layer = $alias{$layer};
    }
   else
    {
     $layer = "${class}::$layer";
    }
   # Load the layer's module, e.g. PerlIO::foo from PerlIO/foo.pm;
   # a failure is reported as a warning rather than being fatal.
   eval { require $layer =~ s{::}{/}gr . '.pm' };
   warn $@ if $@;
  }
}

# Same value as the PERLIO_F_UTF8 layer flag in perliol.h
sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

 # support platform-native and CRLF text files
 open(my $fh, "<:crlf", "my.txt") or die "open failed: $!";

 # append UTF-8 encoded text
 open(my $fh, ">>:encoding(UTF-8)", "some.log")
   or die "open failed: $!";

 # portably open a binary file for reading
 open(my $fh, "<", "his.jpg") or die "open failed: $!";
 binmode($fh) or die "binmode failed: $!";

 Shell:
   PERLIO=:perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification, then C code performs the equivalent of:

 use PerlIO 'foo';

The Perl code in PerlIO.pm then attempts to locate a layer by doing

 require PerlIO::foo;

Otherwise the C<PerlIO> package is a placeholder for additional
PerlIO-related functions.
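
As a rough illustration of the on-demand loading described above, the
sketch below maps a layer name to a module file the way C<PerlIO::import>
does, then opens an in-memory handle with C<:encoding(UTF-8)> so that perl
loads C<PerlIO/encoding.pm> by itself. The helper C<layer_to_file> is
hypothetical and ignores C<%PerlIO::alias>; the C<s///r> modifier it uses
needs Perl 5.14 or later.

```perl

 use strict;
 use warnings;

 # layer_to_file() is a hypothetical helper that mirrors the name
 # mangling in PerlIO::import (ignoring %PerlIO::alias):
 # layer "foo" -> package PerlIO::foo -> file PerlIO/foo.pm
 sub layer_to_file {
     my ($class, $layer) = @_;
     return "${class}::$layer" =~ s{::}{/}gr . '.pm';
 }

 # Using a layer whose module is not loaded yet makes perl load it.
 my $bytes = "caf\xC3\xA9";    # UTF-8 bytes for "caf" plus U+00E9
 open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";
 my $line = <$fh>;
 close($fh);

 my $file = layer_to_file('PerlIO', 'encoding');
 printf "%s is %sloaded\n", $file, exists $INC{$file} ? '' : 'not ';

```

Checking C<%INC> afterwards is one way to confirm which layer modules
were pulled in.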

=head2 Layers

Generally speaking, PerlIO layers (previously sometimes referred to as
"disciplines") are an ordered stack applied to a filehandle (specified as
a space- or colon-separated list, conventionally written with a leading
colon). Each layer performs some operation on any input or output, except
when bypassed such as with C<sysread> or C<syswrite>. Read operations go
through the stack in the order they are set (left to right), and write
operations in the reverse order.

There are also layers which actually just set flags on lower layers, or
layers that modify the current stack but don't persist on the stack
themselves; these are referred to as pseudo-layers.

When opening a handle, it will be opened with any layers specified
explicitly in the open() call (or the platform defaults, if specified as
a colon with no following layers).

If layers are not explicitly specified, the handle will be opened with the
layers specified by the L<${^OPEN}|perlvar/"${^OPEN}"> variable (usually
set by using the L<open> pragma for a lexical scope, or the C<-C>
command-line switch or C<PERL_UNICODE> environment variable for the main
program scope).
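
A minimal sketch of these lexical defaults, assuming an empty scratch file
created with the core C<File::Temp> module: inside the block the L<open>
pragma supplies the layers, while outside it C<open> falls back to the
platform defaults.

```perl

 use strict;
 use warnings;
 use File::Temp qw(tempfile);

 # An empty scratch file, removed when the program exits.
 my (undef, $path) = tempfile(UNLINK => 1);

 my @with_pragma;
 {
     # The open pragma sets ${^OPEN} for this lexical scope only.
     use open IO => ':encoding(UTF-8)';
     open(my $fh, '<', $path) or die "open failed: $!";
     @with_pragma = PerlIO::get_layers($fh);
     close($fh);
 }

 # Outside the block no default layers are requested.
 open(my $fh, '<', $path) or die "open failed: $!";
 my @without = PerlIO::get_layers($fh);
 close($fh);

 print "with 'use open': @with_pragma\n";
 print "without:         @without\n";

```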

If layers are not specified in the open() call or C<${^OPEN}> variable,
the handle will be opened with the default layer stack configured for that
architecture; see L</"Defaults and how to override them">.

Some layers will automatically insert required lower-level layers if not
present; for example C<:perlio> will insert C<:unix> below itself for
low-level IO, and C<:encoding> will insert the platform defaults for
buffered IO.
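
The following sketch (again assuming a scratch file from C<File::Temp>)
requests only C<:encoding(UTF-8)> and uses C<PerlIO::get_layers>,
described under L</"Querying the layers of filehandles">, to show which
lower layers were inserted beneath it.

```perl

 use strict;
 use warnings;
 use File::Temp qw(tempfile);

 my (undef, $path) = tempfile(UNLINK => 1);

 # Only :encoding is requested; the lower layers are inserted for us.
 open(my $fh, '<:encoding(UTF-8)', $path) or die "open failed: $!";
 my @layers = PerlIO::get_layers($fh);
 close($fh);

 # Listed bottom first, in the order open() would apply them. A typical
 # Unix build gives something like "unix perlio encoding(utf-8-strict)
 # utf8"; the exact list depends on the platform defaults.
 print "@layers\n";

```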

The C<binmode> function can be called on an opened handle to push
additional layers onto the stack, which may also modify the existing
layers. C<binmode> called with no layers will remove or unset any
existing layers which transform the byte stream, making the handle
suitable for binary data.
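
To illustrate, this sketch writes the same text through an in-memory
handle with a C<:crlf> layer, once as-is and once after a bare C<binmode>.
The helper C<write_through> is hypothetical, and in-memory handles
(opening a reference to a scalar) are used only to keep the example
self-contained.

```perl

 use strict;
 use warnings;

 # Hypothetical helper: write "x\n" through a :crlf handle, optionally
 # calling binmode() with no layers first, and return the bytes written.
 sub write_through {
     my ($make_binary) = @_;
     my $buf = '';
     open(my $fh, '>:crlf', \$buf) or die "open failed: $!";
     if ($make_binary) {
         binmode($fh) or die "binmode failed: $!";
     }
     print {$fh} "x\n";
     close($fh) or die "close failed: $!";
     return $buf;
 }

 my $text   = write_through(0);   # "\n" translated to CR LF
 my $binary = write_through(1);   # bytes passed through as-is
 printf "text: %d bytes, binary: %d bytes\n", length $text, length $binary;

```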

The following layers are currently defined:

=over 4

=item :unix

The lowest-level layer, which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).
It is used even on non-Unix architectures, and most other layers operate on
top of it.

=item :stdio

Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.
This layer implements both low-level IO and buffering, but is rarely used
on modern architectures.

=item :perlio

A from-scratch implementation of buffering for PerlIO. Provides fast
access to the buffer for C<sv_gets> which implements Perl's readline/E<lt>E<gt>
and in general attempts to minimize data copying.

C<:perlio> will insert a C<:unix> layer below itself to do low-level IO.

=item :crlf

A layer that implements DOS/Windows-like CRLF line endings. On read,
it converts pairs of CR,LF to a single "\n" newline character. On write,
it converts each "\n" to a CR,LF pair. Note that this layer will silently
refuse to be pushed on top of itself.

It currently does I<not> mimic MS-DOS in treating Control-Z as an
end-of-file marker.

On DOS/Windows-like architectures where this layer is part of the defaults,
it also acts like the C<:perlio> layer, and removing the CRLF translation
(such as with C<:raw>) will only unset the CRLF translation flag. Since
Perl 5.14, you can also apply another C<:crlf> layer later, such as when
the CRLF translation must occur after an encoding layer. On other
architectures, it is a mundane CRLF translation layer and can be added and
removed normally.

 # translate CRLF after encoding on Perl 5.14 or newer
 binmode $fh, ":raw:encoding(UTF-16LE):crlf"
   or die "binmode failed: $!";

=item :utf8

Pseudo-layer that declares that the stream accepts Perl's I<internal>
upgraded encoding of characters, which is approximately UTF-8 on ASCII
machines, but UTF-EBCDIC on EBCDIC machines. This allows any character
Perl can represent to be read from or written to the stream.

This layer (which actually sets a flag on the preceding layer, and is
implicitly set by any C<:encoding> layer) does not translate or validate
byte sequences. It instead indicates that the byte stream will have been
arranged by other layers to be provided in Perl's internal upgraded
encoding, which Perl code (and correctly written XS code) will interpret
as decoded Unicode characters.

B<CAUTION>: Do not use this layer to translate from UTF-8 bytes, as
invalid UTF-8 or binary data will result in malformed Perl strings. It is
unlikely to produce invalid UTF-8 when used for output, though it will
instead produce UTF-EBCDIC on EBCDIC systems. The C<:encoding(UTF-8)>
layer (hyphen is significant) is preferred as it will ensure translation
between valid UTF-8 bytes and valid Unicode characters.

=item :bytes

This is the inverse of the C<:utf8> pseudo-layer. It turns off the flag
on the layer below so that data read from it is considered to
be Perl's internal downgraded encoding, thus interpreted as the native
single-byte encoding of Latin-1 or EBCDIC. Likewise on output Perl will
warn if a "wide" character (a codepoint not in the range 0..255) is
written to such a stream.

This is very dangerous to push on a handle using an C<:encoding> layer,
as such a layer assumes it is working with Perl's internal upgraded
encoding, so you will likely get a mangled result. Instead use C<:raw> or
C<:pop> to remove encoding layers.

=item :raw

The C<:raw> pseudo-layer is I<defined> as being identical to calling
C<binmode($fh)> - the stream is made suitable for passing binary data,
i.e. each byte is passed as-is. The stream will still be buffered
(but this was not always true before Perl 5.14).

In Perl 5.6 and some books the C<:raw> layer is documented as the inverse
of the C<:crlf> layer. That is no longer the case - other layers which
would alter the binary nature of the stream are also disabled. If you
want UNIX line endings on a platform that normally does CRLF translation,
but still want UTF-8 or encoding defaults, the appropriate thing to do is
to add C<:perlio> to the PERLIO environment variable, or open the handle
explicitly with that layer, to replace the platform default of C<:crlf>.

The implementation of C<:raw> is as a pseudo-layer which when "pushed"
pops itself and then any layers which would modify the binary data stream.
(Undoing C<:utf8> and C<:crlf> may be implemented by clearing flags
rather than popping layers but that is an implementation detail.)

As a consequence of the fact that C<:raw> normally pops layers,
it usually only makes sense to have it as the only or first element in
a layer specification. When used as the first element it provides
a known base on which to build e.g.

 open(my $fh,">:raw:encoding(UTF-8)",...)
   or die "open failed: $!";

will construct a "binary" stream regardless of the platform defaults,
but then enable UTF-8 translation.

=item :pop

A pseudo-layer that removes the top-most layer. Gives Perl code a
way to manipulate the layer stack. Note that C<:pop> only works on
real layers and will not undo the effects of pseudo-layers or flags
like C<:utf8>. An example of a possible use might be:

 open(my $fh,...) or die "open failed: $!";
 ...
 binmode($fh,":encoding(...)") or die "binmode failed: $!";
 # next chunk is encoded
 ...
 binmode($fh,":pop") or die "binmode failed: $!";
 # back to un-encoded

A more elegant (and safer) interface is needed.

=back
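
A short sketch exercising two of the layers above on in-memory handles:
C<:crlf> on input, and C<:pop> removing an C<:encoding> layer pushed by
C<binmode>, in the spirit of the C<:pop> example. UTF-16LE is an arbitrary
choice, picked because its output is easy to tell apart from plain bytes.

```perl

 use strict;
 use warnings;

 # :crlf on input: each CR LF pair is read back as "\n".
 my $dos = "one\r\ntwo\r\n";
 open(my $in, '<:crlf', \$dos) or die "open failed: $!";
 my @lines = <$in>;
 close($in);

 # :pop removes the top-most real layer, here the :encoding layer
 # pushed by binmode(), so later output is no longer encoded.
 my $out = '';
 open(my $fh, '>', \$out) or die "open failed: $!";
 binmode($fh, ':encoding(UTF-16LE)') or die "binmode failed: $!";
 print {$fh} "A";                       # encoded as the bytes 41 00
 binmode($fh, ':pop') or die "binmode failed: $!";
 print {$fh} "B";                       # written as the single byte 42
 close($fh) or die "close failed: $!";

 printf "%d lines; output bytes: %s\n",
     scalar @lines, join ' ', map { sprintf '%02X', ord } split //, $out;

```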

=head2 Custom Layers

It is possible to write custom layers in addition to the above built-in
ones, both in C/XS and Perl, as a module named C<< PerlIO::<layer name> >>.
Some custom layers come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> to transparently do character set and encoding
transformations, for example from Shift-JIS to Unicode. Note that an
C<:encoding> layer also enables C<:utf8>. See L<PerlIO::encoding> for more
information.

=item :mmap

A layer which implements "reading" of files by using C<mmap()> to
make a (whole) file appear in the process's address space, and then
using that as PerlIO's "buffer". This I<may> be faster in certain
circumstances for large files, and may result in less physical memory
use when multiple processes are reading the same file.

Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write
needs extra house-keeping (to extend the file) which negates any advantage.

The C<:mmap> layer will not exist if the platform does not support C<mmap()>.
See L<PerlIO::mmap> for more information.

=item :via

C<:via(MODULE)> allows a transformation to be applied by an arbitrary Perl
module, for example compression / decompression, encryption / decryption.
See L<PerlIO::via> for more information.

=item :scalar

A layer implementing "in memory" files using scalar variables,
automatically used in place of the platform defaults for IO when opening
such a handle. As such, the scalar is expected to act like a file, only
containing or storing bytes. See L<PerlIO::scalar> for more information.

=back
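
As a small example of C<:encoding> working on top of C<:scalar>, the
sketch below encodes one character into an in-memory file and decodes it
again; the scalar ends up holding the UTF-8 bytes, not the character.

```perl

 use strict;
 use warnings;

 # Encode one character into an in-memory file.
 my $smiley = "\x{263A}";               # WHITE SMILING FACE, U+263A
 my $bytes  = '';
 open(my $out, '>:encoding(UTF-8)', \$bytes) or die "open failed: $!";
 print {$out} $smiley;
 close($out) or die "close failed: $!";

 # Decode the same bytes back into characters.
 open(my $in, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";
 my $round_trip = do { local $/; <$in> };
 close($in);

 printf "%d character(s) stored as %d byte(s)\n",
     length $round_trip, length $bytes;

```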

=head2 Alternatives to raw

An alternative way to get a binary stream is to use:

 open(my $fh,"<","whatever") or die "open failed: $!";
 binmode($fh) or die "binmode failed: $!";

This has the advantage of being backward compatible with older versions
of Perl that did not use PerlIO or where C<:raw> was buggy (as it was
before Perl 5.14).

To get an unbuffered stream, specify an unbuffered layer (e.g. C<:unix>)
in the open call:

 open(my $fh,"<:unix",$path) or die "open failed: $!";

=head2 Defaults and how to override them

If the platform is MS-DOS-like and normally does CRLF to "\n"
translation for text files, then the default layers are:

 :unix:crlf

Otherwise, if C<Configure> found out how to do "fast" IO using the system's
stdio (not common on modern architectures), then the default layers are:

 :stdio

Otherwise the default layers are:

 :unix:perlio

Note that the "default stack" depends on the operating system and on the
Perl version, and both the compile-time and runtime configurations of
Perl. The default can be overridden by setting the environment variable
PERLIO to a space- or colon-separated list of layers; however, this cannot
be used to set layers that require loading modules like C<:encoding>.

This can be used to see the effect of (or bugs in) the various layers, e.g.

 cd .../perl/t
 PERLIO=:stdio ./perl harness
 PERLIO=:perlio ./perl harness

For the various values of PERLIO see L<perlrun/PERLIO>.
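
One way to observe the override from Perl itself is to start a child perl
with C<PERLIO> set and let it report the layers on its own STDOUT. This is
a sketch for Unix-like systems: it assumes the list form of a piped
C<open> is available, and uses C<$^X> to run the same perl binary.

```perl

 use strict;
 use warnings;

 # Start a child perl with PERLIO set and have it print the layers on
 # its STDOUT (the pipe back to us). The list form of a piped open
 # avoids shell quoting; it may not be supported on every platform.
 my $child_layers = do {
     local $ENV{PERLIO} = ':perlio';
     open(my $kid, '-|', $^X, '-e',
          'print join q{ }, PerlIO::get_layers(*STDOUT)')
         or die "cannot run $^X: $!";
     local $/;                          # slurp the whole reply
     my $reply = <$kid>;
     close($kid) or die "child exited with status $?";
     $reply;
 };
 print "PERLIO=:perlio gives: $child_layers\n";

```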

The following table summarizes the default layers on UNIX-like and
DOS-like platforms, depending on the setting of C<$ENV{PERLIO}>:

 PERLIO     UNIX-like                   DOS-like
 ------     ---------                   --------
 unset / "" :unix:perlio / :stdio [1]   :unix:crlf
 :stdio     :stdio                      :stdio
 :perlio    :unix:perlio                :unix:perlio

 # [1] ":stdio" if Configure found out how to do "fast stdio" (depends
 # on the stdio implementation) and in Perl 5.8, else ":unix:perlio"

=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

 my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them, and without colons.

By default the layers from the input side of the filehandle are
returned; to get the output side, use the optional C<output> argument:

 my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle but
for example with sockets there may be differences.)

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that. This is not
accidental or unintentional. The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<:utf8>) are not real
layers but instead flags on real layers; to get all of these returned
separately, use the optional C<details> argument:

 my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.
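
A sketch of walking the C<details> list in groups of three, using an
in-memory handle with an C<:encoding> layer. C<PerlIO::F_UTF8> is the flag
constant defined by this module; testing it shows the C<:utf8> flag that
an C<:encoding> layer sets on itself.

```perl

 use strict;
 use warnings;
 use PerlIO ();    # only to make PerlIO::F_UTF8() available

 my $buf = '';
 open(my $fh, '>', \$buf) or die "open failed: $!";
 binmode($fh, ':encoding(UTF-8)') or die "binmode failed: $!";

 my @details = PerlIO::get_layers($fh, details => 1);

 # The flat list holds (name, arguments, flags) for each layer in turn.
 for (my $i = 0; $i < @details; $i += 3) {
     my ($name, $args, $flags) = @details[$i .. $i + 2];
     printf "%-10s args=%-14s flags=0x%08X utf8=%s\n",
         $name // '(none)', $args // '(none)', $flags // 0,
         ($flags // 0) & PerlIO::F_UTF8() ? 'yes' : 'no';
 }

```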

B<You may open your eyes now.>

=head1 AUTHOR

Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut