package PerlIO;

our $VERSION = '1.12';

# Map layer name to package that defines it
our %alias;

# Called for "use PerlIO 'layer', ..."; loads the module behind each layer.
sub import
{
    my $class = shift;
    while (@_)
    {
        my $layer = shift;
        if (exists $alias{$layer})
        {
            $layer = $alias{$layer}
        }
        else
        {
            $layer = "${class}::$layer";
        }
        # Turn Package::Name into Package/Name.pm and try to load it.
        # A failure is reported with warn rather than being fatal here.
        eval { require $layer =~ s{::}{/}gr . '.pm' };
        warn $@ if $@;
    }
}

# Value of the UTF-8 bit in a layer's flags
sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

  # support platform-native and CRLF text files
  open(my $fh, "<:crlf", "my.txt") or die "open failed: $!";

  # append UTF-8 encoded text
  open(my $fh, ">>:encoding(UTF-8)", "some.log")
    or die "open failed: $!";

  # portably open a binary file for reading
  open(my $fh, "<", "his.jpg") or die "open failed: $!";
  binmode($fh) or die "binmode failed: $!";

  Shell:
    PERLIO=:perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification then C code performs the equivalent of:

  use PerlIO 'foo';

The Perl code in PerlIO.pm then attempts to locate a layer by doing

  require PerlIO::foo;

Otherwise the C<PerlIO> package is a place holder for additional
PerlIO related functions.
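
The lookup above can be sketched in a few lines. This is only an
illustration of the name-to-file mapping that C<import> applies before
calling C<require>, not a copy of the loader: the helper
C<layer_module_path> is hypothetical, and it deliberately ignores
C<%PerlIO::alias>.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PerlIO): compute the file that
# "require PerlIO::<layer>" would search for in @INC.
sub layer_module_path {
    my ($layer) = @_;
    my $package = "PerlIO::$layer";
    # Same substitution as in import(): package separators become
    # directory separators and ".pm" is appended.
    return $package =~ s{::}{/}gr . '.pm';
}

print layer_module_path('encoding'), "\n";   # PerlIO/encoding.pm
print layer_module_path('via'), "\n";        # PerlIO/via.pm
```

The C</r> modifier returns the substituted copy and leaves C<$package>
untouched, which is why the result can be concatenated directly.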

=head2 Layers

Generally speaking, PerlIO layers (previously sometimes referred to as
"disciplines") are an ordered stack applied to a filehandle (specified as
a space- or colon-separated list, conventionally written with a leading
colon). Each layer performs some operation on any input or output, except
when bypassed such as with C<sysread> or C<syswrite>. Read operations go
through the stack in the order they are set (left to right), and write
operations in the reverse order.

There are also layers which actually just set flags on lower layers, or
layers that modify the current stack but don't persist on the stack
themselves; these are referred to as pseudo-layers.

When opening a handle, it will be opened with any layers specified
explicitly in the open() call (or the platform defaults, if specified as
a colon with no following layers).

If layers are not explicitly specified, the handle will be opened with the
layers specified by the L<${^OPEN}|perlvar/"${^OPEN}"> variable (usually
set by using the L<open> pragma for a lexical scope, or the C<-C>
command-line switch or C<PERL_UNICODE> environment variable for the main
program scope).

If layers are not specified in the open() call or C<${^OPEN}> variable,
the handle will be opened with the default layer stack configured for that
architecture; see L</"Defaults and how to override them">.

Some layers will automatically insert required lower level layers if not
present; for example C<:perlio> will insert C<:unix> below itself for low
level IO, and C<:encoding> will insert the platform defaults for buffered
IO.

The C<binmode> function can be called on an opened handle to push
additional layers onto the stack, which may also modify the existing
layers. C<binmode> called with no layers will remove or unset any
existing layers which transform the byte stream, making the handle
suitable for binary data.
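
As a rough sketch of the C<binmode> behaviour just described, the
following records the layer names on a scratch handle at three points.
The helper C<layer_history> is hypothetical and works on a L<File::Temp>
file; the names it reports depend on the platform and on how Perl was
built, so no particular output is assumed here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: record the layer names on a scratch handle as
# opened, after pushing an encoding, and after a bare binmode().
sub layer_history {
    my ($fh, $path) = tempfile(UNLINK => 1);
    my @history = (join ' ', PerlIO::get_layers($fh));

    # Push an encoding layer on top of whatever is already there.
    binmode($fh, ":encoding(UTF-8)") or die "binmode failed: $!";
    push @history, join ' ', PerlIO::get_layers($fh);

    # No layer argument: remove or unset whatever transforms the bytes.
    binmode($fh) or die "binmode failed: $!";
    push @history, join ' ', PerlIO::get_layers($fh);

    close($fh) or die "close failed: $!";
    return @history;
}

my @labels  = ('as opened', 'after :encoding(UTF-8)', 'after binmode()');
my @history = layer_history();
printf "%-24s %s\n", $labels[$_], $history[$_] for 0 .. $#history;
```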

The following layers are currently defined:

=over 4

=item :unix

Lowest level layer which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).
It is used even on non-Unix architectures, and most other layers operate on
top of it.

=item :stdio

Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.
This layer implements both low level IO and buffering, but is rarely used
on modern architectures.

=item :perlio

A from scratch implementation of buffering for PerlIO. Provides fast
access to the buffer for C<sv_gets> which implements Perl's readline/E<lt>E<gt>
and in general attempts to minimize data copying.

C<:perlio> will insert a C<:unix> layer below itself to do low level IO.

=item :crlf

A layer that implements DOS/Windows like CRLF line endings. On read
converts pairs of CR,LF to a single "\n" newline character. On write
converts each "\n" to a CR,LF pair. Note that this layer will silently
refuse to be pushed on top of itself.

It currently does I<not> mimic MS-DOS in treating Control-Z as an
end-of-file marker.

On DOS/Windows like architectures where this layer is part of the defaults,
it also acts like the C<:perlio> layer, and removing the CRLF translation
(such as with C<:raw>) will only unset the CRLF translation flag. Since
Perl 5.14, you can also apply another C<:crlf> layer later, such as when
the CRLF translation must occur after an encoding layer. On other
architectures, it is a mundane CRLF translation layer and can be added and
removed normally.

    # translate CRLF after encoding on Perl 5.14 or newer
    binmode $fh, ":raw:encoding(UTF-16LE):crlf"
      or die "binmode failed: $!";

=item :utf8

Pseudo-layer that declares that the stream accepts Perl's I<internal>
upgraded encoding of characters, which is approximately UTF-8 on ASCII
machines, but UTF-EBCDIC on EBCDIC machines. This allows any character
Perl can represent to be read from or written to the stream.

This layer (which actually sets a flag on the preceding layer, and is
implicitly set by any C<:encoding> layer) does not translate or validate
byte sequences. It instead indicates that the byte stream will have been
arranged by other layers to be provided in Perl's internal upgraded
encoding, which Perl code (and correctly written XS code) will interpret
as decoded Unicode characters.

B<CAUTION>: Do not use this layer to translate from UTF-8 bytes, as
invalid UTF-8 or binary data will result in malformed Perl strings. It is
unlikely to produce invalid UTF-8 when used for output, though it will
instead produce UTF-EBCDIC on EBCDIC systems. The C<:encoding(UTF-8)>
layer (hyphen is significant) is preferred as it will ensure translation
between valid UTF-8 bytes and valid Unicode characters.

=item :bytes

This is the inverse of the C<:utf8> pseudo-layer. It turns off the flag
on the layer below so that data read from it is considered to
be Perl's internal downgraded encoding, thus interpreted as the native
single-byte encoding of Latin-1 or EBCDIC. Likewise on output Perl will
warn if a "wide" character (a codepoint not in the range 0..255) is
written to such a stream.

This is very dangerous to push on a handle using an C<:encoding> layer,
as such a layer assumes it is working with Perl's internal upgraded
encoding, so you will likely get a mangled result. Instead use C<:raw> or
C<:pop> to remove encoding layers.

=item :raw

The C<:raw> pseudo-layer is I<defined> as being identical to calling
C<binmode($fh)> - the stream is made suitable for passing binary data,
i.e. each byte is passed as-is. The stream will still be buffered
(but this was not always true before Perl 5.14).

In Perl 5.6 and some books the C<:raw> layer is documented as the inverse
of the C<:crlf> layer. That is no longer the case - other layers which
would alter the binary nature of the stream are also disabled. If you
want UNIX line endings on a platform that normally does CRLF translation,
but still want UTF-8 or encoding defaults, the appropriate thing to do is
to add C<:perlio> to the PERLIO environment variable, or open the handle
explicitly with that layer, to replace the platform default of C<:crlf>.

The implementation of C<:raw> is as a pseudo-layer which when "pushed"
pops itself and then any layers which would modify the binary data stream.
(Undoing C<:utf8> and C<:crlf> may be implemented by clearing flags
rather than popping layers but that is an implementation detail.)

As a consequence of the fact that C<:raw> normally pops layers,
it usually only makes sense to have it as the only or first element in
a layer specification. When used as the first element it provides
a known base on which to build e.g.

    open(my $fh,">:raw:encoding(UTF-8)",...)
      or die "open failed: $!";

will construct a "binary" stream regardless of the platform defaults,
but then enable UTF-8 translation.

=item :pop

A pseudo-layer that removes the top-most layer. Gives Perl code a
way to manipulate the layer stack. Note that C<:pop> only works on
real layers and will not undo the effects of pseudo-layers or flags
like C<:utf8>. An example of a possible use might be:

    open(my $fh,...) or die "open failed: $!";
    ...
    binmode($fh,":encoding(...)") or die "binmode failed: $!";
    # next chunk is encoded
    ...
    binmode($fh,":pop") or die "binmode failed: $!";
    # back to un-encoded

A more elegant (and safer) interface is needed.

=back
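
To make the effect of the translating layers above concrete, here is a
small sketch that writes text through a layer specification and then
reads the resulting file back with C<:raw> to show the bytes that were
stored. The helper C<bytes_written> is hypothetical and uses a
L<File::Temp> scratch file; C<:raw> is placed first in each
specification so that the result does not depend on platform defaults.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the given layers, then
# return the bytes that reached the file, read back untranslated.
sub bytes_written {
    my ($layers, $text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close($tmp) or die "close failed: $!";

    open(my $out, ">$layers", $path) or die "open failed: $!";
    print {$out} $text;
    close($out) or die "close failed: $!";

    open(my $in, "<:raw", $path) or die "open failed: $!";
    local $/;                       # slurp the whole file
    my $bytes = <$in>;
    close($in) or die "close failed: $!";
    return $bytes;
}

# :crlf turns each "\n" into a CR,LF pair on output.
my $crlf = bytes_written(":raw:crlf", "a\nb\n");
printf "%vd\n", $crlf;              # 97.13.10.98.13.10

# :encoding(UTF-8) turns U+263A into its three-byte UTF-8 form.
my $utf8 = bytes_written(":raw:encoding(UTF-8)", "\x{263A}");
printf "%vd\n", $utf8;              # 226.152.186
```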

=head2 Custom Layers

It is possible to write custom layers in addition to the above built-in
ones, both in C/XS and Perl, as a module named C<< PerlIO::<layer name> >>.
Some custom layers come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> to transparently do character set and encoding
transformations, for example from Shift-JIS to Unicode. Note that an
C<:encoding> layer also enables C<:utf8>. See L<PerlIO::encoding> for more
information.

=item :mmap

A layer which implements "reading" of files by using C<mmap()> to
make a (whole) file appear in the process's address space, and then
using that as PerlIO's "buffer". This I<may> be faster in certain
circumstances for large files, and may result in less physical memory
use when multiple processes are reading the same file.

Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write
needs extra house-keeping (to extend the file) which negates any advantage.

The C<:mmap> layer will not exist if the platform does not support C<mmap()>.
See L<PerlIO::mmap> for more information.

=item :via

C<:via(MODULE)> allows a transformation to be applied by an arbitrary Perl
module, for example compression / decompression, encryption / decryption.
See L<PerlIO::via> for more information.

=item :scalar

A layer implementing "in memory" files using scalar variables,
automatically used in place of the platform defaults for IO when opening
such a handle. As such, the scalar is expected to act like a file, only
containing or storing bytes. See L<PerlIO::scalar> for more information.

=back
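
As an illustration of the C<:scalar> layer described above, the
following sketch opens handles on scalar references, which is what
selects the in-memory layer. The helper names C<count_lines> and
C<render_report> are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from a string as though it were a file.
sub count_lines {
    my ($data) = @_;
    open(my $in, "<", \$data) or die "open failed: $!";
    my $count = 0;
    while (defined(my $line = <$in>)) {
        $count++;
    }
    close($in) or die "close failed: $!";
    return $count;
}

# Hypothetical helper: collect printed output in a string, not a file.
sub render_report {
    my (@items) = @_;
    my $buffer = '';
    open(my $out, ">", \$buffer) or die "open failed: $!";
    print {$out} "item: $_\n" for @items;
    close($out) or die "close failed: $!";
    return $buffer;
}

print count_lines("one\ntwo\nthree\n"), "\n";    # 3
print render_report(qw(a b));                    # "item: a" then "item: b"
```

Because the scalar stands in for a file, it should hold bytes; text
containing wide characters would need to be encoded first.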

=head2 Alternatives to raw

To get a binary stream an alternate method is to use:

    open(my $fh,"<","whatever") or die "open failed: $!";
    binmode($fh) or die "binmode failed: $!";

This has the advantage of being backward compatible with older versions
of Perl that did not use PerlIO or where C<:raw> was buggy (as it was
before Perl 5.14).

To get an unbuffered stream specify an unbuffered layer (e.g. C<:unix>)
in the open call:

    open(my $fh,"<:unix",$path) or die "open failed: $!";

=head2 Defaults and how to override them

If the platform is MS-DOS like and normally does CRLF to "\n"
translation for text files then the default layers are:

    :unix:crlf

Otherwise if C<Configure> found out how to do "fast" IO using the system's
stdio (not common on modern architectures), then the default layers are:

    :stdio

Otherwise the default layers are:

    :unix:perlio

Note that the "default stack" depends on the operating system and on the
Perl version, and both the compile-time and runtime configurations of
Perl. The default can be overridden by setting the environment variable
PERLIO to a space or colon separated list of layers; however, this cannot
be used to set layers that require loading modules, like C<:encoding>.

This can be used to see the effect of/bugs in the various layers e.g.

    cd .../perl/t
    PERLIO=:stdio ./perl harness
    PERLIO=:perlio ./perl harness

For the various values of PERLIO see L<perlrun/PERLIO>.

The following table summarizes the default layers on UNIX-like and
DOS-like platforms and depending on the setting of C<$ENV{PERLIO}>:

    PERLIO       UNIX-like                    DOS-like
    ------       ---------                    --------
    unset / ""   :unix:perlio / :stdio [1]    :unix:crlf
    :stdio       :stdio                       :stdio
    :perlio      :unix:perlio                 :unix:perlio

    # [1] ":stdio" if Configure found out how to do "fast stdio" (depends
    # on the stdio implementation) and in Perl 5.8, else ":unix:perlio"

=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

    my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them, and without colons.

By default the layers from the input side of the filehandle are
returned; to get the output side, use the optional C<output> argument:

    my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle but
for example with sockets there may be differences.)

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that. This is not
accidental or unintentional. The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<:utf8>) are not real
layers but instead flags on real layers; to get all of these returned
separately, use the optional C<details> argument:

    my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.

B<You may open your eyes now.>
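
For readers who kept their eyes open, the flat C<details> list can be
regrouped as sketched below. The helper C<layer_details> is
hypothetical. Testing the flags against C<PerlIO::F_UTF8> (the constant
defined in the module code) is an assumption about how that value is
meant to be used, and the names, arguments and flags printed will vary
with the platform.

```perl
use strict;
use warnings;
use PerlIO ();                      # loads PerlIO.pm, which defines F_UTF8
use File::Temp qw(tempfile);

# Hypothetical helper: group the flat details list into
# [name, args, flags] triples, one per layer.
sub layer_details {
    my ($fh) = @_;
    my @flat = PerlIO::get_layers($fh, details => 1);
    my @triples;
    push @triples, [ splice(@flat, 0, 3) ] while @flat;
    return @triples;
}

my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh, ":encoding(UTF-8)") or die "binmode failed: $!";

for my $layer (layer_details($fh)) {
    my ($name, $args, $flags) = @$layer;
    printf "%-10s args=%-16s utf8=%s\n",
        $name,
        defined $args ? $args : 'undef',
        # Assumption: the 0x8000 bit marks a layer carrying UTF-8 data.
        ($flags & PerlIO::F_UTF8()) ? 'yes' : 'no';
}
close($fh) or die "close failed: $!";
```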

=head1 AUTHOR

Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut