X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/7d3b96bbd83d17f17b26b4c05ef623881c8477be..0c784221798e121dc35092869b95bd53853ec058:/lib/PerlIO.pm
diff --git a/lib/PerlIO.pm b/lib/PerlIO.pm
index 04cd4cf..f4a0197 100644
--- a/lib/PerlIO.pm
+++ b/lib/PerlIO.pm
@@ -1,7 +1,9 @@
 package PerlIO;
 
+our $VERSION = '1.06';
+
 # Map layer name to package that defines it
-my %alias = (encoding => 'Encode');
+our %alias;
 
 sub import
 {
@@ -22,6 +24,8 @@ sub import
   }
 }
 
+sub F_UTF8 () { 0x8000 }
+
 1;
 __END__
 
@@ -31,16 +35,18 @@ PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space
 
 =head1 SYNOPSIS
 
-    open($fh,">:crlf","my.txt")
-    open($fh,">:raw","his.jpg")
+    open($fh,"<:crlf", "my.txt"); # support platform-native and CRLF text files
+
+    open($fh,"<","his.jpg");      # portably open a binary file for reading
+    binmode($fh);
 
 Shell:
     PERLIO=perlio perl ....
 
 =head1 DESCRIPTION
 
-When an undefined layer 'foo' is encountered in an C<open> or C<binmode> layer
-specification then C code performs the equivalent of:
+When an undefined layer 'foo' is encountered in an C<open> or
+C<binmode> layer specification then C code performs the equivalent of:
 
   use PerlIO 'foo';
 
@@ -48,73 +54,208 @@ The perl code in PerlIO.pm then attempts to locate a layer by doing
 
   require PerlIO::foo;
 
-Otherwise the C<PerlIO> package is a place holder for additional PerLIO related
-functions.
+Otherwise the C<PerlIO> package is a place holder for additional
+PerlIO related functions.
 
 The following layers are currently defined:
 
 =over 4
 
-=item unix
+=item :unix
+
+Lowest level layer which provides basic PerlIO operations in terms of
+UNIX/POSIX numeric file descriptor calls
+(open(), read(), write(), lseek(), close()).
+
+=item :stdio
+
+Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note
+that as this is "real" stdio it will ignore any layers beneath it and
+go straight to the operating system via the C library as usual.
+
+=item :perlio
+
+A from scratch implementation of buffering for PerlIO. Provides fast
+access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt>
+and in general attempts to minimize data copying.
+
+C<:perlio> will insert a C<:unix> layer below itself to do low level IO.
+
+=item :crlf
+
+A layer that implements DOS/Windows-like CRLF line endings. On read
+converts pairs of CR,LF to a single "\n" newline character. On write
+converts each "\n" to a CR,LF pair. Note that this layer likes to be
+one of its kind: it silently ignores attempts to be pushed into the
+layer stack more than once.
+
+It currently does I<not> mimic MS-DOS as far as treating Control-Z
+as an end-of-file marker.
+
+(Gory details follow) To be more exact what happens is this: after
+pushing itself to the stack, the C<:crlf> layer checks all the layers
+below itself to find the first layer that is capable of being a CRLF
+layer but is not yet enabled to be a CRLF layer. If it finds such a
+layer, it enables the CRLFness of that other deeper layer, and then
+pops itself off the stack. If not, fine, use the one we just pushed.
+
+The end result is that a C<:crlf> means "please enable the first CRLF
+layer you can find, and if you can't find one, here would be a good
+spot to place a new one."
+
+Based on the C<:perlio> layer.
+
+=item :mmap
+
+A layer which implements "reading" of files by using C<mmap()> to
+make a (whole) file appear in the process's address space, and then
+using that as PerlIO's "buffer". This I<may> be faster in certain
+circumstances for large files, and may result in less physical memory
+use when multiple processes are reading the same file.
+
+Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
+layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write
+needs extra house-keeping (to extend the file) which negates any advantage.
+
+The C<:mmap> layer will not exist if the platform does not support C<mmap()>.
+
+=item :utf8
+
+Declares that the stream accepts perl's I<internal> encoding of
+characters. (Which really is UTF-8 on ASCII machines, but is
+UTF-EBCDIC on EBCDIC machines.) This allows any character perl can
+represent to be read from or written to the stream. The UTF-X encoding
+is chosen to render simple text parts (i.e. non-accented letters,
+digits and common punctuation) human readable in the encoded file.
+
+Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
+and then read it back in.
+
+    open(F, ">:utf8", "data.utf");
+    print F $out;
+    close(F);
+
+    open(F, "<:utf8", "data.utf");
+    $in = <F>;
+    close(F);
+
+Note that this layer does not validate byte sequences. For reading
+input, using C<:encoding(utf8)> instead of bare C<:utf8> is strongly
+recommended.
+
+=item :bytes
+
+This is the inverse of the C<:utf8> layer. It turns off the flag
+on the layer below so that data read from it is considered to
+be "octets" i.e. characters in the range 0..255 only. Likewise
+on output perl will warn if a "wide" character is written
+to such a stream.
+
+=item :raw
+
+The C<:raw> layer is I<defined> as being identical to calling
+C<binmode($fh)> - the stream is made suitable for passing binary data,
+i.e. each byte is passed as-is. The stream will still be
+buffered.
+
+In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
+referred to as a "discipline") is documented as the inverse of the
+C<:crlf> layer. That is no longer the case - other layers which would
+alter the binary nature of the stream are also disabled. If you want UNIX
+line endings on a platform that normally does CRLF translation, but still
+want UTF-8 or encoding defaults, the appropriate thing to do is to add
+C<:perlio> to the PERLIO environment variable.
 
-Low level layer which calls C<read>, C<write> and C<lseek> etc.
+The implementation of C<:raw> is as a pseudo-layer which when "pushed"
+pops itself and then any layers which do not declare themselves as suitable
+for binary data. (Undoing :utf8 and :crlf are implemented by clearing
+flags rather than popping layers but that is an implementation detail.)
 
-=item stdio
+As a consequence of the fact that C<:raw> normally pops layers,
+it usually only makes sense to have it as the only or first element in
+a layer specification. When used as the first element it provides
+a known base on which to build e.g.
 
-Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc.
-Note that as this is "real" stdio it will ignore any layers beneath it and
-got straight to the operating system via the C library as usual.
+    open($fh,":raw:utf8",...)
 
-=item perlio
+will construct a "binary" stream, but then enable UTF-8 translation.
 
-This is a re-implementation of "stdio-like" buffering written as a PerlIO "layer".
-As such it will call whatever layer is below it for its operations.
+=item :pop
 
-=item crlf
+A pseudo layer that removes the top-most layer. Gives perl code
+a way to manipulate the layer stack. Should be considered
+as experimental. Note that C<:pop> only works on real layers
+and will not undo the effects of pseudo layers like C<:utf8>.
+An example of a possible use might be:
 
-A layer which does CRLF to "\n" translation distinguishing "text" and "binary"
-files in the manner of MS-DOS and similar operating systems.
+    open($fh,...)
+    ...
+    binmode($fh,":encoding(...)");  # next chunk is encoded
+    ...
+    binmode($fh,":pop");            # back to un-encoded
 
-=item utf8
+A more elegant (and safer) interface is needed.
 
-Declares that the stream accepts perl's internal encoding of characters.
-(Which really is UTF-8 on ASCII machines, but is UTF-EBCDIC on EBCDIC machines.)
-This allows any character perl can represent to be read from or written to the
-stream. The UTF-X encoding is chosen to render simple text parts (i.e.
-non-accented letters, digits and common punctuation) human readable in the
-encoded file.
+=item :win32
 
-=item raw
+On Win32 platforms this I<experimental> layer uses the native "handle" IO
+rather than the unix-like numeric file descriptor layer. Known to be
+buggy as of perl 5.8.2.
 
-A pseudo-layer which performs two functions (which is messy, but necessary to
-maintain compatibility with non-PerLIO builds of perl and they way things
-have been documented elsewhere).
+=back
+
+=head2 Custom Layers
+
+It is possible to write custom layers in addition to the above builtin
+ones, both in C/XS and Perl. Two such layers (and one example written
+in Perl using the latter) come with the Perl distribution.
+
+=over 4
 
-Firstly it forces the file handle to be considered binary at that point
-in the layer stack,
+=item :encoding
 
-Secondly in prevents the IO system seaching back before it in the layer specification.
-Thus:
+Use C<:encoding(ENCODING)> either in open() or binmode() to install
+a layer that transparently does character set and encoding transformations,
+for example from Shift-JIS to Unicode. Note that under C<stdio>
+an C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding>
+for more information.
 
-    open($fh,":raw:perlio"),...)
+=item :via
 
-Forces the use of C<perlio> layer even if the platform default, or C<use open> default
-is something else (such as ":encoding(iso-8859-7)" ) which would interfere with
-binary nature of the stream.
+Use C<:via(MODULE)> either in open() or binmode() to install a layer
+that does whatever transformation (for example compression /
+decompression, encryption / decryption) to the filehandle.
+See L<PerlIO::via> for more information.
 
 =back
 
+=head2 Alternatives to raw
+
+To get a binary stream an alternate method is to use:
+
+    open($fh,"whatever")
+    binmode($fh);
+
+this has the advantage of being backward compatible with how such things have
+had to be coded on some platforms for years.
+
+To get an unbuffered stream specify an unbuffered layer (e.g. C<:unix>)
+in the open call:
+
+    open($fh,"<:unix",$path)
+
 =head2 Defaults and how to override them
 
-If the platform is MS-DOS like and normally does CRLF to "\n" translation
-for text files then the default layers are :
+If the platform is MS-DOS like and normally does CRLF to "\n"
+translation for text files then the default layers are :
 
   unix crlf
 
-(The low level "unix" layer may be replaced by a platform specific low level layer.)
+(The low level "unix" layer may be replaced by a platform specific low
+level layer.)
 
-Otherwise if C<Configure> found out how to do "fast" IO using system's stdio, then
-the default layers are :
+Otherwise if C<Configure> found out how to do "fast" IO using the system's
+stdio, then the default layers are:
 
   unix stdio
 
@@ -124,22 +265,80 @@ Otherwise the default layers are
 
 These defaults may change once perlio has been better tested and tuned.
 
-The default can be overridden by setting the environment variable PERLIO
-to a space separated list of layers (unix or platform low level layer is
-always pushed first).
+The default can be overridden by setting the environment variable
+PERLIO to a space separated list of layers (C<unix> or platform low
+level layer is always pushed first).
+
 This can be used to see the effect of/bugs in the various layers e.g.
 
   cd .../perl/t
   PERLIO=stdio  ./perl harness
   PERLIO=perlio ./perl harness
 
+For the various values of PERLIO see L<perlrun/PERLIO>.
+
+=head2 Querying the layers of filehandles
+
+The following returns the B<names> of the PerlIO layers on a filehandle.
+
+    my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".
+
+The layers are returned in the order an open() or binmode() call would
+use them. Note that the "default stack" depends on the operating
+system and on the Perl version, and both the compile-time and
+runtime configurations of Perl.
+
+The following table summarizes the default layers on UNIX-like and
+DOS-like platforms and depending on the setting of C<$ENV{PERLIO}>:
+
+    PERLIO     UNIX-like                   DOS-like
+    ------     ---------                   --------
+    unset / "" unix perlio / stdio [1]     unix crlf
+    stdio      unix perlio / stdio [1]     stdio
+    perlio     unix perlio                 unix perlio
+    mmap       unix mmap                   unix mmap
+
+    # [1] "stdio" if Configure found out how to do "fast stdio" (depends
+    # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"
+
+By default the layers from the input side of the filehandle are
+returned; to get the output side, use the optional C<output> argument:
+
+    my @layers = PerlIO::get_layers($fh, output => 1);
+
+(Usually the layers are identical on either side of a filehandle but
+for example with sockets there may be differences, or if you have
+been using the C<open> pragma.)
+
+There is no set_layers(), nor does get_layers() return a tied array
+mirroring the stack, or anything fancy like that. This is not
+accidental or unintentional. The PerlIO layer stack is a bit more
+complicated than just a stack (see for example the behaviour of C<:raw>).
+You are supposed to use open() and binmode() to manipulate the stack.
+
+B
+
+The arguments to layers are by default returned in parentheses after
+the name of the layer, and certain layers (like C<utf8>) are not real
+layers but instead flags on real layers; to get all of these returned
+separately, use the optional C<details> argument:
+
+    my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);
+
+The result will be up to three times the number of layers:
+the first element will be a name, the second element the arguments
+(unspecified arguments will be C<undef>), the third element the flags,
+the fourth element a name again, and so forth.
+
+B
+
 =head1 AUTHOR
 
 Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>
 
 =head1 SEE ALSO
 
-L, L, L, L
+L, L, L, L,
+L
 
 =cut
-
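The following is a short Perl sketch of the behaviour documented in the patch. It is not part of the commit and is offered only as an illustration. It writes one line through a `:utf8` layer, then reopens the file with `:raw`. Next it pushes `:encoding(UTF-8)` with binmode(), which makes perl load PerlIO::encoding on demand. Finally it compares the PerlIO::get_layers() output from before and after the push. The scratch file comes from the core File::Temp module, which is an assumption of this example. The exact layer names printed vary with platform and build, as the defaults table in the patch notes, so no expected output is shown.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demo (File::Temp is an assumption of this sketch;
# UNLINK => 1 removes the file when the program exits).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "close: $!";

# Write one line through :utf8 so U+00E9 is stored as two octets.
open(my $out, '>:utf8', $path) or die "open $path: $!";
print {$out} "caf\x{e9}\n";
close($out) or die "close: $!";

# Reopen as a binary stream and record the layer names.
open(my $in, '<:raw', $path) or die "open $path: $!";
my @raw_layers = PerlIO::get_layers($in);

# Push an :encoding layer before reading anything; the undefined layer
# 'encoding' makes perl load PerlIO::encoding on demand.
binmode($in, ':encoding(UTF-8)') or die "binmode: $!";
my @enc_layers = PerlIO::get_layers($in);

# The line is decoded back to characters: "caf", U+00E9, "\n".
my $line = <$in>;
close($in);

print "after :raw      => @raw_layers\n";
print "after :encoding => @enc_layers\n";
printf "decoded line has %d characters\n", length $line;
```

The second list should include the pushed encoding layer, reported with its argument in parentheses as the get_layers() documentation describes. On typical builds it may also include a trailing `utf8` flag entry.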