This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
[perl5.git] / lib / bytes.pm
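package bytes;

our $VERSION = '1.04';

# Bit in $^H (the compile-time hints variable) that marks byte semantics.
$bytes::hint_bits = 0x00000008;

# "use bytes" sets the hint bit for the enclosing lexical scope.
sub import {
    $^H |= $bytes::hint_bits;
}

# "no bytes" clears it again.
sub unimport {
    $^H &= ~$bytes::hint_bits;
}

# The function bodies live in bytes_heavy.pl and are loaded on first use.
sub AUTOLOAD {
    require "bytes_heavy.pl";
    goto &$AUTOLOAD if defined &$AUTOLOAD;
    require Carp;
    Carp::croak("Undefined subroutine $AUTOLOAD called");
}

# Forward declarations so the prototypes are known before the bodies load.
sub length (_);
sub chr (_);
sub ord (_);
sub substr ($$;$$);
sub index ($$;$);
sub rindex ($$;$);

1;
__END__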

=head1 NAME

bytes - Perl pragma to force byte semantics rather than character semantics

=head1 NOTICE

This pragma reflects early attempts to incorporate Unicode into Perl and
has since been superseded. It breaks encapsulation (i.e. it exposes the
innards of how the perl executable currently happens to store a string),
and use of this module for anything other than debugging purposes is
strongly discouraged. If you feel that the functions herein might be
useful for your application, this possibly indicates a mismatch between
your mental model of Perl Unicode and the current reality. In that case,
you may wish to read some of the Perl Unicode documentation:
L<perluniintro>, L<perlunitut>, L<perlunifaq> and L<perlunicode>.
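
A minimal sketch of the approach that documentation points toward: encode the string explicitly and work on the resulting octets, rather than looking at Perl's internal representation. The use of L<Encode> and the variable names are illustrative assumptions, not part of this module.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $x = chr(400);                    # one character, U+0190

# Make the encoding step explicit instead of relying on "use bytes".
my $octets = encode('UTF-8', $x);    # byte string holding the UTF-8 octets

my $n_chars  = length $x;            # 1 character
my $n_octets = length $octets;       # 2 octets (0xC6 0x90)

print "characters: $n_chars, UTF-8 octets: $n_octets\n";
```

Because the octet count comes from an explicit encoding, it does not depend on how the perl executable happens to store C<$x>.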

=head1 SYNOPSIS

    use bytes;
    ... chr(...);       # or bytes::chr
    ... index(...);     # or bytes::index
    ... length(...);    # or bytes::length
    ... ord(...);       # or bytes::ord
    ... rindex(...);    # or bytes::rindex
    ... substr(...);    # or bytes::substr
    no bytes;


=head1 DESCRIPTION

The C<use bytes> pragma disables character semantics for the rest of the
lexical scope in which it appears. C<no bytes> can be used to reverse
the effect of C<use bytes> within the current lexical scope.
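
The scoping rule can be sketched as follows. This is an illustration only; the array C<@seen> is a hypothetical name used to collect the results.

```perl
use strict;
use warnings;

my $x = chr(400);               # one character, two bytes internally

my @seen;
push @seen, length $x;          # character semantics: 1
{
    use bytes;
    push @seen, length $x;      # byte semantics: 2
    {
        no bytes;
        push @seen, length $x;  # character semantics restored: 1
    }
    push @seen, length $x;      # still inside the "use bytes" block: 2
}
push @seen, length $x;          # the pragma ended with its block: 1

print "@seen\n";                # prints "1 2 1 2 1"
```

Each C<use bytes> or C<no bytes> lasts only until the end of the block that contains it, so the outer setting comes back into force afterwards.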

Perl normally assumes character semantics in the presence of character
data (i.e. data that has come from a source that has been marked as
being of a particular character encoding). When C<use bytes> is in
effect, the encoding is temporarily ignored, and each string is treated
as a series of bytes.

As an example, when Perl sees C<$x = chr(400)>, it encodes the character
in UTF-8 and stores it in C<$x>. The string is then marked as character
data, so, for instance, C<length $x> returns C<1>. However, in the scope
of the C<bytes> pragma, C<$x> is treated as a series of bytes (the bytes
that make up the UTF-8 encoding), and C<length $x> returns C<2>:

    $x = chr(400);
    print "Length is ", length $x, "\n";     # "Length is 1"
    printf "Contents are %vd\n", $x;         # "Contents are 400"
    {
        use bytes; # or "require bytes; bytes::length()"
        print "Length is ", length $x, "\n"; # "Length is 2"
        printf "Contents are %vd\n", $x;     # "Contents are 198.144"
    }

C<chr()>, C<ord()>, C<substr()>, C<index()> and C<rindex()> behave similarly.
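
A short sketch of the other functions, called through their C<bytes::> names so that the pragma itself stays off. The sample string and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;
require bytes;    # load the module without enabling the pragma

my $s = "a" . chr(400) . "b";    # 3 characters; bytes 61 C6 90 62 internally

my $char_len = length $s;                                # 3
my $byte_len = bytes::length($s);                        # 4

my $char_ord = ord(substr($s, 1, 1));                    # 400
my $byte_ord = bytes::ord(bytes::substr($s, 1, 1));      # 198 (0xC6)

my $char_idx = index($s, "b");                           # 2
my $byte_idx = bytes::index($s, "b");                    # 3

print "length $char_len/$byte_len, ord $char_ord/$byte_ord, ",
      "index $char_idx/$byte_idx\n";
```

The byte-oriented offsets and lengths differ from the character-oriented ones as soon as the string contains a character that takes more than one byte internally.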

For more on the implications and differences between character
semantics and byte semantics, see L<perluniintro> and L<perlunicode>.

=head1 LIMITATIONS

C<bytes::substr()> does not work as an lvalue.
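
One possible workaround, sketched under the assumption that the goal is to overwrite a byte of the UTF-8 form: make an explicit byte copy with C<utf8::encode()> and assign through the built-in C<substr()>, which is a valid lvalue. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $x = chr(400) . "bc";         # 3 characters

# Work on an explicit byte copy rather than assigning via bytes::substr().
my $octets = $x;
utf8::encode($octets);           # now 4 plain bytes: C6 90 62 63

substr($octets, 2, 1) = "X";     # overwrite the byte that held "b"

# Turn the modified bytes back into a character string.
my $y = $octets;
utf8::decode($y);                # chr(400) . "Xc"

print length($y), "\n";          # prints "3"
```

Overwriting only part of a multi-byte sequence this way can leave bytes that are no longer valid UTF-8, in which case C<utf8::decode()> returns false and leaves the string as bytes.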

=head1 SEE ALSO

L<perluniintro>, L<perlunicode>, L<utf8>

=cut