This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Extend the effect of the encoding pragma to chr() and ord().
[perl5.git] / lib / utf8.pm
package utf8;

$utf8::hint_bits = 0x00800000;

our $VERSION = '1.00';

sub import {
    $^H |= $utf8::hint_bits;
    $enc{caller()} = $_[1] if $_[1];
}

sub unimport {
    $^H &= ~$utf8::hint_bits;
}

sub AUTOLOAD {
    require "utf8_heavy.pl";
    goto &$AUTOLOAD if defined &$AUTOLOAD;
    Carp::croak("Undefined subroutine $AUTOLOAD called");
}

1;
__END__

=head1 NAME

utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code

=head1 SYNOPSIS

    use utf8;
    no utf8;

=head1 DESCRIPTION

The C<use utf8> pragma tells the Perl parser to allow UTF-8 in the
program text in the current lexical scope (allow UTF-EBCDIC on EBCDIC-based
platforms). The C<no utf8> pragma tells Perl to switch back to treating
the source text as literal bytes in the current lexical scope.

This pragma is primarily a compatibility device. Perl versions
earlier than 5.6 allowed arbitrary bytes in source code, whereas
in the future we would like to standardize on the UTF-8 encoding for
source text. Until UTF-8 becomes the default format for source
text, this pragma should be used to recognize UTF-8 in the source.
When UTF-8 becomes the standard source format, this pragma will
effectively become a no-op. For convenience in what follows, the
term I<UTF-X> is used to refer to UTF-8 on ASCII and ISO Latin based
platforms and UTF-EBCDIC on EBCDIC-based platforms.

Enabling the C<utf8> pragma has the following effect:

=over 4

=item *

Bytes in the source text that have their high-bit set will be treated
as being part of a literal UTF-8 character. This includes most
literals such as identifiers, string constants, constant regular
expression patterns and package names. On EBCDIC platforms characters
in the Latin 1 character set are treated as being part of a literal
UTF-EBCDIC character.

=back

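The effect described above can be illustrated with a minimal sketch. It
assumes the script itself is saved as UTF-8 on an ASCII-based platform; the
variable names are illustrative only. The same two-byte literal is counted
as one character inside the scope of C<use utf8> and as two characters
outside it.

```perl
use strict;
use warnings;

my ($chars, $bytes);

{
    use utf8;
    # With the pragma, the source bytes 0xC3 0xA9 are read as the single
    # character U+00E9.
    $chars = length("é");
}

{
    no utf8;
    # Without it, the same two source bytes are treated as two separate
    # one-byte characters.
    $bytes = length("é");
}

print "with utf8: $chars, without utf8: $bytes\n";
```

Because the pragma is lexically scoped, each block above sees only its own
setting.
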
=head2 Utility functions

The following functions are defined in the C<utf8::> package by the perl core.

=over 4

=item * $num_octets = utf8::upgrade($string);

Converts the internal representation of the string to Perl's internal
I<UTF-X> form. Returns the number of octets necessary to represent
the string as I<UTF-X>.

=item * utf8::downgrade($string[, CHECK])

Converts the internal representation of the string to un-encoded bytes.

=item * utf8::encode($string)

Converts (in-place) I<$string> from logical characters to the octet sequence
representing it in Perl's I<UTF-X> encoding.

=item * $flag = utf8::decode($string)

Attempts to convert I<$string> in-place from Perl's I<UTF-X> encoding
into logical characters.

=back

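A short round trip through the four functions may make the distinction
between them clearer. This is only a sketch, assuming an ASCII-based
platform where I<UTF-X> is UTF-8; the sample string and variable names are
illustrative. Note that C<upgrade> and C<downgrade> change only the internal
representation, while C<encode> and C<decode> change the logical content of
the string.

```perl
use strict;
use warnings;

# Four characters; the last one is U+00E9, initially stored as a single byte.
my $string = "caf\x{e9}";

# Switch the internal representation to UTF-8. The logical string is
# unchanged (still four characters), but it now needs five octets.
my $num_octets = utf8::upgrade($string);
my $len_after_upgrade = length($string);

# Replace the characters, in place, with the octets of their UTF-8 encoding.
# The string is now five one-byte characters: "caf\xC3\xA9".
utf8::encode($string);
my $len_after_encode = length($string);

# Interpret those octets as UTF-8 again, in place. A true return value
# indicates that the conversion succeeded.
my $flag = utf8::decode($string);
my $len_after_decode = length($string);

# Switch the internal representation back to un-encoded bytes. This works
# here because every character fits in a single byte.
utf8::downgrade($string);

print "octets=$num_octets upgrade=$len_after_upgrade ",
      "encode=$len_after_encode decode=$len_after_decode\n";
```
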
=head1 SEE ALSO

L<perlunicode>, L<bytes>

=cut