That's what I<you> think!
- What's C<dump()> for?
+ What's C<CORE::dump()> for?
X<C<chmod> and C<unlink()> Under Different Operating Systems>
Discussed briefly in L<perlpod/"Formatting Codes">.
-This code is unusual is that it should have no content. That is,
+This code is unusual in that it should have no content. That is,
a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether
or not it complains, the I<potatoes> text should be ignored.
file begins with the two literal byte values 0xFE 0xFF, this is
the BOM for big-endian UTF-16. If the file begins with the two
literal byte values 0xFF 0xFE, this is the BOM for little-endian
-UTF-16. If the file begins with the three literal byte values
+UTF-16. On an ASCII platform, if the file begins with the three literal
+byte values
0xEF 0xBB 0xBF, this is the BOM for UTF-8.
+A mechanism portable to EBCDIC platforms is to:
+
+ my $utf8_bom = "\x{FEFF}";
+ utf8::encode($utf8_bom);
=for comment
use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
=item *
-A naive but often sufficient heuristic for testing the first highbit
+A naive but often sufficient heuristic on ASCII platforms for testing
+the first highbit
byte-sequence in a BOM-less file (whether in code or in Pod!), to see
whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
the first byte in the sequence is in the range 0xC2 - 0xFD I<and>
whether the next byte is in the range 0x80 - 0xBF. If so, the parser
may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8. Otherwise the parser should treat the file as being
-in CP-1252. (A better check is to pass a copy of the sequence to
+in CP-1252. (A better check, which also works on EBCDIC platforms,
+is to pass a copy of the sequence to
L<utf8::decode()|utf8> which performs a full validity check on the
sequence and returns TRUE if it is valid UTF-8, FALSE otherwise. This
function is always pre-loaded, is fast because it is written in C, and
=item *
-This document's requirements and suggestions about encodings
-do not apply to Pod processors running on non-ASCII platforms,
-notably EBCDIC platforms.
-
-=item *
-
Pod processors must treat a "=for [label] [content...]" paragraph as
meaning the same thing as a "=begin [label]" paragraph, content, and
an "=end [label]" paragraph. (The parser may conflate these two
Authors of Pod formatters/processors should make every effort to
avoid writing their own Pod parser. There are already several in
CPAN, with a wide range of interface styles -- and one of them,
-Pod::Parser, comes with modern versions of Perl.
+Pod::Simple, comes with modern versions of Perl.
=item *
When referring to characters by using a EE<lt>n> numeric code, numbers
in the range 32-126 refer to those well known US-ASCII characters (also
defined there by Unicode, with the same meaning), which all Pod
-formatters must render faithfully. Numbers in the ranges 0-31 and
-127-159 should not be used (neither as literals, nor as EE<lt>number>
-codes), except for the literal byte-sequences for newline (13, 13 10, or
-10), and tab (9).
+formatters must render faithfully. Characters whose EE<lt>E<gt> numbers
+are in the ranges 0-31 and 127-159 should not be used (neither as
+literals,
+nor as EE<lt>number> codes), except for the literal byte-sequences for
+newline (ASCII 13, ASCII 13 10, or ASCII 10), and tab (ASCII 9).
Numbers in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning). Numbers above
Note that in all cases of "EE<lt>whateverE<gt>", I<whatever> (whether
an htmlname, or a number in any base) must consist only of
-alphanumeric characters -- that is, I<whatever> must watch
+alphanumeric characters -- that is, I<whatever> must match
C<m/\A\w+\z/>. So S<"EE<lt> 0 1 2 3 E<gt>"> is invalid, because
it contains spaces, which aren't alphanumeric characters. This
presumably does not I<need> special treatment by a Pod processor;
might be a real Perl module or program in an @INC / PATH
directory, or a .pod file in those places); or the name of a Unix
man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>>
-in ambiguous between a Pod page called "chmod", or the Unix man page
+is ambiguous between a Pod page called "chmod", or the Unix man page
"chmod" (in whatever man-section). However, the presence of a string
in parens, as in "crontab(5)", is sufficient to signal that what
is being discussed is not a Pod page, and so is presumably a