perl5.git.perl.org Git - perl5.git/blame_incremental

Commit	Line	Data
	1	=encoding utf8
	2
	3	=head1 NAME
	4
	5	perlpodspec - Plain Old Documentation: format specification and notes
	6
	7	=head1 DESCRIPTION
	8
	9	This document contains detailed notes on the Pod markup language. Most
	10	people will only have to read L<perlpod|perlpod> to know how to write
	11	in Pod, but this document may answer some incidental questions to do
	12	with parsing and rendering Pod.
	13
	14	In this document, "must" / "must not", "should" /
	15	"should not", and "may" have their conventional (cf. RFC 2119)
	16	meanings: "X must do Y" means that if X doesn't do Y, it's against
	17	this specification, and should really be fixed. "X should do Y"
	18	means that it's recommended, but X may fail to do Y, if there's a
	19	good reason. "X may do Y" is merely a note that X can do Y at
	20	will (although it is up to the reader to detect any connotation of
	21	"and I think it would be I<nice> if X did Y" versus "it wouldn't
	22	really I<bother> me if X did Y").
	23
	24	Notably, when I say "the parser should do Y", the
	25	parser may fail to do Y, if the calling application explicitly
	26	requests that the parser I<not> do Y. I often phrase this as
	27	"the parser should, by default, do Y." This doesn't I<require>
	28	the parser to provide an option for turning off whatever
	29	feature Y is (like expanding tabs in verbatim paragraphs), although
	30	it implies that such an option I<may> be provided.
	31
	32	=head1 Pod Definitions
	33
	34	Pod is embedded in files, typically Perl source files, although you
	35	can write a file that's nothing but Pod.
	36
	37	A B<line> in a file consists of zero or more non-newline characters,
	38	terminated by either a newline or the end of the file.
	39
	40	A B<newline sequence> is usually a platform-dependent concept, but
	41	Pod parsers should understand it to mean any of CR (ASCII 13), LF
	42	(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
	43	addition to any other system-specific meaning. The first CR/CRLF/LF
	44	sequence in the file may be used as the basis for identifying the
	45	newline sequence for parsing the rest of the file.
	46
	47	A B<blank line> is a line consisting entirely of zero or more spaces
	48	(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
	49	A B<non-blank line> is a line containing one or more characters other
	50	than space or tab (and terminated by a newline or end-of-file).
	51
	52	(I<Note:> Many older Pod parsers did not accept a line consisting of
	53	spaces/tabs and then a newline as a blank line. The only lines they
	54	considered blank were lines consisting of I<no characters at all>,
	55	terminated by a newline.)
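
The definitions of line, newline sequence, and blank line given above can be
sketched in Perl roughly as follows. This is only an illustration, not part of
the specification; the function names C<pod_lines> and C<is_blank_line> are
hypothetical.

```perl
use strict;
use warnings;

# Split raw text into lines, accepting CR, LF, or CRLF as the newline
# sequence. CRLF is listed first so it is not counted as two newlines.
sub pod_lines {
    my ($text) = @_;
    my @lines = split /\x0D\x0A|\x0D|\x0A/, $text, -1;
    # A trailing newline terminates the last line; it does not start a new one.
    pop @lines if @lines && $lines[-1] eq '';
    return @lines;
}

# A blank line consists entirely of zero or more spaces or tabs.
sub is_blank_line {
    my ($line) = @_;
    return $line =~ /\A[ \t]*\z/ ? 1 : 0;
}
```

A parser that instead fixes on the first newline sequence it sees, as the text
permits, would replace the alternation with that single sequence.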
	56
	57	B<Whitespace> is used in this document as a blanket term for spaces,
	58	tabs, and newline sequences. (By itself, this term usually refers
	59	to literal whitespace. That is, sequences of whitespace characters
	60	in Pod source, as opposed to "EE<lt>32>", which is a formatting
	61	code that I<denotes> a whitespace character.)
	62
	63	A B<Pod parser> is a module meant for parsing Pod (regardless of
	64	whether this involves calling callbacks or building a parse tree or
	65	directly formatting it). A B<Pod formatter> (or B<Pod translator>)
	66	is a module or program that converts Pod to some other format (HTML,
	67	plaintext, TeX, PostScript, RTF). A B<Pod processor> might be a
	68	formatter or translator, or might be a program that does something
	69	else with the Pod (like counting words, scanning for index points,
	70	etc.).
	71
	72	Pod content is contained in B<Pod blocks>. A Pod block starts with a
	73	line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
	74	that matches C<m/\A=cut/> or up to the end of the file if there is
	75	no C<m/\A=cut/> line.
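
As a rough sketch of the rule above, the following Perl collects Pod blocks
from a list of lines using the two patterns exactly as stated. The name
C<pod_blocks> is hypothetical, and including the "=cut" line in the returned
block is a choice made for this illustration, not something the text requires.

```perl
use strict;
use warnings;

# Return a list of array refs, one per Pod block, given the lines of a file.
sub pod_blocks {
    my (@lines) = @_;
    my (@blocks, $current);
    for my $line (@lines) {
        if (defined $current) {
            # Inside a block: keep every line, and close the block at =cut.
            push @$current, $line;
            if ($line =~ m/\A=cut/) {
                push @blocks, $current;
                undef $current;
            }
        }
        elsif ($line =~ m/\A=cut/) {
            # A =cut cannot start a block; warn and stop reading.
            warn "=cut found outside of a Pod block; halting\n";
            last;
        }
        elsif ($line =~ m/\A=[a-zA-Z]/) {
            $current = [$line];    # a new Pod block starts here
        }
    }
    # A block left open at end-of-file runs to the end of the file.
    push @blocks, $current if defined $current;
    return @blocks;
}
```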
	76
	77	=for comment
	78	The current perlsyn says:
	79	[beginquote]
	80	Note that pod translators should look at only paragraphs beginning
	81	with a pod directive (it makes parsing easier), whereas the compiler
	82	actually knows to look for pod escapes even in the middle of a
	83	paragraph. This means that the following secret stuff will be ignored
	84	by both the compiler and the translators.
	85	$a=3;
	86	=secret stuff
	87	warn "Neither POD nor CODE!?"
	88	=cut back
	89	print "got $a\n";
	90	You probably shouldn't rely upon the warn() being podded out forever.
	91	Not all pod translators are well-behaved in this regard, and perhaps
	92	the compiler will become pickier.
	93	[endquote]
	94	I think that those paragraphs should just be removed; paragraph-based
	95	parsing seems to have been largely abandoned, because of the hassle
	96	with non-empty blank lines messing up what people meant by "paragraph".
	97	Even if the "it makes parsing easier" bit were especially true,
	98	it wouldn't be worth the confusion of having perl and pod2whatever
	99	actually disagree on what can constitute a Pod block.
	100
	101	Note that a parser is not expected to distinguish real Pod from something
	102	that merely looks like Pod but is inside a quoted string, such as a here-document.
	103
	104	Within a Pod block, there are B<Pod paragraphs>. A Pod paragraph
	105	consists of non-blank lines of text, separated by one or more blank
	106	lines.
	107
	108	For purposes of Pod processing, there are four types of paragraphs in
	109	a Pod block:
	110
	111	=over
	112
	113	=item *
	114
	115	A command paragraph (also called a "directive"). The first line of
	116	this paragraph must match C<m/\A=[a-zA-Z]/>. Command paragraphs are
	117	typically one line, as in:
	118
	119	  =head1 NOTES
	120
	121	  =item *
	122
	123	But they may span several (non-blank) lines:
	124
	125	  =for comment
	126	  Hm, I wonder what it would look like if
	127	  you tried to write a BNF for Pod from this.
	128
	129	  =head3 Dr. Strangelove, or: How I Learned to
	130	  Stop Worrying and Love the Bomb
	131
	132	I<Some> command paragraphs allow formatting codes in their content
	133	(i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in:
	134
	135	  =head1 Did You Remember to C<use strict;>?
	136
	137	In other words, the Pod processing handler for "head1" will apply the
	138	same processing to "Did You Remember to CE<lt>use strict;>?" that it
	139	would to an ordinary paragraph: formatting codes (like
	140	"CE<lt>...>") are parsed and presumably formatted appropriately, and
	141	whitespace in the form of literal spaces and/or tabs is not
	142	significant.
	143
	144	=item *
	145
	146	A B<verbatim paragraph>. The first line of this paragraph must be a
	147	literal space or tab, and this paragraph must not be inside a "=begin
	148	I<identifier>", ... "=end I<identifier>" sequence unless
	149	"I<identifier>" begins with a colon (":"). That is, if a paragraph
	150	starts with a literal space or tab, but I<is> inside a
	151	"=begin I<identifier>", ... "=end I<identifier>" region, then it's
	152	a data paragraph, unless "I<identifier>" begins with a colon.
	153
	154	Whitespace I<is> significant in verbatim paragraphs (although, in
	155	processing, tabs are probably expanded).
	156
	157	=item *
	158
	159	An B<ordinary paragraph>. A paragraph is an ordinary paragraph
	160	if its first line matches neither C<m/\A=[a-zA-Z]/> nor
	161	C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>",
	162	... "=end I<identifier>" sequence unless "I<identifier>" begins with
	163	a colon (":").
	164
	165	=item *
	166
	167	A B<data paragraph>. This is a paragraph that I<is> inside a "=begin
	168	I<identifier>" ... "=end I<identifier>" sequence where
	169	"I<identifier>" does I<not> begin with a literal colon (":"). In
	170	some sense, a data paragraph is not part of Pod at all (i.e.,
	171	effectively it's "out-of-band"), since it's not subject to most kinds
	172	of Pod parsing; but it is specified here, since Pod
	173	parsers need to be able to call an event for it, or store it in some
	174	form in a parse tree, or at least just parse I<around> it.
	175
	176	=back
	177
	178	For example: consider the following paragraphs:
	179
	180	  # <- that's the 0th column
	181
	182	  =head1 Foo
	183
	184	  Stuff
	185
	186	    $foo->bar
	187
	188	  =cut
	189
	190	Here, "=head1 Foo" and "=cut" are command paragraphs because the first
	191	line of each matches C<m/\A=[a-zA-Z]/>. "I<[space][space]>$foo->bar"
	192	is a verbatim paragraph, because its first line starts with a literal
	193	whitespace character (and there's no "=begin"..."=end" region around).
	194
	195	The "=begin I<identifier>" ... "=end I<identifier>" commands stop
	196	paragraphs that they surround from being parsed as ordinary or verbatim
	197	paragraphs, if I<identifier> doesn't begin with a colon. This
	198	is discussed in detail in the section
	199	L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
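
The four paragraph types described in this section can be told apart with a
few tests, sketched below. The function name C<paragraph_type> and its
calling convention (first line of the paragraph, plus the identifier of the
enclosing "=begin" region, or C<undef> if there is none) are assumptions made
for illustration only.

```perl
use strict;
use warnings;

# Classify a paragraph from its first line and the enclosing =begin
# identifier (undef when the paragraph is not inside a =begin region).
sub paragraph_type {
    my ($first_line, $begin_identifier) = @_;

    # Command paragraphs are recognized everywhere.
    return 'command' if $first_line =~ m/\A=[a-zA-Z]/;

    # Inside "=begin identifier" without a leading colon, everything
    # that is not a command is a data paragraph.
    return 'data'
        if defined $begin_identifier && $begin_identifier !~ m/\A:/;

    # Otherwise a leading space or tab makes the paragraph verbatim.
    return 'verbatim' if $first_line =~ m/\A[ \t]/;

    return 'ordinary';
}
```

Applied to the example above, "=head1 Foo" and "=cut" classify as command
paragraphs, "Stuff" as an ordinary paragraph, and the indented "$foo->bar" as
a verbatim paragraph.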
	200
	201	=head1 Pod Commands
	202
	203	This section is intended to supplement and clarify the discussion in
	204	L<perlpod/"Command Paragraph">. These are the currently recognized
	205	Pod commands:
	206
	207	=over
	208
	209	=item "=head1", "=head2", "=head3", "=head4"
	210
	211	This command indicates that the text in the remainder of the paragraph
	212	is a heading. That text may contain formatting codes. Examples:
	213
	214	  =head1 Object Attributes
	215
	216	  =head3 What B<Not> to Do!
	217
	218	=item "=pod"
	219
	220	This command indicates that this paragraph begins a Pod block. (If we
	221	are already in the middle of a Pod block, this command has no effect at
	222	all.) If there is any text in this command paragraph after "=pod",
	223	it must be ignored. Examples:
	224
	225	  =pod
	226
	227	  This is a plain Pod paragraph.
	228
	229	  =pod This text is ignored.
	230
	231	=item "=cut"
	232
	233	This command indicates that this line is the end of this previously
	234	started Pod block. If there is any text after "=cut" on the line, it must be
	235	ignored. Examples:
	236
	237	  =cut
	238
	239	  =cut The documentation ends here.
	240
	241	  =cut
	242	  # This is the first line of program text.
	243	  sub foo { # This is the second.
	244
	245	It is an error to try to I<start> a Pod block with a "=cut" command. In
	246	that case, the Pod processor must halt parsing of the input file, and
	247	must by default emit a warning.
	248
	249	=item "=over"
	250
	251	This command indicates that this is the start of a list/indent
	252	region. If there is any text following the "=over", it must consist
	253	of only a nonzero positive numeral. The semantics of this numeral are
	254	explained in the L</"About =over...=back Regions"> section, further
	255	below. Formatting codes are not expanded. Examples:
	256
	257	  =over 3
	258
	259	  =over 3.5
	260
	261	  =over
	262
	263	=item "=item"
	264
	265	This command indicates that an item in a list begins here. Formatting
	266	codes are processed. The semantics of the (optional) text in the
	267	remainder of this paragraph are
	268	explained in the L</"About =over...=back Regions"> section, further
	269	below. Examples:
	270
	271	  =item
	272
	273	  =item *
	274
	275	  =item      *
	276
	277	  =item 14
	278
	279	  =item   3.
	280
	281	  =item C<< $thing->stuff(I<dodad>) >>
	282
	283	  =item For transporting us beyond seas to be tried for pretended
	284	  offenses
	285
	286	  =item He is at this time transporting large armies of foreign
	287	  mercenaries to complete the works of death, desolation and
	288	  tyranny, already begun with circumstances of cruelty and perfidy
	289	  scarcely paralleled in the most barbarous ages, and totally
	290	  unworthy the head of a civilized nation.
	291
	292	=item "=back"
	293
	294	This command indicates that this is the end of the region begun
	295	by the most recent "=over" command. It permits no text after the
	296	"=back" command.
	297
	298	=item "=begin formatname"
	299
	300	=item "=begin formatname parameter"
	301
	302	This marks the following paragraphs (until the matching "=end
	303	formatname") as being for some special kind of processing. Unless
	304	"formatname" begins with a colon, the contained non-command
	305	paragraphs are data paragraphs. But if "formatname" I<does> begin
	306	with a colon, then non-command paragraphs are ordinary paragraphs
	307	or verbatim paragraphs. This is discussed in detail in the section
	308	L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
	309
	310	It is advised that formatnames match the regexp
	311	C<m/\A:?[-a-zA-Z0-9_]+\z/>. Everything following whitespace after the
	312	formatname is a parameter that may be used by the formatter when dealing
	313	with this region. This parameter must not be repeated in the "=end"
	314	paragraph. Implementors should anticipate future expansion in the
	315	semantics and syntax of the first parameter to "=begin"/"=end"/"=for".
	316
	317	=item "=end formatname"
	318
	319	This marks the end of the region opened by the matching
	320	"=begin formatname" region. If "formatname" is not the formatname
	321	of the most recent open "=begin formatname" region, then this
	322	is an error, and must generate an error message. This
	323	is discussed in detail in the section
	324	L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
	325
	326	=item "=for formatname text..."
	327
	328	This is synonymous with:
	329
	330	  =begin formatname
	331
	332	  text...
	333
	334	  =end formatname
	335
	336	That is, it creates a region consisting of a single paragraph; that
	337	paragraph is to be treated as a normal paragraph if "formatname"
	338	begins with a ":"; if "formatname" I<doesn't> begin with a colon,
	339	then "text..." will constitute a data paragraph. There is no way
	340	to use "=for formatname text..." to express "text..." as a verbatim
	341	paragraph.
	342
	343	=item "=encoding encodingname"
	344
	345	This command, which should occur early in the document (at least
	346	before any non-US-ASCII data!), declares that this document is
	347	encoded in the encoding I<encodingname>, which must be
	348	an encoding name that L<Encode> recognizes. (Encode's list
	349	of supported encodings, in L<Encode::Supported>, is useful here.)
	350	If the Pod parser cannot decode the declared encoding, it
	351	should emit a warning and may abort parsing the document
	352	altogether.
	353
	354	A document having more than one "=encoding" line should be
	355	considered an error. Pod processors may silently tolerate this if
	356	the not-first "=encoding" lines are just duplicates of the
	357	first one (e.g., if there's a "=encoding utf8" line, and later on
	358	another "=encoding utf8" line). But Pod processors should complain if
	359	there are contradictory "=encoding" lines in the same document
	360	(e.g., if there is a "=encoding utf8" early in the document and
	361	"=encoding big5" later). Pod processors that recognize BOMs
	362	may also complain if they see an "=encoding" line
	363	that contradicts the BOM (e.g., if a document with a UTF-16LE
	364	BOM has an "=encoding shiftjis" line).
	365
	366	=back
	367
	368	If a Pod processor sees any command other than the ones listed
	369	above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
	370	or "=w123"), that processor must by default treat this as an
	371	error. It must not process the paragraph beginning with that
	372	command, must by default warn of this as an error, and may
	373	abort the parse. A Pod parser may allow a way for particular
	374	applications to add to the above list of known commands, and to
	375	stipulate, for each additional command, whether formatting
	376	codes should be processed.
	377
	378	Future versions of this specification may add additional
	379	commands.
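
A minimal sketch of recognizing a command paragraph and checking it against
the list above might look like this. The name C<parse_command>, the
C<%KNOWN_COMMAND> table, and the three-element return value are hypothetical;
a real parser would also report unknown commands as errors, as required above.

```perl
use strict;
use warnings;

# The commands listed in this section.
my %KNOWN_COMMAND = map { $_ => 1 }
    qw(head1 head2 head3 head4 pod cut over item back begin end for encoding);

# Split a command paragraph into (name, text, known-flag).
# Returns an empty list if the paragraph is not a command paragraph.
sub parse_command {
    my ($para) = @_;
    my ($name, $text) = $para =~ m/\A=([a-zA-Z]\S*)\s*(.*)\z/s
        or return;
    return ($name, $text, $KNOWN_COMMAND{$name} ? 1 : 0);
}
```

For example, C<parse_command("=haed1 Oops")> yields the name "haed1" with a
false known-flag, which the caller can then treat as an error.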
	380
	381
	382
	383	=head1 Pod Formatting Codes
	384
	385	(Note that in previous drafts of this document and of perlpod,
	386	formatting codes were referred to as "interior sequences", and
	387	this term may still be found in the documentation for Pod parsers,
	388	and in error messages from Pod processors.)
	389
	390	There are two syntaxes for formatting codes:
	391
	392	=over
	393
	394	=item *
	395
	396	A formatting code starts with a capital letter (just US-ASCII [A-Z])
	397	followed by a "<", any number of characters, and ending with the first
	398	matching ">". Examples:
	399
	400	  That's what I<you> think!
	401
	402	  What's C<CORE::dump()> for?
	403
	404	  X<C<chmod> and C<unlink()> Under Different Operating Systems>
	405
	406	=item *
	407
	408	A formatting code starts with a capital letter (just US-ASCII [A-Z])
	409	followed by two or more "<"'s, one or more whitespace characters,
	410	any number of characters, one or more whitespace characters,
	411	and ending with the first matching sequence of two or more ">"'s, where
	412	the number of ">"'s equals the number of "<"'s in the opening of this
	413	formatting code. Examples:
	414
	415	  That's what I<< you >> think!
	416
	417	  C<<< open(X, ">>thing.dat") || die $! >>>
	418
	419	  B<< $foo->bar(); >>
	420
	421	With this syntax, the whitespace character(s) after the "CE<lt><<"
	422	(or whatever letter) and before the ">>>" are I<not> renderable. They
	423	do not signify whitespace, but are merely part of the formatting codes
	424	themselves. That is, these are all synonymous:
	425
	426	  C<thing>
	427	  C<< thing >>
	428	  C<<           thing     >>
	429	  C<<<   thing >>>
	430	  C<<<<
	431	  thing
	432	            >>>>
	433
	434	and so on.
	435
	436	Finally, the multiple-angle-bracket form does I<not> alter the interpretation
	437	of nested formatting codes, meaning that the following four example lines are
	438	identical in meaning:
	439
	440	  B<example: C<$a E<lt>=E<gt> $b>>
	441
	442	  B<example: C<< $a <=> $b >>>
	443
	444	  B<example: C<< $a E<lt>=E<gt> $b >>>
	445
	446	  B<<< example: C<< $a E<lt>=E<gt> $b >> >>>
	447
	448	=back
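
To illustrate why the two syntaxes above are synonymous, here is a small
sketch that takes a string consisting of exactly one complete formatting code
and returns its letter and content, discarding the delimiter whitespace of the
multiple-angle-bracket form. The name C<split_code> is hypothetical, and this
is deliberately not a general parser: it does not locate codes inside running
text and does not descend into nested codes.

```perl
use strict;
use warnings;

# Given a string that is exactly one formatting code, return (letter, content).
# Returns an empty list if the string has neither of the two forms.
sub split_code {
    my ($code) = @_;

    # Multiple-angle-bracket form: the same number of "<" and ">", with
    # mandatory whitespace just inside each delimiter.
    if ($code =~ m/\A([A-Z])(<{2,})\s+(.*?)\s+(>{2,})\z/s
        && length($2) == length($4)) {
        return ($1, $3);
    }

    # Single-angle-bracket form.
    if ($code =~ m/\A([A-Z])<(.*)>\z/s) {
        return ($1, $2);
    }

    return;
}
```

With this sketch, "CE<lt>thing>", "CE<lt>< thing >>", and the four-bracket form
spread over several lines all yield the letter "C" and the content "thing".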
	449
	450	In parsing Pod, a notably tricky part is the correct parsing of
	451	(potentially nested!) formatting codes. Implementors should
	452	consult the code in the C<parse_text> routine in Pod::Parser as an
	453	example of a correct implementation.
	454
	455	=over
	456
	457	=item C<IE<lt>textE<gt>> -- italic text
	458
	459	See the brief discussion in L<perlpod/"Formatting Codes">.
	460
	461	=item C<BE<lt>textE<gt>> -- bold text
	462
	463	See the brief discussion in L<perlpod/"Formatting Codes">.
	464
	465	=item C<CE<lt>codeE<gt>> -- code text
	466
	467	See the brief discussion in L<perlpod/"Formatting Codes">.
	468
	469	=item C<FE<lt>filenameE<gt>> -- style for filenames
	470
	471	See the brief discussion in L<perlpod/"Formatting Codes">.
	472
	473	=item C<XE<lt>topic nameE<gt>> -- an index entry
	474
	475	See the brief discussion in L<perlpod/"Formatting Codes">.
	476
	477	This code is unusual in that most formatters completely discard
	478	this code and its content. Other formatters will render it with
	479	invisible codes that can be used in building an index of
	480	the current document.
	481
	482	=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code
	483
	484	Discussed briefly in L<perlpod/"Formatting Codes">.
	485
	486	This code is unusual in that it should have no content. That is,
	487	a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether
	488	or not it complains, the I<potatoes> text should be ignored.
	489
	490	=item C<LE<lt>nameE<gt>> -- a hyperlink
	491
	492	The complicated syntaxes of this code are discussed at length in
	493	L<perlpod/"Formatting Codes">, and implementation details are
	494	discussed below, in L</"About LE<lt>...E<gt> Codes">. Parsing the
	495	contents of LE<lt>content> is tricky. Notably, the content has to be
	496	checked for whether it looks like a URL, or whether it has to be split
	497	on literal "|" and/or "/" (in the right order!), and so on,
	498	I<before> EE<lt>...> codes are resolved.
	499
	500	=item C<EE<lt>escapeE<gt>> -- a character escape
	501
	502	See L<perlpod/"Formatting Codes">, and several points in
	503	L</Notes on Implementing Pod Processors>.
	504
	505	=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces
	506
	507	This formatting code is syntactically simple, but semantically
	508	complex. What it means is that each space in the printable
	509	content of this code signifies a non-breaking space.
	510
	511	Consider:
	512
	513	  C<$x ? $y : $z>
	514
	515	  S<C<$x ? $y : $z>>
	516
	517	Both signify the monospace (c[ode] style) text consisting of
	518	"$x", one space, "?", one space, "$y", one space, ":", one space, "$z". The
	519	difference is that in the latter, with the S code, those spaces
	520	are not "normal" spaces, but instead are non-breaking spaces.
	521
	522	=back
	523
	524
	525	If a Pod processor sees any formatting code other than the ones
	526	listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
	527	processor must by default treat this as an error.
	528	A Pod parser may allow a way for particular
	529	applications to add to the above list of known formatting codes;
	530	a Pod parser might even allow a way to stipulate, for each additional
	531	formatting code, whether it requires some form of special processing, as
	532	LE<lt>...> does.
	533
	534	Future versions of this specification may add additional
	535	formatting codes.
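
One simple way to notice unknown formatting codes is to scan a paragraph for
a capital letter followed by "<" and compare the letter against the list
above. The sketch below does only that; the name C<unknown_codes> is
hypothetical, and because it is a plain pattern scan rather than a real
parse, it will also flag text such as a capitalized variable name followed by
a less-than sign.

```perl
use strict;
use warnings;

# The formatting codes defined in this section.
my %KNOWN_CODE = map { $_ => 1 } qw(I B C F X Z L E S);

# Return the distinct unknown code letters in a paragraph, in order of
# first appearance.
sub unknown_codes {
    my ($text) = @_;
    my %seen;
    return grep { !$KNOWN_CODE{$_} && !$seen{$_}++ } $text =~ m/([A-Z])</g;
}
```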
	536
	537	Historical note: A few older Pod processors would not see a ">" as
	538	closing a "CE<lt>" code, if the ">" was immediately preceded by
	539	a "-". This was so that this:
	540
	541	  C<$foo->bar>
	542
	543	would parse as equivalent to this:
	544
	545	  C<$foo-E<gt>bar>
	546
	547	instead of as equivalent to a "C" formatting code containing
	548	only "$foo-", and then a "bar>" outside the "C" formatting code. This
	549	problem has since been solved by the addition of syntaxes like this:
	550
	551	  C<< $foo->bar >>
	552
	553	Compliant parsers must not treat "->" as special.
	554
	555	Formatting codes absolutely cannot span paragraphs. If a code is
	556	opened in one paragraph, and no closing code is found by the end of
	557	that paragraph, the Pod parser must close that formatting code,
	558	and should complain (as in "Unterminated I code in the paragraph
	559	starting at line 123: 'Time objects are not...'"). So these
	560	two paragraphs:
	561
	562	  I<I told you not to do this!
	563
	564	  Don't make me say it again!>
	565
	566	...must I<not> be parsed as two paragraphs in italics (with the I
	567	code starting in one paragraph and ending in another). Instead,
	568	the first paragraph should generate a warning, but that aside, the
	569	above code must parse as if it were:
	570
	571	  I<I told you not to do this!>
	572
	573	  Don't make me say it again!E<gt>
	574
	575	(In SGMLish jargon, all Pod commands are like block-level
	576	elements, whereas all Pod formatting codes are like inline-level
	577	elements.)
	578
	579
	580
	581	=head1 Notes on Implementing Pod Processors
	582
	583	The following is a long section of miscellaneous requirements
	584	and suggestions to do with Pod processing.
	585
	586	=over
	587
	588	=item *
	589
	590	Pod formatters should tolerate lines in verbatim blocks that are of
	591	any length, even if that means having to break them (possibly several
	592	times, for very long lines) to avoid text running off the side of the
	593	page. Pod formatters may warn of such line-breaking. Such warnings
	594	are particularly appropriate for lines that are over 100 characters long, which
	595	are usually not intentional.
	596
	597	=item *
	598
	599	Pod parsers must recognize I<all> of the three well-known newline
	600	formats: CR, LF, and CRLF. See L<perlport|perlport>.
	601
	602	=item *
	603
	604	Pod parsers should accept input lines that are of any length.
	605
	606	=item *
	607
	608	Since Perl recognizes a Unicode Byte Order Mark at the start of files
	609	as signaling that the file is Unicode encoded as in UTF-16 (whether
	610	big-endian or little-endian) or UTF-8, Pod parsers should do the
	611	same. Otherwise, the character encoding should be understood as
	612	being UTF-8 if the first highbit byte sequence in the file seems
	613	valid as a UTF-8 sequence, or otherwise as CP-1252 (earlier versions of
	614	this specification used Latin-1 instead of CP-1252).
	615
	616	Future versions of this specification may specify
	617	how Pod can accept other encodings. Presumably treatment of other
	618	encodings in Pod parsing would be as in XML parsing: whatever the
	619	encoding declared by a particular Pod file, content is to be
	620	stored in memory as Unicode characters.
	621
	622	=item *
	623
	624	The well-known Unicode Byte Order Marks are as follows: if the
	625	file begins with the two literal byte values 0xFE 0xFF, this is
	626	the BOM for big-endian UTF-16. If the file begins with the two
	627	literal byte values 0xFF 0xFE, this is the BOM for little-endian
	628	UTF-16. On an ASCII platform, if the file begins with the three literal
	629	byte values
	630	0xEF 0xBB 0xBF, this is the BOM for UTF-8.
	631	A mechanism portable to EBCDIC platforms is to:
	632
	633	  my $utf8_bom = "\x{FEFF}";
	634	  utf8::encode($utf8_bom);
	635
	636	=for comment
	637	use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
	638	0xEF 0xBB 0xBF
	639
	640	=for comment
	641	If toke.c is modified to support UTF-32, add mention of those here.
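
The byte values listed in this item translate into a short check such as the
following sketch. It assumes an ASCII platform and a byte string (not a
decoded character string) as input; the function name C<bom_encoding> and the
encoding labels it returns are hypothetical.

```perl
use strict;
use warnings;

# Return the encoding implied by a leading BOM, or undef if there is none.
sub bom_encoding {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ m/\A\xEF\xBB\xBF/;
    return 'UTF-16BE' if $bytes =~ m/\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ m/\A\xFF\xFE/;
    return undef;
}
```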
	642
	643	=item *
	644
	645	A naive, but often sufficient, heuristic on ASCII platforms for testing
	646	the first highbit
	647	byte-sequence in a BOM-less file (whether in code or in Pod!), to see
	648	whether that sequence is valid as UTF-8 (RFC 2279), is to check
	649	whether the first byte in the sequence is in the range 0xC2 - 0xFD
	650	I<and> whether the next byte is in the range
	651	0x80 - 0xBF. If so, the parser may conclude that this file is in
	652	UTF-8, and all highbit sequences in the file should be assumed to
	653	be UTF-8. Otherwise the parser should treat the file as being
	654	in CP-1252. (A better check, which also works on EBCDIC platforms,
	655	is to pass a copy of the sequence to
	656	L<utf8::decode()|utf8> which performs a full validity check on the
	657	sequence and returns TRUE if it is valid UTF-8, FALSE otherwise. This
	658	function is always pre-loaded, is fast because it is written in C, and
	659	will only get called at most once, so you don't need to avoid it out of
	660	performance concerns.)
	661	In the unlikely circumstance that the first highbit
	662	sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
	663	can cater to our heuristic (as well as any more intelligent heuristic)
	664	by prefacing that line with a comment line containing a highbit
	665	sequence that is clearly I<not> valid as UTF-8. A line consisting
	666	of simply "#", an e-acute, and any non-highbit byte,
	667	is sufficient to establish this file's encoding.
	668
	669	=for comment
	670	If/WHEN some brave soul makes these heuristics into a generic
	671	text-file class (or PerlIO layer?), we can presumably delete
	672	mention of these icky details from this file, and can instead
	673	tell people to just use appropriate class/layer.
	674	Auto-recognition of newline sequences would be another desirable
	675	feature of such a class/layer.
	676	HINT HINT HINT.
	677
	678	=for comment
	679	"The probability that a string of characters
	680	in any other encoding appears as valid UTF-8 is low" - RFC2279
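
Both the naive test and the stricter C<utf8::decode()> test described in this
item can be sketched as below. The input is assumed to be a byte string from a
BOM-less file; the function names are hypothetical, and returning C<undef>
when the file contains no highbit bytes at all is a choice made for this
sketch, since the text does not cover that case.

```perl
use strict;
use warnings;

# Return the first run of highbit bytes in a byte string, or undef.
sub first_highbit_sequence {
    my ($bytes) = @_;
    return $bytes =~ m/([\x80-\xFF]+)/ ? $1 : undef;
}

# Naive test: lead byte in 0xC2-0xFD followed by a byte in 0x80-0xBF.
sub guess_encoding_naive {
    my ($bytes) = @_;
    my $seq = first_highbit_sequence($bytes);
    return undef unless defined $seq;
    return $seq =~ m/\A[\xC2-\xFD][\x80-\xBF]/ ? 'UTF-8' : 'CP-1252';
}

# Stricter test: let utf8::decode() validate a copy of the sequence.
sub guess_encoding_strict {
    my ($bytes) = @_;
    my $seq = first_highbit_sequence($bytes);
    return undef unless defined $seq;
    my $copy = $seq;    # utf8::decode() modifies its argument in place
    return utf8::decode($copy) ? 'UTF-8' : 'CP-1252';
}
```

The two tests can disagree: a valid two-byte UTF-8 character immediately
followed by a stray highbit byte passes the naive test but fails the strict
one.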
	681
	682	=item *
	683
	684	Pod processors must treat a "=for [label] [content...]" paragraph as
	685	meaning the same thing as a "=begin [label]" paragraph, content, and
	686	an "=end [label]" paragraph. (The parser may conflate these two
	687	constructs, or may leave them distinct, in the expectation that the
	688	formatter will nevertheless treat them the same.)
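
A parser that chooses to conflate the two constructs could rewrite a "=for"
paragraph into its three-paragraph equivalent, roughly as in this sketch.
The function name C<expand_for> is hypothetical.

```perl
use strict;
use warnings;

# Turn "=for label content..." into the equivalent list of three
# paragraphs: "=begin label", the content, and "=end label".
# Returns an empty list if the paragraph is not a =for paragraph.
sub expand_for {
    my ($para) = @_;
    my ($label, $content) = $para =~ m/\A=for\s+(\S+)\s*(.*)\z/s
        or return;
    return ("=begin $label", $content, "=end $label");
}
```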
	689
	690	=item *
	691
	692	When rendering Pod to a format that allows comments (i.e., to nearly
	693	any format other than plaintext), a Pod formatter must insert comment
	694	text identifying its name and version number, and the name and
	695	version numbers of any modules it might be using to process the Pod.
	696	Minimal examples:
	697
	698	  %% POD::Pod2PS v3.14159, using POD::Parser v1.92
	699
	700	  <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->
	701
	702	  {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}
	703
	704	  .\" Pod::Man version 3.14159, using POD::Parser version 1.92
	705
	706	Formatters may also insert additional comments, including: the
	707	release date of the Pod formatter program, the contact address for
	708	the author(s) of the formatter, the current time, the name of the input
	709	file, the formatting options in effect, the version of Perl used, etc.
	710
	711	Formatters may also choose to note errors/warnings as comments,
	712	besides or instead of emitting them otherwise (as in messages to
	713	STDERR, or C<die>ing).
	714
	715	=item *
	716
	717	Pod parsers I<may> emit warnings or error messages ("Unknown E code
	718	EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
	719	C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
	720	suppressing all such STDERR output, and instead allow an option for
	721	reporting errors/warnings
	722	in some other way, whether by triggering a callback, or noting errors
	723	in some attribute of the document object, or some similarly unobtrusive
	724	mechanism -- or even by appending a "Pod Errors" section to the end of
	725	the parsed form of the document.
	726
	727	=item *
	728
	729	In cases of exceptionally aberrant documents, Pod parsers may abort the
	730	parse. Even then, using C<die>ing/C<croak>ing is to be avoided; where
	731	possible, the parser library may simply close the input file
	732	and add text like "* Formatting Aborted *" to the end of the
	733	(partial) in-memory document.
	734
	735	=item *
	736
	737	In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
	738	are understood (i.e., I<not> verbatim paragraphs, but I<including>
	739	ordinary paragraphs, and command paragraphs that produce renderable
	740	text, like "=head1"), literal whitespace should generally be considered
	741	"insignificant", in that one literal space has the same meaning as any
	742	(nonzero) number of literal spaces, literal newlines, and literal tabs
	743	(as long as this produces no blank lines, since those would terminate
	744	the paragraph). Pod parsers should compact literal whitespace in each
	745	processed paragraph, but may provide an option for overriding this
	746	(since some processing tasks do not require it), or may follow
	747	additional special rules (for example, specially treating
	748	period-space-space or period-newline sequences).
	749
	750	=item *
	751
	752	Pod parsers should not, by default, try to coerce apostrophe (') and
	753	quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
	754	turn backtick (`) into anything else but a single backtick character
	755	(distinct from an open quote character!), nor "--" into anything but
	756	two minus signs. They I<must never> do any of those things to text
	757	in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
	758	paragraphs.
	759
	760	=item *
	761
	762	When rendering Pod to a format that has two kinds of hyphens (-), one
	763	that's a non-breaking hyphen, and another that's a breakable hyphen
	764	(as in "object-oriented", which can be split across lines as
	765	"object-", newline, "oriented"), formatters are encouraged to
	766	generally translate "-" to non-breaking hyphen, but may apply
	767	heuristics to convert some of these to breaking hyphens.
	768
	769	=item *
	770
	771	Pod formatters should make reasonable efforts to keep words of Perl
	772	code from being broken across lines. For example, "Foo::Bar" in some
	773	formatting systems is seen as eligible for being broken across lines
	774	as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should
	775	be avoided where possible, either by disabling all line-breaking in
	776	mid-word, or by wrapping particular words with internal punctuation
	777	in "don't break this across lines" codes (which in some formats may
	778	not be a single code, but might be a matter of inserting non-breaking
	779	zero-width spaces between every pair of characters in a word.)
	780
	781	=item *
	782
	783	Pod parsers should, by default, expand tabs in verbatim paragraphs as
	784	they are processed, before passing them to the formatter or other
	785	processor. Parsers may also allow an option for overriding this.
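
As a rough illustration of this default, here is a minimal Perl sketch of
tab expansion for one verbatim paragraph. The helper name C<expand_tabs>
and the default of one tab stop every 8 columns are assumptions made for
this example; this section does not fix a width.

```perl
use strict;
use warnings;

# Expand tabs in one verbatim paragraph, line by line.
# $width is the assumed distance between tab stops (default 8).
sub expand_tabs {
    my ($text, $width) = @_;
    $width //= 8;
    my @lines = split /\n/, $text, -1;
    for my $line (@lines) {
        # Replace the leftmost run of tabs with enough spaces to
        # reach the right tab stop, until no tabs remain.
        while ($line =~ /^([^\t]*)(\t+)/) {
            my ($pre, $tabs) = ($1, $2);
            my $pad = length($tabs) * $width - length($pre) % $width;
            substr($line, length($pre), length($tabs), ' ' x $pad);
        }
    }
    return join "\n", @lines;
}
```

A parser offering the override mentioned above would simply skip this
call when the option is set.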
	786
	787	=item *
	788
	789	Pod parsers should, by default, remove newlines from the end of
	790	ordinary and verbatim paragraphs before passing them to the
	791	formatter. For example, while the paragraph you're reading now
	792	could be considered, in Pod source, to end with (and contain)
	793	the newline(s) that end it, it should be processed as ending with
	794	(and containing) the period character that ends this sentence.
	795
	796	=item *
	797
	798	Pod parsers, when reporting errors, should make some effort to report
	799	an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
	800	line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
	801	number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!"). Where
	802	this is problematic, the paragraph number should at least be
	803	accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
	804	Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
	805	the CE<lt>interest rate> attribute...'").
	806
	807	=item *
	808
	809	Pod parsers, when processing a series of verbatim paragraphs one
	810	after another, should consider them to be one large verbatim
	811	paragraph that happens to contain blank lines. I.e., these two
	812	lines, which have a blank line between them:
	813
	814		use Foo;
	815
	816		print Foo->VERSION
	817
	818	should be unified into one paragraph ("\tuse Foo;\n\n\tprint
	819	Foo->VERSION") before being passed to the formatter or other
	820	processor. Parsers may also allow an option for overriding this.
	821
	822	While this might be too cumbersome to implement in event-based Pod
	823	parsers, it is straightforward for parsers that return parse trees.
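
The unification can be sketched in a few lines of Perl. This assumes a
hypothetical in-memory representation in which each paragraph is a hash
with C<type> and C<text> keys; real parsers will have their own
structures.

```perl
use strict;
use warnings;

# Merge runs of adjacent verbatim paragraphs into one paragraph,
# joining them with a blank line. Each paragraph is a hash with
# hypothetical 'type' and 'text' keys.
sub unify_verbatim {
    my @paras = @_;
    my @out;
    for my $p (@paras) {
        if (@out
            && $p->{type} eq 'verbatim'
            && $out[-1]{type} eq 'verbatim') {
            $out[-1]{text} .= "\n\n" . $p->{text};
        }
        else {
            push @out, { %$p };    # shallow copy; the input is not modified
        }
    }
    return @out;
}
```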
	824
	825	=item *
	826
	827	Pod formatters, where feasible, are advised to avoid splitting short
	828	verbatim paragraphs (under twelve lines, say) across pages.
	829
	830	=item *
	831
	832	Pod parsers must treat a line with only spaces and/or tabs on it as a
	833	"blank line" such as separates paragraphs. (Some older parsers
	834	recognized only two adjacent newlines as a "blank line" but would not
	835	recognize a newline, a space, and a newline, as a blank line. This
	836	is noncompliant behavior.)
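
A minimal sketch of compliant paragraph splitting, in Perl. The function
name C<split_paragraphs> is hypothetical, and the code is an illustration
rather than how any particular parser does it. It treats CR, LF, and CRLF
alike as newlines, and treats lines holding only spaces and/or tabs as
blank.

```perl
use strict;
use warnings;

# Split Pod source into paragraphs separated by blank lines.
sub split_paragraphs {
    my ($source) = @_;
    $source =~ s/\r\n?/\n/g;       # normalize CRLF and CR to LF
    $source =~ s/^[ \t]+$//mg;     # whitespace-only lines become empty
    $source =~ s/\A\n+//;          # drop leading blank lines
    $source =~ s/\n+\z//;          # drop trailing newlines
    return split /\n{2,}/, $source;
}
```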
	837
	838	=item *
	839
	840	Authors of Pod formatters/processors should make every effort to
	841	avoid writing their own Pod parser. There are already several in
	842	CPAN, with a wide range of interface styles -- and one of them,
	843	Pod::Simple, comes with modern versions of Perl.
	844
	845	=item *
	846
	847	Characters in Pod documents may be conveyed either as literals, or by
	848	number in EE<lt>n> codes, or by an equivalent mnemonic, as in
	849	EE<lt>eacute> which is exactly equivalent to EE<lt>233>. The numbers
	850	are the Latin1/Unicode values, even on EBCDIC platforms.
	851
	852	When referring to characters by using a EE<lt>n> numeric code, numbers
	853	in the range 32-126 refer to those well known US-ASCII characters (also
	854	defined there by Unicode, with the same meaning), which all Pod
	855	formatters must render faithfully. Characters whose EE<lt>E<gt> numbers
	856	are in the ranges 0-31 and 127-159 should not be used (neither as
	857	literals,
	858	nor as EE<lt>number> codes), except for the literal byte-sequences for
	859	newline (ASCII 13, ASCII 13 10, or ASCII 10), and tab (ASCII 9).
	860
	861	Numbers in the range 160-255 refer to Latin-1 characters (also
	862	defined there by Unicode, with the same meaning). Numbers above
	863	255 should be understood to refer to Unicode characters.
	864
	865	=item *
	866
	867	Be warned
	868	that some formatters cannot reliably render characters outside 32-126;
	869	and many are able to handle 32-126 and 160-255, but nothing above
	870	255.
	871
	872	=item *
	873
	874	Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for
	875	less-than and greater-than, Pod parsers must understand "EE<lt>sol>"
	876	for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar,
	877	pipe). Pod parsers should also understand "EE<lt>lchevron>" and
	878	"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e.,
	879	"left-pointing double angle quotation mark" = "left pointing
	880	guillemet" and "right-pointing double angle quotation mark" = "right
	881	pointing guillemet". (These look like little "<<" and ">>", and they
	882	are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>"
	883	and "EE<lt>raquo>".)
	884
	885	=item *
	886
	887	Pod parsers should understand all "EE<lt>html>" codes as defined
	888	in the entity declarations in the most recent XHTML specification at
	889	C<www.W3.org>. Pod parsers must understand at least the entities
	890	that define characters in the range 160-255 (Latin-1). Pod parsers,
	891	when faced with some unknown "EE<lt>I<identifier>>" code,
	892	shouldn't simply replace it with nullstring (by default, at least),
	893	but may pass it through as a string consisting of the literal characters
	894	E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the
	895	alternative option of processing such unknown
	896	"EE<lt>I<identifier>>" codes by firing an event especially
	897	for such codes, or by adding a special node-type to the in-memory
	898	document tree. Such "EE<lt>I<identifier>>" may have special meaning
	899	to some processors, or some processors may choose to add them to
	900	a special error report.
	901
	902	=item *
	903
	904	Pod parsers must also support the XHTML codes "EE<lt>quot>" for
	905	character 34 (doublequote, "), "EE<lt>amp>" for character 38
	906	(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, ').
	907
	908	=item *
	909
	910	Note that in all cases of "EE<lt>whateverE<gt>", I<whatever> (whether
	911	an htmlname, or a number in any base) must consist only of
	912	alphanumeric characters -- that is, I<whatever> must match
	913	C<m/\A\w+\z/>. So S<"EE<lt> 0 1 2 3 E<gt>"> is invalid, because
	914	it contains spaces, which aren't alphanumeric characters. This
	915	presumably does not I<need> special treatment by a Pod processor;
	916	S<" 0 1 2 3 "> doesn't look like a number in any base, so it would
	917	presumably be looked up in the table of HTML-like names. Since
	918	there isn't (and cannot be) an HTML-like entity called S<" 0 1 2 3 ">,
	919	this will be treated as an error. However, Pod processors may
	920	treat S<"EE<lt> 0 1 2 3 E<gt>"> or "EE<lt>e-acute>" as I<syntactically>
	921	invalid, potentially earning a different error message than the
	922	error message (or warning, or event) generated by a merely unknown
	923	(but theoretically valid) htmlname, as in "EE<lt>qacute>"
	924	[sic]. However, Pod parsers are not required to make this
	925	distinction.
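
The validity check and lookup described in the preceding points could be
sketched as below. Everything here is illustrative: C<resolve_escape> and
C<%NAME_TO_NUM> are hypothetical names, and the table holds only a handful
of entries. Reading a leading "0x" as hexadecimal and a leading "0" as
octal is an assumption here, following the convention described in
perlpod rather than anything stated in this section.

```perl
use strict;
use warnings;

# A deliberately tiny name table; a real parser would carry the
# full XHTML entity list.
my %NAME_TO_NUM = (
    lt       => 60,  gt    => 62,  sol  => 47, verbar => 124,
    quot     => 34,  amp   => 38,  apos => 39,
    lchevron => 171, laquo => 171,
    rchevron => 187, raquo => 187,
    eacute   => 233, nbsp  => 160, shy  => 173,
);

# Classify the content of an E<...> code. Returns a status string
# ('ok', 'invalid' or 'unknown') and, for 'ok', the code point.
sub resolve_escape {
    my ($content) = @_;
    return ('invalid') unless $content =~ m/\A\w+\z/;
    return ('ok', hex($1)) if $content =~ m/\A0[xX]([0-9A-Fa-f]+)\z/;
    return ('ok', oct($1)) if $content =~ m/\A0([0-7]+)\z/;
    return ('ok', $content + 0) if $content =~ m/\A[0-9]+\z/;
    return ('ok', $NAME_TO_NUM{$content}) if exists $NAME_TO_NUM{$content};
    return ('unknown');
}
```

Keeping 'invalid' and 'unknown' apart lets a processor issue the two
different messages mentioned above, but collapsing them into one error is
equally permitted.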
	926
	927	=item *
	928
	929	Note that EE<lt>number> I<must not> be interpreted as simply
	930	"codepoint I<number> in the current/native character set". It always
	931	means only "the character represented by codepoint I<number> in
	932	Unicode." (This is identical to the semantics of &#I<number>; in XML.)
	933
	934	This will likely require many formatters to have tables mapping from
	935	treatable Unicode codepoints (such as the "\xE9" for the e-acute
	936	character) to the escape sequences or codes necessary for conveying
	937	such sequences in the target output format. A converter to *roff
	938	would, for example, know that "\xE9" (whether conveyed literally, or via
	939	a EE<lt>...> sequence) is to be conveyed as "e\\*'".
	940	Similarly, a program rendering Pod in a Mac OS application window would
	941	presumably need to know that "\xE9" maps to codepoint 142 in the MacRoman
	942	encoding that (at time of writing) is native for Mac OS. Such
	943	Unicode2whatever mappings are presumably already widely available for
	944	common output formats. (Such mappings may be incomplete! Implementers
	945	are not expected to bend over backwards in an attempt to render
	946	Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
	947	of the other weird things that Unicode can encode.) And
	948	if a Pod document uses a character not found in such a mapping, the
	949	formatter should consider it an unrenderable character.
	950
	951	=item *
	952
	953	If, surprisingly, the implementor of a Pod formatter can't find a
	954	satisfactory pre-existing table mapping from Unicode characters to
	955	escapes in the target format (e.g., a decent table of Unicode
	956	characters to *roff escapes), it will be necessary to build such a
	957	table. If you are in this circumstance, you should begin with the
	958	characters in the range 0x00A0 - 0x00FF, which is mostly the heavily
	959	used accented characters. Then proceed (as patience permits and
	960	fastidiousness compels) through the characters that the (X)HTML
	961	standards groups judged important enough to merit mnemonics
	962	for. These are declared in the (X)HTML specifications at the
	963	www.W3.org site. At time of writing (September 2001), the most recent
	964	entity declaration files are:
	965
	966	  http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
	967	  http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
	968	  http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
	969
	970	Then you can progress through any remaining notable Unicode characters
	971	in the range 0x2000-0x204D (consult the character tables at
	972	www.unicode.org), and whatever else strikes your fancy. For example,
	973	in F<xhtml-symbol.ent>, there is the entry:
	974
	975	  <!ENTITY infin    "&#8734;"> <!-- infinity, U+221E ISOtech -->
	976
	977	While the mapping of "infin" to the character "\x{221E}" will (hopefully)
	978	have been already handled by the Pod parser, the presence of the
	979	character in this file means that it's important enough to
	980	include in a formatter's table that maps from notable Unicode characters
	981	to the codes necessary for rendering them. So for a Unicode-to-*roff
	982	mapping, for example, this would merit the entry:
	983
	984	  "\x{221E}" => '\(in',
	985
	986	It is eagerly hoped that in the future, increasing numbers of formats
	987	(and formatters) will support Unicode characters directly (as (X)HTML
	988	does with C<&infin;>, C<&#8734;>, or C<&#x221E;>), reducing the need
	989	for idiosyncratic mappings of Unicode-to-I<my_escapes>.
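
For instance, a Unicode-to-*roff table and its lookup might be sketched
like this. Only the two mappings named in the text are included;
C<%ROFF_ESCAPE> and C<to_roff> are hypothetical names, and escaping of
characters that are special to *roff itself (such as the backslash) is
left out to keep the example short.

```perl
use strict;
use warnings;

# Partial table from Unicode characters to *roff escapes. A real
# table would cover at least the range 0x00A0 - 0x00FF.
my %ROFF_ESCAPE = (
    "\x{E9}"   => "e\\*'",    # e-acute
    "\x{221E}" => '\(in',     # infinity
);

# Convert a string for *roff output. Characters in 32-126 pass
# through, mapped characters use their escape, and anything else
# is replaced by "?" and reported back as unrenderable.
sub to_roff {
    my ($text) = @_;
    my @unrenderable;
    my $out = '';
    for my $ch (split //, $text) {
        my $n = ord $ch;
        if ($n >= 32 && $n <= 126) {
            $out .= $ch;
        }
        elsif (exists $ROFF_ESCAPE{$ch}) {
            $out .= $ROFF_ESCAPE{$ch};
        }
        else {
            $out .= '?';
            push @unrenderable, sprintf('U+%04X', $n);
        }
    }
    return ($out, \@unrenderable);
}
```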
	990
	991	=item *
	992
	993	It is up to individual Pod formatters to display good judgement when
	994	confronted with an unrenderable character (which is distinct from an
	995	unknown EE<lt>thing> sequence that the parser couldn't resolve to
	996	anything, renderable or not). It is good practice to map Latin letters
	997	with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
	998	unaccented US-ASCII letters (like a simple character 101, "e"), but
	999	clearly this is often not feasible, and an unrenderable character may
	1000	be represented as "?", or the like. In attempting a sane fallback
	1001	(as from EE<lt>233> to "e"), Pod formatters may use the
	1002	%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or
	1003	L<Text::Unidecode|Text::Unidecode>, if available.
	1004
	1005	For example, this Pod text:
	1006
	1007	  magic is enabled if you set C<$Currency> to 'E<euro>'.
	1008
	1009	may be rendered as:
	1010	"magic is enabled if you set C<$Currency> to 'I<?>'" or as
	1011	"magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as
	1012	"magic is enabled if you set C<$Currency> to '[x20AC]'", etc.
	1013
	1014	A Pod formatter may also note, in a comment or warning, a list of what
	1015	unrenderable characters were encountered.
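
One possible fallback strategy, sketched in Perl. Note that this does
not use the Pod::Escapes table mentioned above; it swaps in a different
technique, Unicode decomposition via the core Unicode::Normalize module.
The C<ascii_fallback> name and the bracketed "[x20AC]" style of last
resort are choices made for this example.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Produce a US-ASCII stand-in for one unrenderable character.
sub ascii_fallback {
    my ($ch) = @_;
    # Decompose, then drop combining marks: e-acute becomes "e".
    (my $base = NFD($ch)) =~ s/\p{Mn}+//g;
    return $base if $base =~ m/\A[\x20-\x7E]+\z/;
    # No plain-ASCII base letter: use a bracketed code point.
    return sprintf('[x%04X]', ord $ch);
}
```

A formatter could also collect the characters that reach the last line,
to produce the list of unrenderable characters suggested above.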
	1016
	1017	=item *
	1018
	1019	EE<lt>...> may freely appear in any formatting code (other than
	1020	in another EE<lt>...> or in a ZE<lt>>). That is, "XE<lt>The
	1021	EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The
	1022	EE<lt>euro>1,000,000 Solution|Million::Euros>".
	1023
	1024	=item *
	1025
	1026	Some Pod formatters output to formats that implement non-breaking
	1027	spaces as an individual character (which I'll call "NBSP"), and
	1028	others output to formats that implement non-breaking spaces just as
	1029	spaces wrapped in a "don't break this across lines" code. Note that
	1030	at the level of Pod, both sorts of codes can occur: Pod can contain a
	1031	NBSP character (whether as a literal, or as a "EE<lt>160>" or
	1032	"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
	1033	IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
	1034	such codes are taken to represent non-breaking spaces. Pod
	1035	parsers should consider supporting the optional parsing of "SE<lt>foo
	1036	IE<lt>barE<gt> baz>" as if it were
	1037	"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
	1038	optional parsing of groups of words joined by NBSP's as if each group
	1039	were in a SE<lt>...> code, so that formatters may use the
	1040	representation that maps best to what the output format demands.
	1041
	1042	=item *
	1043
	1044	Some processors may find that the C<SE<lt>...E<gt>> code is easiest to
	1045	implement by replacing each space in the parse tree under the content
	1046	of the S, with an NBSP. But note: the replacement should apply I<not> to
	1047	spaces in I<all> text, but I<only> to spaces in I<printable> text. (This
	1048	distinction may or may not be evident in the particular tree/event
	1049	model implemented by the Pod parser.) For example, consider this
	1050	unusual case:
	1051
	1052	  S<L</Autoloaded Functions>>
	1053
	1054	This means that the space in the middle of the visible link text must
	1055	not be broken across lines. In other words, it's the same as this:
	1056
	1057	  L<"AutoloadedE<160>Functions"/Autoloaded Functions>
	1058
	1059	However, a misapplied space-to-NBSP replacement could (wrongly)
	1060	produce something equivalent to this:
	1061
	1062	  L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>
	1063
	1064	...which is almost definitely not going to work as a hyperlink (assuming
	1065	this formatter outputs a format supporting hypertext).
	1066
	1067	Formatters may choose to just not support the S format code,
	1068	especially in cases where the output format simply has no NBSP
	1069	character/code and no code for "don't break this stuff across lines".
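
To make the distinction concrete, here is a small Perl sketch over a
hypothetical tree model in which a node is an array reference holding a
code letter, a hash of attributes, and then its children (plain strings
being printable text). Only the strings are touched; attributes such as
a link target are copied unchanged.

```perl
use strict;
use warnings;

# Return a copy of $node in which every space in printable text
# has become NBSP (character 160). A node is
# [ $code, \%attrs, @children ]; this layout is an assumption.
sub nbsp_printable {
    my ($node) = @_;
    my ($code, $attrs, @kids) = @$node;
    my @new;
    for my $kid (@kids) {
        if (ref $kid) {
            push @new, nbsp_printable($kid);    # descend into nested codes
        }
        else {
            (my $text = $kid) =~ tr/ /\x{A0}/;  # printable text only
            push @new, $text;
        }
    }
    return [ $code, { %$attrs }, @new ];
}
```

Called on the S node of the unusual case above, this changes the visible
link text but leaves the section attribute, and so the link target, with
its ordinary space.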
	1070
	1071	=item *
	1072
	1073	Besides the NBSP character discussed above, implementors are reminded
	1074	of the existence of the other "special" character in Latin-1, the
	1075	"soft hyphen" character, also known as "discretionary hyphen"
	1076	(i.e., C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> =
	1077	C<EE<lt>shyE<gt>>). This character expresses an optional hyphenation
	1078	point. That is, it normally renders as nothing, but may render as a
	1079	"-" if a formatter breaks the word at that point. Pod formatters
	1080	should, as appropriate, do one of the following: 1) render this with
	1081	a code with the same meaning (e.g., "\-" in RTF), 2) pass it through
	1082	in the expectation that the formatter understands this character as
	1083	such, or 3) delete it.
	1084
	1085	For example:
	1086
	1087	  sigE<shy>action
	1088	  manuE<shy>script
	1089	  JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi
	1090
	1091	These signal to a formatter that if it is to hyphenate "sigaction"
	1092	or "manuscript", then it should be done as
	1093	"sig-I<[linebreak]>action" or "manu-I<[linebreak]>script"
	1094	(and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't
	1095	show up at all). And if it is
	1096	to hyphenate "Jarkko" and/or "Hietaniemi", it can do
	1097	so only at the points where there is a C<EE<lt>shyE<gt>> code.
	1098
	1099	In practice, it is anticipated that this character will not be used
	1100	often, but formatters should either support it, or delete it.
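
Options 1 and 3 amount to a single substitution. A sketch, with a
hypothetical C<render_soft_hyphens> helper whose second argument is
whatever the output format uses for a discretionary hyphen (omit it to
delete the character):

```perl
use strict;
use warnings;

# Replace each soft hyphen (character 173) with the output
# format's own code, or delete it when no code is given.
sub render_soft_hyphens {
    my ($text, $replacement) = @_;
    $replacement //= '';
    $text =~ s/\x{AD}/$replacement/g;
    return $text;
}
```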
	1101
	1102	=item *
	1103
	1104	If you think that you want to add a new command to Pod (like, say, a
	1105	"=biblio" command), consider whether you could get the same
	1106	effect with a for or begin/end sequence: "=for biblio ..." or "=begin
	1107	biblio" ... "=end biblio". Pod processors that don't understand
	1108	"=for biblio", etc, will simply ignore it, whereas they may complain
	1109	loudly if they see "=biblio".
	1110
	1111	=item *
	1112
	1113	Throughout this document, "Pod" has been the preferred spelling for
	1114	the name of the documentation format. One may also use "POD" or
	1115	"pod". For the documentation that is (typically) in the Pod
	1116	format, you may use "pod", or "Pod", or "POD". Understanding these
	1117	distinctions is useful; but obsessing over how to spell them usually
	1118	is not.
	1119
	1120	=back
	1121
	1122
	1123
	1124
	1125
	1126	=head1 About LE<lt>...E<gt> Codes
	1127
	1128	As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...>
	1129	code is the most complex of the Pod formatting codes. The points below
	1130	will hopefully clarify what it means and how processors should deal
	1131	with it.
	1132
	1133	=over
	1134
	1135	=item *
	1136
	1137	In parsing an LE<lt>...> code, Pod parsers must distinguish at least
	1138	four attributes:
	1139
	1140	=over
	1141
	1142	=item First:
	1143
	1144	The link-text. If there is none, this must be C<undef>. (E.g., in
	1145	"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
	1146	In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
	1147	link text. Note that link text may contain formatting.)
	1148
	1149	=item Second:
	1150
	1151	The possibly inferred link-text; i.e., if there was no real link
	1152	text, then this is the text that we'll infer in its place. (E.g., for
	1153	"LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".)
	1154
	1155	=item Third:
	1156
	1157	The name or URL, or C<undef> if none. (E.g., in "LE<lt>Perl
	1158	Functions|perlfunc>", the name (also sometimes called the page)
	1159	is "perlfunc". In "LE<lt>/CAVEATS>", the name is C<undef>.)
	1160
	1161	=item Fourth:
	1162
	1163	The section (AKA "item" in older perlpods), or C<undef> if none. E.g.,
	1164	in "LE<lt>Getopt::Std/DESCRIPTIONE<gt>", "DESCRIPTION" is the section. (Note
	1165	that this is not the same as a manpage section like the "5" in "man 5
	1166	crontab". "Section Foo" in the Pod sense means the part of the text
	1167	that's introduced by the heading or item whose text is "Foo".)
	1168
	1169	=back
	1170
	1171	Pod parsers may also note additional attributes including:
	1172
	1173	=over
	1174
	1175	=item Fifth:
	1176
	1177	A flag for whether item 3 (if present) is a URL (like
	1178	"http://lists.perl.org" is), in which case there should be no section
	1179	attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or
	1180	possibly a man page name (like "crontab(5)" is).
	1181
	1182	=item Sixth:
	1183
	1184	The raw original LE<lt>...> content, before text is split on
	1185	"|", "/", etc, and before EE<lt>...> codes are expanded.
	1186
	1187	=back
	1188
	1189	(The above were numbered only for concise reference below. It is not
	1190	a requirement that these be passed as an actual list or array.)
	1191
	1192	For example:
	1193
	1194	  L<Foo::Bar>
	1195	    =>  undef,                         # link text
	1196	        "Foo::Bar",                    # possibly inferred link text
	1197	        "Foo::Bar",                    # name
	1198	        undef,                         # section
	1199	        'pod',                         # what sort of link
	1200	        "Foo::Bar"                     # original content
	1201
	1202	  L<Perlport's section on NL's|perlport/Newlines>
	1203	    =>  "Perlport's section on NL's",  # link text
	1204	        "Perlport's section on NL's",  # possibly inferred link text
	1205	        "perlport",                    # name
	1206	        "Newlines",                    # section
	1207	        'pod',                         # what sort of link
	1208	        "Perlport's section on NL's|perlport/Newlines"
	1209	                                       # original content
	1210
	1211	  L<perlport/Newlines>
	1212	    =>  undef,                         # link text
	1213	        '"Newlines" in perlport',      # possibly inferred link text
	1214	        "perlport",                    # name
	1215	        "Newlines",                    # section
	1216	        'pod',                         # what sort of link
	1217	        "perlport/Newlines"            # original content
	1218
	1219	  L<crontab(5)/"DESCRIPTION">
	1220	    =>  undef,                         # link text
	1221	        '"DESCRIPTION" in crontab(5)', # possibly inferred link text
	1222	        "crontab(5)",                  # name
	1223	        "DESCRIPTION",                 # section
	1224	        'man',                         # what sort of link
	1225	        'crontab(5)/"DESCRIPTION"'     # original content
	1226
	1227	  L</Object Attributes>
	1228	    =>  undef,                         # link text
	1229	        '"Object Attributes"',         # possibly inferred link text
	1230	        undef,                         # name
	1231	        "Object Attributes",           # section
	1232	        'pod',                         # what sort of link
	1233	        "/Object Attributes"           # original content
	1234
	1235	  L<https://www.perl.org/>
	1236	    =>  undef,                         # link text
	1237	        "https://www.perl.org/",       # possibly inferred link text
	1238	        "https://www.perl.org/",       # name
	1239	        undef,                         # section
	1240	        'url',                         # what sort of link
	1241	        "https://www.perl.org/"        # original content
	1242
	1243	  L<Perl.org|https://www.perl.org/>
	1244	    =>  "Perl.org",                       # link text
	1245	        "Perl.org",                       # possibly inferred link text
	1246	        "https://www.perl.org/",          # name
	1247	        undef,                            # section
	1248	        'url',                            # what sort of link
	1249	        "Perl.org|https://www.perl.org/"  # original content
	1250
	1251	Note that you can distinguish URL-links from anything else by the
	1252	fact that they match C<m/\A\w+:[^:\s]\S*\z/>. So
	1253	C<LE<lt>http://www.perl.comE<gt>> is a URL, but
	1254	C<LE<lt>HTTP::ResponseE<gt>> isn't.
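
Pulling the pieces above together, a simplified splitter for LE<lt>...>
content might look like the following Perl sketch. C<parse_link> is a
hypothetical name. The code makes two simplifying assumptions that a real
parser cannot: that no "|" or "/" is hidden inside a nested formatting
code, and that EE<lt>...> codes need no expansion. Man pages are
recognized by a parenthesized section, as in "crontab(5)".

```perl
use strict;
use warnings;

# Split the raw content of an L<...> code into six attributes:
# link text, possibly inferred link text, name, section, link
# type, and the original content.
sub parse_link {
    my ($raw) = @_;
    my ($text, $rest) =
        $raw =~ m/\A([^|]*)\|(.*)\z/s ? ($1, $2) : (undef, $raw);
    $text = undef if defined $text && $text eq '';    # L<|Time::HiRes>

    my ($name, $section, $type);
    if ($rest =~ m/\A\w+:[^:\s]\S*\z/) {
        ($name, $type) = ($rest, 'url');
    }
    else {
        ($name, $section) = split m{/}, $rest, 2;
        $name = undef if defined $name && $name eq '';
        $section =~ s/\A"(.*)"\z/$1/s if defined $section;
        $type = (defined $name && $name =~ m/\(\S*\)/) ? 'man' : 'pod';
    }

    my $inferred =
          defined $text     ? $text
        : !defined $section ? $name
        : defined $name     ? qq{"$section" in $name}
        :                     qq{"$section"};

    return ($text, $inferred, $name, $section, $type, $raw);
}
```

For example, C<parse_link('perlport/Newlines')> yields an undefined link
text, the inferred text '"Newlines" in perlport', the name "perlport",
the section "Newlines", and the type 'pod'.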
	1255
	1256	=item *
	1257
	1258	In case of LE<lt>...> codes with no "text|" part in them,
	1259	older formatters have exhibited great variation in actually displaying
	1260	the link or cross reference. For example, LE<lt>crontab(5)> would render
	1261	as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage"
	1262	or just "C<crontab(5)>".
	1263
	1264	Pod processors must now treat "text|"-less links as follows:
	1265
	1266	  L<name>         =>  L<name|name>
	1267	  L</section>     =>  L<"section"|/section>
	1268	  L<name/section> =>  L<"section" in name|name/section>
	1269
	1270	=item *
	1271
	1272	Note that section names might contain markup. I.e., if a section
	1273	starts with:
	1274
	1275	  =head2 About the C<-M> Operator
	1276
	1277	or with:
	1278
	1279	  =item About the C<-M> Operator
	1280
	1281	then a link to it would look like this:
	1282
	1283	  L<somedoc/About the C<-M> Operator>
	1284
	1285	Formatters may choose to ignore the markup for purposes of resolving
	1286	the link and use only the renderable characters in the section name,
	1287	as in:
	1288
	1289	  <h1><a name="About_the_-M_Operator">About the <code>-M</code>
	1290	  Operator</a></h1>
	1291
	1292	  ...
	1293
	1294	  <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
	1295	  Operator" in somedoc</a>
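
Deriving such an anchor name could be sketched as follows. The helper
C<section_anchor> is hypothetical, and the rule of turning whitespace
into underscores is simply read off the HTML example above. The sketch
handles only single-angle-bracket codes and does not expand EE<lt>...>
codes.

```perl
use strict;
use warnings;

# Build an anchor name from a section title, ignoring markup.
sub section_anchor {
    my ($section) = @_;
    # Peel formatting codes from the inside out, keeping their content.
    1 while $section =~ s/[A-Z]<([^<>]*)>/$1/;
    $section =~ s/\s+/_/g;
    return $section;
}
```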
	1296
	1297	=item *
	1298
	1299	Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>>
	1300	links from C<LE<lt>name/itemE<gt>> links (and their targets). These
	1301	have been merged syntactically and semantically in the current
	1302	specification, and I<section> can refer either to a "=headI<n> Heading
	1303	Content" command or to a "=item Item Content" command. This
	1304	specification does not specify what the behavior should be in the case
	1305	of a given document having several things all seeming to produce the
	1306	same I<section> identifier (e.g., in HTML, several things all producing
	1307	the same I<anchorname> in <a name="I<anchorname>">...</a>
	1308	elements). Where Pod processors can control this behavior, they should
	1309	use the first such anchor. That is, C<LE<lt>Foo/BarE<gt>> refers to the
	1310	I<first> "Bar" section in Foo.
	1311
	1312	But for some processors/formats this cannot be easily controlled; as
	1313	with the HTML example, the behavior of multiple ambiguous
	1314	<a name="I<anchorname>">...</a> is most easily just left up to
	1315	browsers to decide.
	1316
	1317	=item *
	1318
	1319	In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes
	1320	for formatting or for EE<lt>...> escapes, as in:
	1321
	1322	  L<B<ummE<234>stuff>|...>
	1323
	1324	For C<LE<lt>...E<gt>> codes without a "text|" part, only
	1325	C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur. That is,
	1326	authors should not use "C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>".
	1327
	1328	Note, however, that formatting codes and ZE<lt>>'s can occur in any
	1329	and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>,
	1330	and I<url>).
	1331
	1332	Authors must not nest LE<lt>...> codes. For example, "LE<lt>The
	1333	LE<lt>Foo::Bar> man page>" should be treated as an error.
	1334
	1335	=item *
	1336
	1337	Note that Pod authors may use formatting codes inside the "text"
	1338	part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">).
	1339
	1340	In other words, this is valid:
	1341
	1342	  Go read L<the docs on C<$.>|perlvar/"$.">
	1343
	1344	Some output formats that do allow rendering "LE<lt>...>" codes as
	1345	hypertext might not allow the link-text to be formatted; in
	1346	that case, formatters will have to just ignore that formatting.
	1347
	1348	=item *
	1349
	1350	At time of writing, C<LE<lt>nameE<gt>> values are of two types:
	1351	either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which
	1352	might be a real Perl module or program in an @INC / PATH
	1353	directory, or a .pod file in those places); or the name of a Unix
	1354	man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>>
	1355	is ambiguous between a Pod page called "chmod", or the Unix man page
	1356	"chmod" (in whatever man-section). However, the presence of a string
	1357	in parens, as in "crontab(5)", is sufficient to signal that what
	1358	is being discussed is not a Pod page, and so is presumably a
	1359	Unix man page. The distinction is of no importance to many
	1360	Pod processors, but some processors that render to hypertext formats
	1361	may need to distinguish them in order to know how to render a
	1362	given C<LE<lt>fooE<gt>> code.
	1363
	1364	=item *
	1365
	1366	Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax (as in
	1367	C<LE<lt>Object AttributesE<gt>>), which was not easily distinguishable from
	1368	C<LE<lt>nameE<gt>> syntax and for C<LE<lt>"section"E<gt>> which was only
	1369	slightly less ambiguous. This syntax is no longer in the specification, and
	1370	has been replaced by the C<LE<lt>/sectionE<gt>> syntax (where the slash was
	1371	formerly optional). Pod parsers should tolerate the C<LE<lt>"section"E<gt>>
	1372	syntax, for a while at least. The suggested heuristic for distinguishing
	1373	C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> is that if it contains any
	1374	whitespace, it's a I<section>. Pod processors should warn about this being
	1375	deprecated syntax.
	1376
	1377	=back
	1378
	1379	=head1 About =over...=back Regions
	1380
	1381	"=over"..."=back" regions are used for various kinds of list-like
	1382	structures. (I use the term "region" here simply as a collective
	1383	term for everything from the "=over" to the matching "=back".)
	1384
	1385	=over
	1386
	1387	=item *
	1388
	1389	The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ...
	1390	"=back" is used for giving the formatter a clue as to how many
	1391	"spaces" (ems, or roughly equivalent units) it should tab over,
	1392	although many formatters will have to convert this to an absolute
	1393	measurement that may not exactly match with the size of spaces (or M's)
	1394	in the document's base font. Other formatters may have to completely
	1395	ignore the number. The lack of any explicit I<indentlevel> parameter is
	1396	equivalent to an I<indentlevel> value of 4. Pod processors may
	1397	complain if I<indentlevel> is present but is not a positive number
	1398	matching C<m/\A(\d*\.)?\d+\z/>.
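
A sketch of reading the parameter, in Perl. C<over_indent> is a
hypothetical helper, and falling back to 4 when the value is malformed is
an assumption of this example (the text only says that processors may
complain).

```perl
use strict;
use warnings;

# Return (indentlevel, complaint) for an "=over" paragraph; the
# complaint is undef when there is nothing to complain about.
# Returns an empty list if the paragraph is not an "=over" command.
sub over_indent {
    my ($paragraph) = @_;
    my ($arg) = $paragraph =~ m/\A=over\b\s*(.*?)\s*\z/s
        or return;
    return (4, undef) if $arg eq '';              # no explicit indentlevel
    return ($arg + 0, undef)
        if $arg =~ m/\A(\d*\.)?\d+\z/ && $arg > 0;
    return (4, "bad indentlevel '$arg'");         # assumed fallback
}
```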
	1399
	1400	=item *
	1401
	1402	Authors of Pod formatters are reminded that "=over" ... "=back" may
	1403	map to several different constructs in your output format. For
	1404	example, in converting Pod to (X)HTML, it can map to any of
	1405	<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
	1406	<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or
	1407	<dt>.
	1408
	1409	=item *
	1410
	1411	Each "=over" ... "=back" region should be one of the following:
	1412
	1413	=over
	1414
	1415	=item *
	1416
	1417	An "=over" ... "=back" region containing only "=item *" commands,
	1418	each followed by some number of ordinary/verbatim paragraphs, other
	1419	nested "=over" ... "=back" regions, "=for..." paragraphs, and
	1420	"=begin"..."=end" regions.
	1421
	1422	(Pod processors must tolerate a bare "=item" as if it were "=item
	1423	*".) Whether "*" is rendered as a literal asterisk, an "o", or as
	1424	some kind of real bullet character, is left up to the Pod formatter,
	1425	and may depend on the level of nesting.
	1426
	1427	=item *
	1428
	1429	An "=over" ... "=back" region containing only
	1430	C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them)
	1431	followed by some number of ordinary/verbatim paragraphs, other nested
	1432	"=over" ... "=back" regions, "=for..." paragraphs, and/or
	1433	"=begin"..."=end" codes. Note that the numbers must start at 1
	1434	in each section, and must proceed in order and without skipping
	1435	numbers.
	1436
	1437	(Pod processors must tolerate lines like "=item 1" as if they were
	1438	"=item 1.", with the period.)
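
A check of that numbering rule might be sketched like this;
C<check_item_numbers> is a hypothetical helper that receives the "=item"
paragraphs of a single region in document order.

```perl
use strict;
use warnings;

# Return undef if the items are numbered 1, 2, 3, ... with no
# gaps, otherwise a short description of the first problem.
sub check_item_numbers {
    my @items = @_;
    my $expected = 1;
    for my $item (@items) {
        my ($n) = $item =~ m/\A=item\s+(\d+)\.?\s*\z/
            or return "not a numbered item: $item";
        return "expected item $expected but found $n" if $n != $expected;
        $expected++;
    }
    return;
}
```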
	1439
	1440	=item *
	1441
	1442	An "=over" ... "=back" region containing only "=item [text]"
	1443	commands, each one (or each group of them) followed by some number of
	1444	ordinary/verbatim paragraphs, other nested "=over" ... "=back"
	1445	regions, or "=for..." paragraphs, and "=begin"..."=end" regions.
	1446
	1447	The "=item [text]" paragraph should not match
	1448	C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it
	1449	match just C<m/\A=item\s*\z/>.
	1450
	1451	=item *
	1452
	1453	An "=over" ... "=back" region containing no "=item" paragraphs at
	1454	all, and containing only some number of
	1455	ordinary/verbatim paragraphs, and possibly also some nested "=over"
	1456	... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
	1457	regions. Such an itemless "=over" ... "=back" region in Pod is
	1458	equivalent in meaning to a "<blockquote>...</blockquote>" element in
	1459	HTML.
	1460
	1461	=back
	1462
	1463	Note that with all the above cases, you can determine which type of
	1464	"=over" ... "=back" you have, by examining the first (non-"=cut",
	1465	non-"=pod") Pod paragraph after the "=over" command.
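
That decision can be sketched in Perl as below. The function name and the
returned labels are made up for this example; the patterns follow the
descriptions of the four region types above.

```perl
use strict;
use warnings;

# Classify an "=over" region from the paragraphs that follow the
# "=over" command. Returns 'bullet', 'number', 'text', 'block'
# (no items, blockquote-like) or 'empty'.
sub over_type {
    my @paras = @_;
    for my $p (@paras) {
        next if $p =~ m/\A=(?:cut|pod)\b/;
        return 'bullet' if $p =~ m/\A=item\s*\z/        # bare "=item"
                        || $p =~ m/\A=item\s+\*\s*\z/;
        return 'number' if $p =~ m/\A=item\s+\d+\.?\s*\z/;
        return 'text'   if $p =~ m/\A=item\s/;
        return 'block';
    }
    return 'empty';
}
```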
	1466
	1467	=item *
	1468
	1469	Pod formatters I<must> tolerate arbitrarily large amounts of text
	1470	in the "=item I<text...>" paragraph. In practice, most such
	1471	paragraphs are short, as in:
	1472
	1473	  =item For cutting off our trade with all parts of the world
	1474
	1475	But they may be arbitrarily long:
	1476
	1477	  =item For transporting us beyond seas to be tried for pretended
	1478	  offenses
	1479
	1480	  =item He is at this time transporting large armies of foreign
	1481	  mercenaries to complete the works of death, desolation and
	1482	  tyranny, already begun with circumstances of cruelty and perfidy
	1483	  scarcely paralleled in the most barbarous ages, and totally
	1484	  unworthy the head of a civilized nation.
	1485
	1486	=item *
	1487
	1488	Pod processors should tolerate "=item *" / "=item I<number>" commands
	1489	with no accompanying paragraph. The middle item is an example:
	1490
	1491	  =over
	1492
	1493	  =item 1
	1494
	1495	  Pick up dry cleaning.
	1496
	1497	  =item 2
	1498
	1499	  =item 3
	1500
	1501	  Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.
	1502
	1503	  =back
	1504
	1505	=item *
	1506
	1507	No "=over" ... "=back" region can contain headings. Processors may
	1508	treat such a heading as an error.
	1509
	1510	=item *
	1511
	1512	Note that an "=over" ... "=back" region should have some
	1513	content. That is, authors should not have an empty region like this:
	1514
	1515	  =over
	1516
	1517	  =back
	1518
	1519	Pod processors seeing such a contentless "=over" ... "=back" region,
	1520	may ignore it, or may report it as an error.
	1521
	1522	=item *
	1523
	1524	Processors must tolerate an "=over" list that goes off the end of the
	1525	document (i.e., which has no matching "=back"), but they may warn
	1526	about such a list.
	1527
	1528	=item *
	1529
	1530	Authors of Pod formatters should note that this construct:
	1531
	1532	=item Neque
	1533
	1534	=item Porro
	1535
	1536	=item Quisquam Est
	1537
	1538	Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
	1539	velit, sed quia non numquam eius modi tempora incidunt ut
	1540	labore et dolore magnam aliquam quaerat voluptatem.
	1541
	1542	=item Ut Enim
	1543
	1544	is semantically ambiguous, in a way that makes formatting decisions
	1545	a bit difficult. On the one hand, it could be a mention of an item
	1546	"Neque", a mention of another item "Porro", and a mention of another
	1547	item "Quisquam Est", with just the last one requiring the explanatory
	1548	paragraph "Qui dolorem ipsum quia dolor..."; and then an item
	1549	"Ut Enim". In that case, you'd want to format it like so:
	1550
	1551	  Neque
	1552	
	1553	  Porro
	1554	
	1555	  Quisquam Est
	1556	    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
	1557	    velit, sed quia non numquam eius modi tempora incidunt ut
	1558	    labore et dolore magnam aliquam quaerat voluptatem.
	1559	
	1560	  Ut Enim
	1561
	1562	But it could equally well be a discussion of three (related or equivalent)
	1563	items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph
	1564	explaining them all, and then a new item "Ut Enim". In that case, you'd
	1565	probably want to format it like so:
	1566
	1567	  Neque
	1568	  Porro
	1569	  Quisquam Est
	1570	    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
	1571	    velit, sed quia non numquam eius modi tempora incidunt ut
	1572	    labore et dolore magnam aliquam quaerat voluptatem.
	1573	
	1574	  Ut Enim
	1575
	1576	But (for the foreseeable future), Pod does not provide any way for Pod
	1577	authors to distinguish which grouping is meant by the above
	1578	"=item"-cluster structure. So formatters should format it like so:
	1579
	1580	  Neque
	1581	
	1582	  Porro
	1583	
	1584	  Quisquam Est
	1585	
	1586	    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
	1587	    velit, sed quia non numquam eius modi tempora incidunt ut
	1588	    labore et dolore magnam aliquam quaerat voluptatem.
	1589	
	1590	  Ut Enim
	1591
	1592	That is, there should be (at least roughly) equal spacing between
	1593	items as between paragraphs (although that spacing may well be less
	1594	than the full height of a line of text). This leaves it to the reader
	1595	to use (con)textual cues to figure out whether the "Qui dolorem
	1596	ipsum..." paragraph applies to the "Quisquam Est" item or to all three
	1597	items "Neque", "Porro", and "Quisquam Est". While not an ideal
	1598	situation, this is preferable to providing formatting cues that may
	1599	actually be contrary to the author's intent.
	1600
	1601	=back
	1602
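As an illustration of the last few points about "=over" ... "=back" regions, here is a minimal Python sketch of how a processor might notice an empty region, a heading inside a region, and an "=over" that runs off the end of the document. It is not taken from any existing Pod module: the function name check_over_regions, the note labels, and the assumption that the input has already been split into paragraphs are all hypothetical.

```python
import re

def check_over_regions(paragraphs):
    """Return (paragraph_index, label) notes about "=over" ... "=back" regions.

    `paragraphs` is assumed to be a list of paragraph strings that some
    earlier step has already split on blank lines.
    """
    notes = []
    open_regions = []  # one [start_index, has_content] entry per open "=over"
    for index, para in enumerate(paragraphs):
        match = re.match(r"=([a-zA-Z]\S*)", para)
        command = match.group(1) if match else None
        if command == "over":
            if open_regions:
                open_regions[-1][1] = True  # a nested list counts as content
            open_regions.append([index, False])
        elif command == "back":
            if open_regions:  # a stray "=back" is outside this sketch's scope
                start, has_content = open_regions.pop()
                if not has_content:
                    notes.append((start, "empty-region"))
        elif command in ("cut", "pod"):
            continue  # these do not count as content of the region
        elif open_regions:
            # "=head1" ... "=head4" are the heading commands named in this document.
            if command is not None and re.fullmatch(r"head[1-4]", command):
                notes.append((index, "heading-in-region"))
            open_regions[-1][1] = True
    for start, _ in open_regions:
        notes.append((start, "unterminated-over"))
    return notes
```

Whether each note is ignored, reported as a warning, or treated as an error is left to the processor, within the limits given in the items above.
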
	1603
	1604
	1605	=head1 About Data Paragraphs and "=begin/=end" Regions
	1606
	1607	Data paragraphs are typically used for inlining non-Pod data that is
	1608	to be used (typically passed through) when rendering the document to
	1609	a specific format:
	1610
	1611	=begin rtf
	1612
	1613	\par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
	1614
	1615	=end rtf
	1616
	1617	The exact same effect could, incidentally, be achieved with a single
	1618	"=for" paragraph:
	1619
	1620	=for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
	1621
	1622	(Although that is not formally a data paragraph, it has the same
	1623	meaning as one, and Pod parsers may parse it as one.)
	1624
	1625	Another example of a data paragraph:
	1626
	1627	=begin html
	1628
	1629	I like <em>PIE</em>!
	1630
	1631	<hr>Especially pecan pie!
	1632
	1633	=end html
	1634
	1635	If these were ordinary paragraphs, the Pod parser would try to
	1636	expand the "EE<lt>/em>" (in the first paragraph) as a formatting
	1637	code, just like "EE<lt>lt>" or "EE<lt>eacute>". But since this
	1638	is in a "=begin I<identifier>"..."=end I<identifier>" region I<and>
	1639	the identifier "html" doesn't have a ":" prefix, the contents
	1640	of this region are stored as data paragraphs, instead of being
	1641	processed as ordinary paragraphs (or if they began with spaces
	1642	and/or tabs, as verbatim paragraphs).
	1643
	1644	As a further example: At time of writing, no "biblio" identifier is
	1645	supported, but suppose some processor were written to recognize it as
	1646	a way of (say) denoting a bibliographic reference (necessarily
	1647	containing formatting codes in ordinary paragraphs). The fact that
	1648	"biblio" paragraphs were meant for ordinary processing would be
	1649	indicated by prefacing each "biblio" identifier with a colon:
	1650
	1651	=begin :biblio
	1652
	1653	Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
	1654	Programs.> Prentice-Hall, Englewood Cliffs, NJ.
	1655
	1656	=end :biblio
	1657
	1658	This would signal to the parser that paragraphs in this begin...end
	1659	region are subject to normal handling as ordinary/verbatim paragraphs
	1660	(while still tagged as meant only for processors that understand the
	1661	"biblio" identifier). The same effect could be had with:
	1662
	1663	=for :biblio
	1664	Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
	1665	Programs.> Prentice-Hall, Englewood Cliffs, NJ.
	1666
	1667	The ":" on these identifiers means simply "process this stuff
	1668	normally, even though the result will be for some special target".
	1669	I suggest that parser APIs report "biblio" as the target identifier,
	1670	but also report that it had a ":" prefix. (And similarly, with the
	1671	above "html", report "html" as the target identifier, and note the
	1672	I<lack> of a ":" prefix.)
	1673
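A parser API following that suggestion might split the identifier as in the following Python sketch. The name split_target and the (target, has_colon) return shape are hypothetical, chosen only for illustration.

```python
def split_target(identifier):
    """Split a "=begin"/"=for" identifier into (target, has_colon).

    The target is reported without its ":" prefix, and the presence of
    the prefix is reported separately.
    """
    if identifier.startswith(":"):
        return identifier[1:], True
    return identifier, False
```

With this, ":biblio" is reported as the target "biblio" with the prefix flag set, and "html" is reported as the target "html" with the flag clear.
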
	1674	Note that a "=begin I<identifier>"..."=end I<identifier>" region where
	1675	I<identifier> begins with a colon I<can> contain commands. For example:
	1676
	1677	=begin :biblio
	1678
	1679	Wirth's classic is available in several editions, including:
	1680
	1681	=for comment
	1682	hm, check abebooks.com for how much used copies cost.
	1683
	1684	=over
	1685
	1686	=item
	1687
	1688	Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
	1689	Teubner, Stuttgart. [Yes, it's in German.]
	1690
	1691	=item
	1692
	1693	Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
	1694	Programs.> Prentice-Hall, Englewood Cliffs, NJ.
	1695
	1696	=back
	1697
	1698	=end :biblio
	1699
	1700	Note, however, that a "=begin I<identifier>"..."=end I<identifier>"
	1701	region where I<identifier> does I<not> begin with a colon should not
	1702	directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
	1703	nor "=item". For example, this may be considered invalid:
	1704
	1705	=begin somedata
	1706
	1707	This is a data paragraph.
	1708
	1709	=head1 Don't do this!
	1710
	1711	This is a data paragraph too.
	1712
	1713	=end somedata
	1714
	1715	A Pod processor may signal that the above (specifically the "=head1"
	1716	paragraph) is an error. Note, however, that the following should
	1717	I<not> be treated as an error:
	1718
	1719	=begin somedata
	1720
	1721	This is a data paragraph.
	1722
	1723	=cut
	1724
	1725	# Yup, this isn't Pod anymore.
	1726	sub excl { (rand() > .5) ? "hoo!" : "hah!" }
	1727
	1728	=pod
	1729
	1730	This is a data paragraph too.
	1731
	1732	=end somedata
	1733
	1734	And this too is valid:
	1735
	1736	=begin someformat
	1737
	1738	This is a data paragraph.
	1739
	1740	And this is a data paragraph.
	1741
	1742	=begin someotherformat
	1743
	1744	This is a data paragraph too.
	1745
	1746	And this is a data paragraph too.
	1747
	1748	=begin :yetanotherformat
	1749
	1750	=head2 This is a command paragraph!
	1751
	1752	This is an ordinary paragraph!
	1753
	1754	And this is a verbatim paragraph!
	1755
	1756	=end :yetanotherformat
	1757
	1758	=end someotherformat
	1759
	1760	Another data paragraph!
	1761
	1762	=end someformat
	1763
	1764	The contents of the above "=begin :yetanotherformat" ...
	1765	"=end :yetanotherformat" region I<aren't> data paragraphs, because
	1766	the immediately containing region's identifier (":yetanotherformat")
	1767	begins with a colon. In practice, most regions that contain
	1768	data paragraphs will contain I<only> data paragraphs; however,
	1769	the above nesting is syntactically valid as Pod, even if it is
	1770	rare. However, the handlers for some formats, like "html",
	1771	will accept only data paragraphs, not nested regions; and they may
	1772	complain if they see (targeted for them) nested regions, or commands,
	1773	other than "=end", "=pod", and "=cut".
	1774
	1775	Also consider this valid structure:
	1776
	1777	=begin :biblio
	1778
	1779	Wirth's classic is available in several editions, including:
	1780
	1781	=over
	1782
	1783	=item
	1784
	1785	Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
	1786	Teubner, Stuttgart. [Yes, it's in German.]
	1787
	1788	=item
	1789
	1790	Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
	1791	Programs.> Prentice-Hall, Englewood Cliffs, NJ.
	1792
	1793	=back
	1794
	1795	Buy buy buy!
	1796
	1797	=begin html
	1798
	1799	<img src='wirth_spokesmodeling_book.png'>
	1800
	1801	<hr>
	1802
	1803	=end html
	1804
	1805	Now now now!
	1806
	1807	=end :biblio
	1808
	1809	There, the "=begin html"..."=end html" region is nested inside
	1810	the larger "=begin :biblio"..."=end :biblio" region. Note that the
	1811	content of the "=begin html"..."=end html" region is data
	1812	paragraph(s), because the immediately containing region's identifier
	1813	("html") I<doesn't> begin with a colon.
	1814
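The rule at work in these nested examples (only the immediately containing region decides whether a paragraph is data) can be sketched in Python as follows. The names classify_paragraph and open_regions are hypothetical, and the sketch assumes that command paragraphs have already been recognized and handled elsewhere.

```python
def classify_paragraph(text, open_regions):
    """Classify a non-command paragraph as "data", "verbatim", or "ordinary".

    `open_regions` lists the identifiers of the currently open "=begin"
    regions, outermost first and innermost last, with any ":" prefix kept.
    """
    # Only the innermost (immediately containing) region matters.
    if open_regions and not open_regions[-1].startswith(":"):
        return "data"
    # Otherwise normal handling applies: leading space or tab means verbatim.
    if text[:1] in (" ", "\t"):
        return "verbatim"
    return "ordinary"
```

For the structure above, "Buy buy buy!" is classified with open_regions equal to [":biblio"], while the "<hr>" paragraph is classified with [":biblio", "html"].
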
	1815	Pod parsers, when processing a series of data paragraphs one
	1816	after another (within a single region), should consider them to
	1817	be one large data paragraph that happens to contain blank lines. So
	1818	the content of the above "=begin html"..."=end html" I<may> be stored
	1819	as two data paragraphs (one consisting of
	1820	"<img src='wirth_spokesmodeling_book.png'>\n"
	1821	and another consisting of "<hr>\n"), but I<should> be stored as
	1822	a single data paragraph (consisting of
	1823	"<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").
	1824
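A minimal Python sketch of that merging step follows. It assumes a hypothetical intermediate form, a list of (kind, text) pairs taken from a single region, in which each text ends with its own newline; the name coalesce_data is likewise made up for illustration.

```python
from itertools import groupby

def coalesce_data(events):
    """Merge each run of adjacent ("data", text) events into one event."""
    merged = []
    for kind, run in groupby(events, key=lambda event: event[0]):
        texts = [text for _, text in run]
        if kind == "data":
            # Adjacent data paragraphs were separated by a blank line,
            # so rejoin them with one extra newline between them.
            merged.append(("data", "\n".join(texts)))
        else:
            merged.extend((kind, text) for text in texts)
    return merged
```

Applied to the two paragraphs of the "=begin html" region above, this yields the single data paragraph described in the preceding text.
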
	1825	Pod processors should tolerate empty
	1826	"=begin I<something>"..."=end I<something>" regions,
	1827	empty "=begin :I<something>"..."=end :I<something>" regions, and
	1828	contentless "=for I<something>" and "=for :I<something>"
	1829	paragraphs. I.e., these should be tolerated:
	1830
	1831	=for html
	1832
	1833	=begin html
	1834
	1835	=end html
	1836
	1837	=begin :biblio
	1838
	1839	=end :biblio
	1840
	1841	Incidentally, note that there's no easy way to express a data
	1842	paragraph starting with something that looks like a command. Consider:
	1843
	1844	=begin stuff
	1845
	1846	=shazbot
	1847
	1848	=end stuff
	1849
	1850	There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
	1851	paragraph "=shazbot\n". However, you can express a data paragraph consisting
	1852	of "=shazbot\n" using this code:
	1853
	1854	=for stuff =shazbot
	1855
	1856	The situation where this is necessary is presumably quite rare.
	1857
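To make the "=for" workaround concrete, here is a small Python sketch of splitting a "=for" paragraph into its identifier and its content. The name parse_for and the regular expression are assumptions made for illustration, not the behavior of any particular Pod parser.

```python
import re

# "=for", whitespace, the identifier, then optionally whitespace and content.
FOR_PARAGRAPH = re.compile(r"=for[ \t]+(\S+)(?:\s+(.*))?\Z", re.DOTALL)

def parse_for(paragraph):
    """Return (identifier, content) for a "=for" paragraph.

    Everything after the identifier is kept as content without being
    inspected, so content that looks like a command is left alone.
    """
    match = FOR_PARAGRAPH.match(paragraph)
    if match is None:
        raise ValueError('not a "=for" paragraph with an identifier')
    identifier = match.group(1)
    content = match.group(2) or ""  # a contentless "=for" is tolerated
    return identifier, content
```

Because the content is never re-examined for a leading "=", the paragraph "=for stuff =shazbot" produces the content "=shazbot\n" rather than a "shazbot" command.
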
	1858	Note that "=end" commands must match the currently open "=begin" command. That
	1859	is, they must properly nest. For example, this is valid:
	1860
	1861	=begin outer
	1862
	1863	X
	1864
	1865	=begin inner
	1866
	1867	Y
	1868
	1869	=end inner
	1870
	1871	Z
	1872
	1873	=end outer
	1874
	1875	while this is invalid:
	1876
	1877	=begin outer
	1878
	1879	X
	1880
	1881	=begin inner
	1882
	1883	Y
	1884
	1885	=end outer
	1886
	1887	Z
	1888
	1889	=end inner
	1890
	1891	The latter is improper because when the "=end outer" command is seen, the
	1892	currently open region has the format name "inner", not "outer". (It just
	1893	happens that "outer" is the format name of a higher-up region.) This is
	1894	an error. Processors must by default report this as an error, and may halt
	1895	processing the document containing that error. A corollary of this is that
	1896	regions cannot "overlap". That is, the latter block above does not represent
	1897	a region called "outer" which contains X and Y, overlapping a region called
	1898	"inner" which contains Y and Z. Because it is invalid (as all
	1899	apparently overlapping regions would be), it doesn't represent that, or
	1900	anything at all.
	1901
	1902	Similarly, this is invalid:
	1903
	1904	=begin thing
	1905
	1906	=end hting
	1907
	1908	This is an error because the region is opened by "thing", and the "=end"
	1909	tries to close "hting" [sic].
	1910
	1911	This is also invalid:
	1912
	1913	=begin thing
	1914
	1915	=end
	1916
	1917	This is invalid because every "=end" command must have a formatname
	1918	parameter.
	1919
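The nesting rules above amount to keeping a stack of open regions. The following Python sketch shows one way a processor might check them. The name check_begin_end and the error labels are hypothetical, and treating an "=end" that appears with no region open as a mismatch is an assumption of this sketch rather than something stated above.

```python
import re

# "=begin" or "=end", not followed directly by more non-space characters,
# then an optional identifier.
BEGIN_END = re.compile(r"=(begin|end)(?!\S)[ \t]*(\S*)")

def check_begin_end(lines):
    """Return (line_number, label) errors for "=begin"/"=end" nesting."""
    errors = []
    open_regions = []  # identifiers of open regions, innermost last
    for lineno, line in enumerate(lines, start=1):
        match = BEGIN_END.match(line)
        if match is None:
            continue
        command, name = match.groups()
        if command == "begin":
            open_regions.append(name)
        elif not name:
            errors.append((lineno, "missing-formatname"))
        elif not open_regions or open_regions[-1] != name:
            # The "=end" must name the currently open (innermost) region.
            errors.append((lineno, "mismatched-end"))
        else:
            open_regions.pop()
    return errors
```

For the "outer"/"inner" example that is invalid, the only error is reported at the "=end outer" line. A real processor may stop there, since it is permitted to halt on this error.
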
	1920	=head1 SEE ALSO
	1921
	1922	L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">,
	1923	L<podchecker>
	1924
	1925	=head1 AUTHOR
	1926
	1927	Sean M. Burke
	1928
	1929	=cut
	1930
	1931