perl5.git.perl.org Git - perl5.git/blame

Commit	Line	Data
49781f4a	1	=encoding utf8
8a93676d SB	2
	3	=head1 NAME
	4
	5	perlpodspec - Plain Old Documentation: format specification and notes
	6
	7	=head1 DESCRIPTION
	8
	9	This document contains detailed notes on the Pod markup language. Most
	10	people will only have to read L<perlpod|perlpod> to know how to write
	11	in Pod, but this document may answer some incidental questions to do
	12	with parsing and rendering Pod.
	13
	14	In this document, "must" / "must not", "should" /
	15	"should not", and "may" have their conventional (cf. RFC 2119)
	16	meanings: "X must do Y" means that if X doesn't do Y, it's against
	17	this specification, and should really be fixed. "X should do Y"
	18	means that it's recommended, but X may fail to do Y, if there's a
	19	good reason. "X may do Y" is merely a note that X can do Y at
	20	will (although it is up to the reader to detect any connotation of
	21	"and I think it would be I<nice> if X did Y" versus "it wouldn't
	22	really I<bother> me if X did Y").
	23
	24	Notably, when I say "the parser should do Y", the
	25	parser may fail to do Y, if the calling application explicitly
	26	requests that the parser I<not> do Y. I often phrase this as
	27	"the parser should, by default, do Y." This doesn't I<require>
	28	the parser to provide an option for turning off whatever
	29	feature Y is (like expanding tabs in verbatim paragraphs), although
	30	it implies that such an option I<may> be provided.
	31
	32	=head1 Pod Definitions
	33
ac036724	34	Pod is embedded in files, typically Perl source files, although you
8a93676d SB	35	can write a file that's nothing but Pod.
	36
	37	A B<line> in a file consists of zero or more non-newline characters,
	38	terminated by either a newline or the end of the file.
	39
	40	A B<newline sequence> is usually a platform-dependent concept, but
	41	Pod parsers should understand it to mean any of CR (ASCII 13), LF
	42	(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
	43	addition to any other system-specific meaning. The first CR/CRLF/LF
	44	sequence in the file may be used as the basis for identifying the
	45	newline sequence for parsing the rest of the file.
	46
	47	A B<blank line> is a line consisting entirely of zero or more spaces
	48	(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
	49	A B<non-blank line> is a line containing one or more characters other
	50	than space or tab (and terminated by a newline or end-of-file).
	51
	52	(I<Note:> Many older Pod parsers did not accept a line consisting of
ac036724	53	spaces/tabs and then a newline as a blank line. The only lines they
8a93676d SB	54	considered blank were lines consisting of I<no characters at all>,
	55	terminated by a newline.)
	56
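The definitions of line, newline sequence, and blank line above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the names C<split_lines> and C<is_blank> are made up for this example, and the sketch accepts any mix of CR, LF, and CRLF rather than fixing on the first sequence seen in the file.

```python
import re

# CRLF is tried first so that it is not split into two separate line breaks.
NEWLINE = re.compile(r'\r\n|\r|\n')

def split_lines(text):
    """Split text into lines, accepting CR, LF, or CRLF as a newline sequence."""
    lines = NEWLINE.split(text)
    # A trailing newline terminates the last line; it does not start a new one.
    if lines and lines[-1] == '':
        lines.pop()
    return lines

def is_blank(line):
    """A blank line consists entirely of zero or more spaces or tabs."""
    return all(ch in ' \t' for ch in line)
```

A line holding only spaces and tabs counts as blank here, which is the behavior this document asks for and which many older parsers lacked.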
	57	B<Whitespace> is used in this document as a blanket term for spaces,
	58	tabs, and newline sequences. (By itself, this term usually refers
	59	to literal whitespace. That is, sequences of whitespace characters
	60	in Pod source, as opposed to "EE<lt>32>", which is a formatting
	61	code that I<denotes> a whitespace character.)
	62
	63	A B<Pod parser> is a module meant for parsing Pod (regardless of
	64	whether this involves calling callbacks or building a parse tree or
	65	directly formatting it). A B<Pod formatter> (or B<Pod translator>)
	66	is a module or program that converts Pod to some other format (HTML,
	67	plaintext, TeX, PostScript, RTF). A B<Pod processor> might be a
	68	formatter or translator, or might be a program that does something
353c6505	69	else with the Pod (like counting words, scanning for index points,
8a93676d SB	70	etc.).
	71
	72	Pod content is contained in B<Pod blocks>. A Pod block starts with a
1bca558f	73	line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
ac036724	74	that matches C<m/\A=cut/> or up to the end of the file if there is
8a93676d SB	75	no C<m/\A=cut/> line.
	76
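As a rough sketch of the rule above, the following Python function collects the Pod blocks from a list of lines. The function name C<pod_blocks> is hypothetical, and two choices are my own assumptions: the "=cut" line is kept as the last line of its block, and a "=cut" seen outside any block (an error, as described under "=cut" below) is simply skipped.

```python
import re

POD_START = re.compile(r'\A=[a-zA-Z]')
POD_CUT = re.compile(r'\A=cut')

def pod_blocks(lines):
    """Return a list of Pod blocks, each a list of lines."""
    blocks = []
    current = None                 # lines of the block being read, if any
    for line in lines:
        if current is None:
            if POD_CUT.match(line):
                continue           # assumption: skip a stray "=cut"
            if POD_START.match(line):
                current = [line]
        else:
            current.append(line)
            if POD_CUT.match(line):
                blocks.append(current)
                current = None
    if current is not None:        # no "=cut": the block runs to end of file
        blocks.append(current)
    return blocks
```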
	77	=for comment
	78	The current perlsyn says:
	79	[beginquote]
	80	Note that pod translators should look at only paragraphs beginning
	81	with a pod directive (it makes parsing easier), whereas the compiler
	82	actually knows to look for pod escapes even in the middle of a
	83	paragraph. This means that the following secret stuff will be ignored
	84	by both the compiler and the translators.
e13bc2af	85	$x=3;
8a93676d SB	86	=secret stuff
	87	warn "Neither POD nor CODE!?"
	88	=cut back
e13bc2af	89	print "got $x\n";
8a93676d SB	90	You probably shouldn't rely upon the warn() being podded out forever.
	91	Not all pod translators are well-behaved in this regard, and perhaps
	92	the compiler will become pickier.
	93	[endquote]
	94	I think that those paragraphs should just be removed; paragraph-based
	95	parsing seems to have been largely abandoned, because of the hassle
	96	with non-empty blank lines messing up what people meant by "paragraph".
	97	Even if the "it makes parsing easier" bit were especially true,
	98	it wouldn't be worth the confusion of having perl and pod2whatever
	99	actually disagree on what can constitute a Pod block.
	100
e1a97e07 KW	101	Note that a parser is not expected to distinguish real Pod from something
	102	that merely looks like Pod but is inside a quoted string, such as a here document.
	103
8a93676d SB	104	Within a Pod block, there are B<Pod paragraphs>. A Pod paragraph
	105	consists of non-blank lines of text, separated by one or more blank
	106	lines.
	107
	108	For purposes of Pod processing, there are four types of paragraphs in
	109	a Pod block:
	110
	111	=over
	112
	113	=item *
	114
	115	A command paragraph (also called a "directive"). The first line of
	116	this paragraph must match C<m/\A=[a-zA-Z]/>. Command paragraphs are
	117	typically one line, as in:
	118
	119	  =head1 NOTES
	120
	121	  =item *
	122
	123	But they may span several (non-blank) lines:
	124
	125	  =for comment
	126	  Hm, I wonder what it would look like if
	127	  you tried to write a BNF for Pod from this.
210b36aa	128
8a93676d SB	129	  =head3 Dr. Strangelove, or: How I Learned to
	130	  Stop Worrying and Love the Bomb
	131
	132	I<Some> command paragraphs allow formatting codes in their content
	133	(i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in:
	134
	135	  =head1 Did You Remember to C<use strict;>?
	136
	137	In other words, the Pod processing handler for "head1" will apply the
	138	same processing to "Did You Remember to CE<lt>use strict;>?" that it
ac036724	139	would to an ordinary paragraph: formatting codes (like
8a93676d SB	140	"CE<lt>...>") are parsed and presumably formatted appropriately, and
	141	whitespace in the form of literal spaces and/or tabs is not
	142	significant.
	143
	144	=item *
	145
	146	A B<verbatim paragraph>. The first line of this paragraph must be a
	147	literal space or tab, and this paragraph must not be inside a "=begin
	148	I<identifier>", ... "=end I<identifier>" sequence unless
	149	"I<identifier>" begins with a colon (":"). That is, if a paragraph
	150	starts with a literal space or tab, but I<is> inside a
	151	"=begin I<identifier>", ... "=end I<identifier>" region, then it's
	152	a data paragraph, unless "I<identifier>" begins with a colon.
	153
	154	Whitespace I<is> significant in verbatim paragraphs (although, in
	155	processing, tabs are probably expanded).
	156
	157	=item *
	158
	159	An B<ordinary paragraph>. A paragraph is an ordinary paragraph
	160	if its first line matches neither C<m/\A=[a-zA-Z]/> nor
	161	C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>",
	162	... "=end I<identifier>" sequence unless "I<identifier>" begins with
	163	a colon (":").
	164
	165	=item *
	166
	167	A B<data paragraph>. This is a paragraph that I<is> inside a "=begin
	168	I<identifier>" ... "=end I<identifier>" sequence where
	169	"I<identifier>" does I<not> begin with a literal colon (":"). In
	170	some sense, a data paragraph is not part of Pod at all (i.e.,
	171	effectively it's "out-of-band"), since it's not subject to most kinds
	172	of Pod parsing; but it is specified here, since Pod
	173	parsers need to be able to call an event for it, or store it in some
	174	form in a parse tree, or at least just parse I<around> it.
	175
	176	=back
	177
	178	For example, consider the following paragraphs:
	179
	180	  # <- that's the 0th column
	181
	182	  =head1 Foo
210b36aa	183
8a93676d	184	  Stuff
210b36aa	185
8a93676d	186	    $foo->bar
210b36aa	187
8a93676d SB	188	  =cut
	189
	190	Here, "=head1 Foo" and "=cut" are command paragraphs because the first
	191	line of each matches C<m/\A=[a-zA-Z]/>. "I<[space][space]>$foo->bar"
	192	is a verbatim paragraph, because its first line starts with a literal
	193	whitespace character (and there's no "=begin"..."=end" region around).
	194
	195	The "=begin I<identifier>" ... "=end I<identifier>" commands stop
6fbdb1cc	196	paragraphs that they surround from being parsed as ordinary or verbatim
8a93676d SB	197	paragraphs, if I<identifier> doesn't begin with a colon. This
	198	is discussed in detail in the section
	199	L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
	200
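The four paragraph types can be told apart mechanically. The Python sketch below splits the lines of one Pod block into paragraphs and labels each one. The names C<paragraphs> and C<classify> are invented for this example, and I assume that when "=begin" regions are nested, the innermost open region decides whether a paragraph is data.

```python
import re

COMMAND = re.compile(r'\A=[a-zA-Z]')

def paragraphs(lines):
    """Group non-blank lines into paragraphs separated by blank lines."""
    out, para = [], []
    for line in lines:
        if line.strip(' \t') == '':
            if para:
                out.append(para)
                para = []
        else:
            para.append(line)
    if para:
        out.append(para)
    return out

def classify(paras):
    """Label each paragraph as command, verbatim, ordinary, or data."""
    regions = []                   # identifiers of open "=begin" regions
    result = []
    for para in paras:
        first = para[0]
        if COMMAND.match(first):
            words = first.split()
            if words[0] == '=begin' and len(words) > 1:
                regions.append(words[1])
            elif words[0] == '=end' and regions:
                regions.pop()
            kind = 'command'
        elif regions and not regions[-1].startswith(':'):
            kind = 'data'          # inside a region whose name lacks a colon
        elif first[0] in ' \t':
            kind = 'verbatim'
        else:
            kind = 'ordinary'
        result.append((kind, para))
    return result
```

Checking for a command first, then for a surrounding region, and only then for leading whitespace mirrors the order in which the definitions above exclude one another.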
	201	=head1 Pod Commands
	202
	203	This section is intended to supplement and clarify the discussion in
	204	L<perlpod/"Command Paragraph">. These are the currently recognized
	205	Pod commands:
	206
	207	=over
	208
ee511750	209	=item "=head1", "=head2", "=head3", "=head4", "=head5", "=head6"
8a93676d SB	210
	211	This command indicates that the text in the remainder of the paragraph
	212	is a heading. That text may contain formatting codes. Examples:
	213
	214	=head1 Object Attributes
210b36aa	215
8a93676d SB	216	=head3 What B<Not> to Do!
8a93676d SB	217
ee511750 S	218	Both C<=head5> and C<=head6> were added in 2020 and might not be
	219	supported by all Pod parsers. L<Pod::Simple> 3.41, released in October
	220	2020, supports both, which in turn provides support in all
	221	L<Pod::Simple>-based Pod parsers.
	222
8a93676d SB	223	=item "=pod"
	224
	225	This command indicates that this paragraph begins a Pod block. (If we
	226	are already in the middle of a Pod block, this command has no effect at
	227	all.) If there is any text in this command paragraph after "=pod",
	228	it must be ignored. Examples:
	229
	230	=pod
210b36aa	231
8a93676d	232	This is a plain Pod paragraph.
210b36aa	233
8a93676d SB	234	=pod This text is ignored.
	235
	236	=item "=cut"
	237
	238	This command indicates that this line is the end of this previously
	239	started Pod block. If there is any text after "=cut" on the line, it must be
	240	ignored. Examples:
	241
	242	=cut
	243
	244	=cut The documentation ends here.
	245
	246	=cut
	247	# This is the first line of program text.
	248	sub foo { # This is the second.
	249
659cfd94	250	It is an error to try to I<start> a Pod block with a "=cut" command. In
8a93676d SB	251	that case, the Pod processor must halt parsing of the input file, and
	252	must by default emit a warning.
	253
	254	=item "=over"
	255
	256	This command indicates that this is the start of a list/indent
	257	region. If there is any text following the "=over", it must consist
	258	of only a nonzero positive numeral. The semantics of this numeral is
	259	explained in the L</"About =over...=back Regions"> section, further
	260	below. Formatting codes are not expanded. Examples:
	261
	262	=over 3
210b36aa	263
8a93676d	264	=over 3.5
210b36aa	265
8a93676d SB	266	=over
	267
	268	=item "=item"
	269
	270	This command indicates that an item in a list begins here. Formatting
	271	codes are processed. The semantics of the (optional) text in the
	272	remainder of this paragraph are
	273	explained in the L</"About =over...=back Regions"> section, further
	274	below. Examples:
	275
	276	=item
210b36aa	277
8a93676d	278	=item *
210b36aa	279
8a93676d	280	=item *
210b36aa	281
8a93676d	282	=item 14
210b36aa	283
8a93676d	284	=item 3.
210b36aa	285
8a93676d	286	=item C<< $thing->stuff(I<dodad>) >>
210b36aa	287
8a93676d SB	288	=item For transporting us beyond seas to be tried for pretended
8a93676d SB	289	offenses
210b36aa	290
8a93676d SB	291	=item He is at this time transporting large armies of foreign
	292	mercenaries to complete the works of death, desolation and
	293	tyranny, already begun with circumstances of cruelty and perfidy
	294	scarcely paralleled in the most barbarous ages, and totally
	295	unworthy the head of a civilized nation.
	296
	297	=item "=back"
	298
	299	This command indicates that this is the end of the region begun
	300	by the most recent "=over" command. It permits no text after the
	301	"=back" command.
	302
	303	=item "=begin formatname"
	304
93592fd5 RS	305	=item "=begin formatname parameter"
93592fd5 RS	306
8a93676d SB	307	This marks the following paragraphs (until the matching "=end
	308	formatname") as being for some special kind of processing. Unless
	309	"formatname" begins with a colon, the contained non-command
	310	paragraphs are data paragraphs. But if "formatname" I<does> begin
	311	with a colon, then non-command paragraphs are ordinary paragraphs
	312	or verbatim paragraphs. This is discussed in detail in the section
	313	L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
	314
	315	It is advised that formatnames match the regexp
c85e9b4c	316	C<m/\A:?[-a-zA-Z0-9_]+\z/>. Everything following whitespace after the
93592fd5 RS	317	formatname is a parameter that may be used by the formatter when dealing
	318	with this region. This parameter must not be repeated in the "=end"
	319	paragraph. Implementors should anticipate future expansion in the
	320	semantics and syntax of the first parameter to "=begin"/"=end"/"=for".
8a93676d SB	321
	322	=item "=end formatname"
	323
	324	This marks the end of the region opened by the matching
	325	"=begin formatname" region. If "formatname" is not the formatname
	326	of the most recent open "=begin formatname" region, then this
	327	is an error, and must generate an error message. This
	328	is discussed in detail in the section
	329	L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
	330
	331	=item "=for formatname text..."
	332
	333	This is synonymous with:
	334
	335	=begin formatname
210b36aa	336
8a93676d	337	text...
210b36aa	338
8a93676d SB	339	=end formatname
	340
	341	That is, it creates a region consisting of a single paragraph; that
	342	paragraph is to be treated as an ordinary paragraph if "formatname"
	343	begins with a ":"; if "formatname" I<doesn't> begin with a colon,
	344	then "text..." will constitute a data paragraph. There is no way
	345	to use "=for formatname text..." to express "text..." as a verbatim
	346	paragraph.
	347
a179871b SB	348	=item "=encoding encodingname"
	349
	350	This command, which should occur early in the document (at least
1e54db1a	351	before any non-US-ASCII data!), declares that this document is
a179871b	352	encoded in the encoding I<encodingname>, which must be
6fbdb1cc	353	an encoding name that L<Encode> recognizes. (Encode's list
8a3f7e95	354	of supported encodings, in L<Encode::Supported>, is useful here.)
a179871b SB	355	If the Pod parser cannot decode the declared encoding, it
	356	should emit a warning and may abort parsing the document
	357	altogether.
	358
	359	A document having more than one "=encoding" line should be
	360	considered an error. Pod processors may silently tolerate this if
	361	the not-first "=encoding" lines are just duplicates of the
6fbdb1cc RS	362	first one (e.g., if there's a "=encoding utf8" line, and later on
6fbdb1cc RS	363	another "=encoding utf8" line). But Pod processors should complain if
a179871b SB	364	there are contradictory "=encoding" lines in the same document
	365	(e.g., if there is a "=encoding utf8" early in the document and
	366	"=encoding big5" later). Pod processors that recognize BOMs
	367	may also complain if they see an "=encoding" line
1e54db1a JH	368	that contradicts the BOM (e.g., if a document with a UTF-16LE
1e54db1a JH	369	BOM has an "=encoding shiftjis" line).
a179871b	370
8a93676d SB	371	=back
	372
	373	If a Pod processor sees any command other than the ones listed
	374	above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
	375	or "=w123"), that processor must by default treat this as an
	376	error. It must not process the paragraph beginning with that
	377	command, must by default warn of this as an error, and may
	378	abort the parse. A Pod parser may allow a way for particular
	379	applications to add to the above list of known commands, and to
	380	stipulate, for each additional command, whether formatting
	381	codes should be processed.
	382
	383	Future versions of this specification may add additional
	384	commands.
	385
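A few of the rules in this section can be checked with very little code. The Python sketch below is illustrative only: C<check_command> is a hypothetical helper, it takes the text of one command paragraph, and it covers just the unknown-command rule, the numeral after "=over", the ignored text after "=pod" and "=cut", and the empty "=back".

```python
import re

KNOWN_COMMANDS = {
    'head1', 'head2', 'head3', 'head4', 'head5', 'head6',
    'pod', 'cut', 'over', 'item', 'back',
    'begin', 'end', 'for', 'encoding',
}
COMMAND = re.compile(r'\A=([a-zA-Z]\S*)\s*(.*)', re.DOTALL)
NUMERAL = re.compile(r'\d+(?:\.\d+)?\Z')

def check_command(paragraph):
    """Return (name, text, problems) for one command paragraph."""
    m = COMMAND.match(paragraph)
    if m is None:
        raise ValueError('not a command paragraph')
    name, text = m.group(1), m.group(2).strip()
    problems = []
    if name not in KNOWN_COMMANDS:
        problems.append('unknown command =' + name)
    elif name in ('pod', 'cut'):
        text = ''                  # any text after these commands is ignored
    elif name == 'over' and text:
        if not (NUMERAL.match(text) and float(text) > 0):
            problems.append('=over expects a nonzero positive numeral')
    elif name == 'back' and text:
        problems.append('=back permits no text')
    return name, text, problems
```

Returning the problems as a list, instead of printing them, leaves it to the caller to decide how errors are reported.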
	386
	387
	388	=head1 Pod Formatting Codes
	389
	390	(Note that in previous drafts of this document and of perlpod,
	391	formatting codes were referred to as "interior sequences", and
	392	this term may still be found in the documentation for Pod parsers,
	393	and in error messages from Pod processors.)
	394
	395	There are two syntaxes for formatting codes:
	396
	397	=over
	398
	399	=item *
	400
	401	A formatting code starts with a capital letter (just US-ASCII [A-Z])
	402	followed by a "<", any number of characters, and ending with the first
	403	matching ">". Examples:
	404
	405	That's what I<you> think!
	406
d8ff3e95	407	What's C<CORE::dump()> for?
8a93676d SB	408
	409	X<C<chmod> and C<unlink()> Under Different Operating Systems>
	410
	411	=item *
	412
	413	A formatting code starts with a capital letter (just US-ASCII [A-Z])
	414	followed by two or more "<"'s, one or more whitespace characters,
	415	any number of characters, one or more whitespace characters,
	416	and ending with the first matching sequence of two or more ">"'s, where
	417	the number of ">"'s equals the number of "<"'s in the opening of this
	418	formatting code. Examples:
	419
	420	That's what I<< you >> think!
	421
	422	C<<< open(X, ">>thing.dat") || die $! >>>
	423
	424	B<< $foo->bar(); >>
	425
	426	With this syntax, the whitespace character(s) after the "CE<lt><<"
1bca558f	427	(or whatever letter) and before the ">>>" are I<not> renderable. They
8a93676d SB	428	do not signify whitespace, but are merely part of the formatting codes
	429	themselves. That is, these are all synonymous:
	430
	431	  C<thing>
	432	  C<< thing >>
	433	  C<<           thing     >>
	434	  C<<<   thing >>>
	435	  C<<<<
	436	  thing
	437	         >>>>
	438
	439	and so on.
	440
a3d78747 RS	441	Finally, the multiple-angle-bracket form does I<not> alter the interpretation
	442	of nested formatting codes, meaning that the following four example lines are
	443	identical in meaning:
	444
	445	B<example: C<$a E<lt>=E<gt> $b>>
	446
	447	B<example: C<< $a <=> $b >>>
	448
	449	B<example: C<< $a E<lt>=E<gt> $b >>>
	450
	451	B<<< example: C<< $a E<lt>=E<gt> $b >> >>>
	452
8a93676d SB	453	=back
	454
	455	In parsing Pod, a notably tricky part is the correct parsing of
	456	(potentially nested!) formatting codes. Implementors should
	457	consult the code in the C<parse_text> routine in Pod::Parser as an
	458	example of a correct implementation.
	459
	460	=over
	461
	462	=item C<IE<lt>textE<gt>> -- italic text
	463
	464	See the brief discussion in L<perlpod/"Formatting Codes">.
	465
	466	=item C<BE<lt>textE<gt>> -- bold text
	467
	468	See the brief discussion in L<perlpod/"Formatting Codes">.
	469
	470	=item C<CE<lt>codeE<gt>> -- code text
	471
	472	See the brief discussion in L<perlpod/"Formatting Codes">.
	473
	474	=item C<FE<lt>filenameE<gt>> -- style for filenames
	475
	476	See the brief discussion in L<perlpod/"Formatting Codes">.
	477
	478	=item C<XE<lt>topic nameE<gt>> -- an index entry
	479
	480	See the brief discussion in L<perlpod/"Formatting Codes">.
	481
	482	This code is unusual in that most formatters completely discard
	483	this code and its content. Other formatters will render it with
	484	invisible codes that can be used in building an index of
	485	the current document.
	486
	487	=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code
	488
	489	Discussed briefly in L<perlpod/"Formatting Codes">.
	490
c195f169	491	This code is unusual in that it should have no content. That is,
8a93676d SB	492	a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether
	493	or not it complains, the I<potatoes> text should be ignored.
	494
	495	=item C<LE<lt>nameE<gt>> -- a hyperlink
	496
	497	The complicated syntaxes of this code are discussed at length in
	498	L<perlpod/"Formatting Codes">, and implementation details are
	499	discussed below, in L</"About LE<lt>...E<gt> Codes">. Parsing the
	500	contents of LE<lt>content> is tricky. Notably, the content has to be
	501	checked for whether it looks like a URL, or whether it has to be split
	502	on literal "|" and/or "/" (in the right order!), and so on,
	503	I<before> EE<lt>...> codes are resolved.
	504
	505	=item C<EE<lt>escapeE<gt>> -- a character escape
	506
	507	See L<perlpod/"Formatting Codes">, and several points in
	508	L</Notes on Implementing Pod Processors>.
	509
	510	=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces
	511
	512	This formatting code is syntactically simple, but semantically
	513	complex. What it means is that each space in the printable
3e666715	514	content of this code signifies a non-breaking space.
8a93676d SB	515
	516	Consider:
	517
	518	C<$x ? $y : $z>
	519
	520	S<C<$x ? $y : $z>>
	521
	522	Both signify the monospace (c[ode] style) text consisting of
	523	"$x", one space, "?", one space, "$y", one space, ":", one space, "$z". The
	524	difference is that in the latter, with the S code, those spaces
3e666715	525	are not "normal" spaces, but instead are non-breaking spaces.
8a93676d SB	526
	527	=back
	528
	529
	530	If a Pod processor sees any formatting code other than the ones
	531	listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
	532	processor must by default treat this as an error.
	533	A Pod parser may allow a way for particular
	534	applications to add to the above list of known formatting codes;
	535	a Pod parser might even allow a way to stipulate, for each additional
	536	command, whether it requires some form of special processing, as
	537	LE<lt>...> does.
	538
	539	Future versions of this specification may add additional
	540	formatting codes.
	541
	542	Historical note: A few older Pod processors would not see a ">" as
	543	closing a "CE<lt>" code, if the ">" was immediately preceded by
	544	a "-". This was so that this:
	545
	546	C<$foo->bar>
	547
	548	would parse as equivalent to this:
	549
75f15e9f	550	C<$foo-E<gt>bar>
8a93676d SB	551
	552	instead of as equivalent to a "C" formatting code containing
	553	only "$foo-", and then a "bar>" outside the "C" formatting code. This
	554	problem has since been solved by the addition of syntaxes like this:
	555
	556	C<< $foo->bar >>
	557
	558	Compliant parsers must not treat "->" as special.
	559
	560	Formatting codes absolutely cannot span paragraphs. If a code is
	561	opened in one paragraph, and no closing code is found by the end of
	562	that paragraph, the Pod parser must close that formatting code,
	563	and should complain (as in "Unterminated I code in the paragraph
	564	starting at line 123: 'Time objects are not...'"). So these
	565	two paragraphs:
	566
	567	I<I told you not to do this!
210b36aa	568
8a93676d SB	569	Don't make me say it again!>
	570
	571	...must I<not> be parsed as two paragraphs in italics (with the I
	572	code starting in one paragraph and ending in another). Instead,
	573	the first paragraph should generate a warning, but that aside, the
	574	above code must parse as if it were:
	575
	576	I<I told you not to do this!>
210b36aa	577
8a93676d SB	578	Don't make me say it again!E<gt>
	579
	580	(In SGMLish jargon, all Pod commands are like block-level
	581	elements, whereas all Pod formatting codes are like inline-level
	582	elements.)
	583
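To make the rules of this section concrete, here is a minimal Python sketch of a formatting-code parser for the text of a single paragraph. It is not the C<parse_text> routine mentioned above, and it makes no claim to match any existing parser: the names C<parse_codes> and C<to_text> are invented, only the innermost open code is consulted when looking for a closing delimiter, unknown code letters are not reported, and C<to_text> resolves just four EE<lt>...> escapes.

```python
import re

# A capital letter, then either "<" alone or two or more "<" plus whitespace.
OPEN = re.compile(r'([A-Z])(<(?:<+\s+)?)')
ESCAPES = {'lt': '<', 'gt': '>', 'verbar': '|', 'sol': '/'}

def parse_codes(text):
    """Return (tree, warnings); tree holds str and (letter, children) items."""
    root = []
    stack = [(None, 0, root)]      # (letter, number of ">" that close it, children)
    warnings = []

    def add_text(s):
        children = stack[-1][2]
        if children and isinstance(children[-1], str):
            children[-1] += s
        else:
            children.append(s)

    def close_top():
        letter, _, children = stack.pop()
        stack[-1][2].append((letter, children))

    i = 0
    while i < len(text):
        m = OPEN.match(text, i)
        if m:
            stack.append((m.group(1), m.group(2).count('<'), []))
            i = m.end()
            continue
        width = stack[-1][1]
        if width == 1 and text[i] == '>':
            close_top()
            i += 1
            continue
        if width > 1:
            # Closing whitespace belongs to the delimiter, not to the content.
            m = re.compile(r'\s+>{%d}' % width).match(text, i)
            if m:
                close_top()
                i = m.end()
                continue
        add_text(text[i])
        i += 1

    while len(stack) > 1:          # codes never span paragraphs: close them
        warnings.append('Unterminated %s code' % stack[-1][0])
        close_top()
    return root, warnings

def to_text(tree):
    """Flatten a tree to plain text, resolving a few E<...> escapes."""
    out = []
    for node in tree:
        if isinstance(node, str):
            out.append(node)
        else:
            letter, children = node
            inner = to_text(children)
            out.append(ESCAPES.get(inner, inner) if letter == 'E' else inner)
    return ''.join(out)
```

With this sketch, the four "example:" lines shown earlier flatten to the same text, "-E<gt>" receives no special treatment, and a code left open at the end of a paragraph is closed with a warning instead of leaking into the next paragraph.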
	584
	585
	586	=head1 Notes on Implementing Pod Processors
	587
	588	The following is a long section of miscellaneous requirements
	589	and suggestions to do with Pod processing.
	590
	591	=over
	592
	593	=item *
	594
	595	Pod formatters should tolerate lines in verbatim blocks that are of
	596	any length, even if that means having to break them (possibly several
	597	times, for very long lines) to avoid text running off the side of the
	598	page. Pod formatters may warn of such line-breaking. Such warnings
	599	are particularly appropriate for lines that are over 100 characters long, which
	600	are usually not intentional.
	601
	602	=item *
	603
	604	Pod parsers must recognize I<all> of the three well-known newline
	605	formats: CR, LF, and CRLF. See L<perlport|perlport>.
	606
	607	=item *
	608
	609	Pod parsers should accept input lines that are of any length.
	610
	611	=item *
	612
	613	Since Perl recognizes a Unicode Byte Order Mark at the start of files
	614	as signaling that the file is Unicode encoded as in UTF-16 (whether
	615	big-endian or little-endian) or UTF-8, Pod parsers should do the
	616	same. Otherwise, the character encoding should be understood as
	617	being UTF-8 if the first highbit byte sequence in the file seems
8f226aee DW	618	valid as a UTF-8 sequence, or otherwise as CP-1252 (earlier versions of
8f226aee DW	619	this specification used Latin-1 instead of CP-1252).
8a93676d SB	620
	621	Future versions of this specification may specify
	622	how Pod can accept other encodings. Presumably treatment of other
	623	encodings in Pod parsing would be as in XML parsing: whatever the
	624	encoding declared by a particular Pod file, content is to be
	625	stored in memory as Unicode characters.
	626
	627	=item *
	628
	629	The well known Unicode Byte Order Marks are as follows: if the
	630	file begins with the two literal byte values 0xFE 0xFF, this is
	631	the BOM for big-endian UTF-16. If the file begins with the two
	632	literal byte values 0xFF 0xFE, this is the BOM for little-endian
df0c7995 KW	633	UTF-16. On an ASCII platform, if the file begins with the three literal
df0c7995 KW	634	byte values
8a93676d	635	0xEF 0xBB 0xBF, this is the BOM for UTF-8.
e8a0e562	636	A mechanism portable to EBCDIC platforms is to derive the BOM by encoding U+FEFF:
df0c7995 KW	637
	638	my $utf8_bom = "\x{FEFF}";
	639	utf8::encode($utf8_bom);
8a93676d SB	640
	641	=for comment
	642	use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
	643	0xEF 0xBB 0xBF
	644
	645	=for comment
1e54db1a	646	If toke.c is modified to support UTF-32, add mention of those here.
8a93676d SB	647
	648	=item *
	649
df0c7995 KW	650	A naive, but often sufficient heuristic on ASCII platforms, for testing
df0c7995 KW	651	the first highbit
8a93676d SB	652	byte-sequence in a BOM-less file (whether in code or in Pod!), to see
8a93676d SB	653	whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
9a5b9407	654	the first byte in the sequence is in the range 0xC2 - 0xFD
8a93676d SB	655	I<and> whether the next byte is in the range
	656	0x80 - 0xBF. If so, the parser may conclude that this file is in
	657	UTF-8, and all highbit sequences in the file should be assumed to
	658	be UTF-8. Otherwise the parser should treat the file as being
df0c7995 KW	659	in CP-1252. (A better check, which also works on EBCDIC platforms,
df0c7995 KW	660	is to pass a copy of the sequence to
9a5b9407 KW	661	L<utf8::decode()|utf8>, which performs a full validity check on the
	662	sequence and returns TRUE if it is valid UTF-8, FALSE otherwise. This
	663	function is always pre-loaded, is fast because it is written in C, and
	664	will only get called at most once, so you don't need to avoid it out of
	665	performance concerns.)
	666	In the unlikely circumstance that the first highbit
8a93676d SB	667	sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
	668	can cater to our heuristic (as well as any more intelligent heuristic)
	669	by prefacing that line with a comment line containing a highbit
	670	sequence that is clearly I<not> valid as UTF-8. A line consisting
	671	of simply "#", an e-acute, and any non-highbit byte,
	672	is sufficient to establish this file's encoding.
	673
	674	=for comment
	675	If/WHEN some brave soul makes these heuristics into a generic
fae2c0fb	676	text-file class (or PerlIO layer?), we can presumably delete
8a93676d	677	mention of these icky details from this file, and can instead
fae2c0fb	678	tell people to just use the appropriate class/layer.
8a93676d	679	Auto-recognition of newline sequences would be another desirable
fae2c0fb	680	feature of such a class/layer.
8a93676d SB	681	HINT HINT HINT.
	682
	683	=for comment
	684	"The probability that a string of characters
	685	in any other encoding appears as valid UTF-8 is low" - RFC2279
	686
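The BOM checks and the naive heuristic described above might be combined as in the following Python sketch, for ASCII platforms only. The function name C<guess_encoding> is hypothetical, and returning "ascii" for a file with no highbit bytes at all is my own assumption, since either UTF-8 or CP-1252 would decode such a file identically.

```python
import codecs

BOMS = [
    (codecs.BOM_UTF8, 'utf-8'),          # 0xEF 0xBB 0xBF
    (codecs.BOM_UTF16_BE, 'utf-16-be'),  # 0xFE 0xFF
    (codecs.BOM_UTF16_LE, 'utf-16-le'),  # 0xFF 0xFE
]

def guess_encoding(data):
    """Guess the encoding of a bytes object holding a whole file."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for i, byte in enumerate(data):
        if byte >= 0x80:                 # first highbit byte in the file
            following = data[i + 1] if i + 1 < len(data) else 0
            if 0xC2 <= byte <= 0xFD and 0x80 <= following <= 0xBF:
                return 'utf-8'
            return 'cp1252'
    return 'ascii'                       # assumption: no highbit bytes at all
```

As noted above, a full validity check on the first highbit sequence is the better test; this sketch only looks at the first two bytes of that sequence.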
	687	=item *
	688
8a93676d SB	689	Pod processors must treat a "=for [label] [content...]" paragraph as
	690	meaning the same thing as a "=begin [label]" paragraph, content, and
	691	an "=end [label]" paragraph. (The parser may conflate these two
	692	constructs, or may leave them distinct, in the expectation that the
	693	formatter will nevertheless treat them the same.)
	694
	695	=item *
	696
	697	When rendering Pod to a format that allows comments (i.e., to nearly
	698	any format other than plaintext), a Pod formatter must insert comment
	699	text identifying its name and version number, and the name and
	700	version numbers of any modules it might be using to process the Pod.
	701	Minimal examples:
	702
555bd962	703	%% POD::Pod2PS v3.14159, using POD::Parser v1.92
210b36aa	704
555bd962	705	<!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->
210b36aa	706
555bd962	707	{\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}
210b36aa	708
555bd962	709	.\" Pod::Man version 3.14159, using POD::Parser version 1.92
8a93676d SB	710
	711	Formatters may also insert additional comments, including: the
	712	release date of the Pod formatter program, the contact address for
	713	the author(s) of the formatter, the current time, the name of input
	714	file, the formatting options in effect, version of Perl used, etc.
	715
	716	Formatters may also choose to note errors/warnings as comments,
	717	besides or instead of emitting them otherwise (as in messages to
	718	STDERR, or C<die>ing).
	719
	720	=item *
	721
	722	Pod parsers I<may> emit warnings or error messages ("Unknown E code
	723	EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
	724	C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
	725	suppressing all such STDERR output, and instead allow an option for
	726	reporting errors/warnings
	727	in some other way, whether by triggering a callback, or noting errors
	728	in some attribute of the document object, or some similarly unobtrusive
	729	mechanism -- or even by appending a "Pod Errors" section to the end of
	730	the parsed form of the document.
	731
	732	=item *
	733
	734	In cases of exceptionally aberrant documents, Pod parsers may abort the
	735	parse. Even then, using C<die>ing/C<croak>ing is to be avoided; where
	736	possible, the parser library may simply close the input file
	737	and add text like "* Formatting Aborted *" to the end of the
	738	(partial) in-memory document.
	739
	740	=item *
	741
	742	In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
	743	are understood (i.e., I<not> verbatim paragraphs, but I<including>
	744	ordinary paragraphs, and command paragraphs that produce renderable
	745	text, like "=head1"), literal whitespace should generally be considered
	746	"insignificant", in that one literal space has the same meaning as any
	747	(nonzero) number of literal spaces, literal newlines, and literal tabs
	748	(as long as this produces no blank lines, since those would terminate
	749	the paragraph). Pod parsers should compact literal whitespace in each
	750	processed paragraph, but may provide an option for overriding this
	751	(since some processing tasks do not require it), or may follow
	752	additional special rules (for example, specially treating
	753	period-space-space or period-newline sequences).
	754
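Compacting literal whitespace, as described above, amounts to a single substitution. A minimal Python sketch, with the hypothetical name C<compact_whitespace> and without any of the optional special rules:

```python
import re

LITERAL_WHITESPACE = re.compile(r'[ \t\r\n]+')

def compact_whitespace(paragraph):
    """Replace each run of spaces, tabs, and newlines with one space."""
    return LITERAL_WHITESPACE.sub(' ', paragraph)
```

This must be applied only to paragraphs where formatting codes are understood, never to verbatim paragraphs.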
	755	=item *
	756
	757	Pod parsers should not, by default, try to coerce apostrophe (') and
	758	quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
	759	turn backtick (`) into anything else but a single backtick character
353c6505	760	(distinct from an open quote character!), nor "--" into anything but
8a93676d SB	761	two minus signs. They I<must never> do any of those things to text
	762	in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
	763	paragraphs.
	764
	765	=item *
	766
	767	When rendering Pod to a format that has two kinds of hyphens (-), one
3e666715	768	that's a non-breaking hyphen, and another that's a breakable hyphen
8a93676d SB	769	(as in "object-oriented", which can be split across lines as
8a93676d SB	770	"object-", newline, "oriented"), formatters are encouraged to
3e666715	771	generally translate "-" to non-breaking hyphen, but may apply
8a93676d SB	772	heuristics to convert some of these to breaking hyphens.
	773
	774	=item *
	775
	776	Pod formatters should make reasonable efforts to keep words of Perl
	777	code from being broken across lines. For example, "Foo::Bar" in some
	778	formatting systems is seen as eligible for being broken across lines
	779	as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should
	780	be avoided where possible, either by disabling all line-breaking in
	781	mid-word, or by wrapping particular words with internal punctuation
	782	in "don't break this across lines" codes (which in some formats may
	783	not be a single code, but might be a matter of inserting non-breaking
	784	zero-width spaces between every pair of characters in a word).
	785
	786	=item *
	787
	788	Pod parsers should, by default, expand tabs in verbatim paragraphs as
	789	they are processed, before passing them to the formatter or other
	790	processor. Parsers may also allow an option for overriding this.
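
A sketch of tab expansion, under one loud assumption: this specification does not fix a tab width, so the default of 8 columns below is a choice made for the example, as are the function names.

```perl
use strict;
use warnings;

# Expand tabs in one line to spaces, advancing to the next multiple of
# $tabstop columns. The default of 8 is an assumption of this sketch.
sub expand_tabs {
    my ($line, $tabstop) = @_;
    $tabstop ||= 8;
    1 while $line =~ s/\A([^\t]*)\t/$1 . ' ' x ($tabstop - length($1) % $tabstop)/e;
    return $line;
}

# Apply it line by line to a whole verbatim paragraph.
sub expand_verbatim {
    my ($para, $tabstop) = @_;
    return join "\n", map { expand_tabs($_, $tabstop) } split /\n/, $para, -1;
}

print expand_verbatim("\tuse Foo;\n\tprint\tFoo->VERSION"), "\n";
```

Making the tab stop a parameter is one way to provide the override option mentioned above.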
	791
	792	=item *
	793
	794	Pod parsers should, by default, remove newlines from the end of
	795	ordinary and verbatim paragraphs before passing them to the
	796	formatter. For example, while the paragraph you're reading now
	797	could be considered, in Pod source, to end with (and contain)
	798	the newline(s) that end it, it should be processed as ending with
	799	(and containing) the period character that ends this sentence.
	800
	801	=item *
	802
	803	Pod parsers, when reporting errors, should make some effort to report
	804	an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
	805	line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
	806	number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!"). Where
	807	this is problematic, the paragraph number should at least be
	808	accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
	809	Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
	810	the CE<lt>interest rate> attribute...'").
	811
	812	=item *
	813
	814	Pod parsers, when processing a series of verbatim paragraphs one
	815	after another, should consider them to be one large verbatim
	816	paragraph that happens to contain blank lines. I.e., these two
d1be9408	817	lines, which have a blank line between them:
8a93676d SB	818
	819		use Foo;
	820
	821		print Foo->VERSION
	822
	823	should be unified into one paragraph ("\tuse Foo;\n\n\tprint
	824	Foo->VERSION") before being passed to the formatter or other
	825	processor. Parsers may also allow an option for overriding this.
	826
	827	While this might be too cumbersome to implement in event-based Pod
	828	parsers, it is straightforward for parsers that return parse trees.
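
For a parser that already holds its paragraphs in a list, the unification can be sketched like this. The record shape (a hash with C<type> and C<text> keys) and the function name are assumptions made for the example.

```perl
use strict;
use warnings;

# Each paragraph is a hash ref { type => ..., text => ... }; this record
# shape is an assumption made for the example.
sub merge_verbatim {
    my @out;
    for my $para (@_) {
        if (@out && $para->{type} eq 'verbatim' && $out[-1]{type} eq 'verbatim') {
            # Re-create the blank line that separated the two paragraphs.
            $out[-1]{text} .= "\n\n" . $para->{text};
        }
        else {
            push @out, { %$para };    # shallow copy, input stays untouched
        }
    }
    return @out;
}

my @merged = merge_verbatim(
    { type => 'verbatim', text => "\tuse Foo;" },
    { type => 'verbatim', text => "\tprint Foo->VERSION" },
);
print scalar(@merged), " paragraph(s)\n";
```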
	829
	830	=item *
	831
	832	Pod formatters, where feasible, are advised to avoid splitting short
	833	verbatim paragraphs (under twelve lines, say) across pages.
	834
	835	=item *
	836
	837	Pod parsers must treat a line with only spaces and/or tabs on it as a
	838	"blank line" such as separates paragraphs. (Some older parsers
	839	recognized only two adjacent newlines as a "blank line" but would not
	840	recognize a newline, a space, and a newline, as a blank line. This
	841	is noncompliant behavior.)
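
A minimal sketch of compliant paragraph splitting (the function name is invented for the example). A line counts as blank if it is empty or holds only spaces and tabs, and all three newline sequences are accepted.

```perl
use strict;
use warnings;

# Split Pod source into paragraphs. A "blank line" is any line that is
# empty or contains only spaces and/or tabs.
sub split_paragraphs {
    my ($source) = @_;
    my (@paras, @current);
    for my $line (split /\r\n|\r|\n/, $source) {
        if ($line =~ /\A[ \t]*\z/) {
            push @paras, join("\n", @current) if @current;
            @current = ();
        }
        else {
            push @current, $line;
        }
    }
    push @paras, join("\n", @current) if @current;
    return @paras;
}

my @paras = split_paragraphs("=head1 NAME\n \t\nFoo - frobnicate things\n");
print scalar(@paras), " paragraphs\n";
```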
	842
	843	=item *
	844
	845	Authors of Pod formatters/processors should make every effort to
	846	avoid writing their own Pod parser. There are already several in
	847	CPAN, with a wide range of interface styles -- and one of them,
33874d2e	848	Pod::Simple, comes with modern versions of Perl.
8a93676d SB	849
	850	=item *
	851
	852	Characters in Pod documents may be conveyed either as literals, or by
	853	number in EE<lt>n> codes, or by an equivalent mnemonic, as in
bd940430 KW	854	EE<lt>eacute> which is exactly equivalent to EE<lt>233>. The numbers
	855	are the Latin1/Unicode values, even on EBCDIC platforms.
	856
	857	When referring to characters by using a EE<lt>n> numeric code, numbers
	858	in the range 32-126 refer to those well known US-ASCII characters (also
	859	defined there by Unicode, with the same meaning), which all Pod
df0c7995 KW	860	formatters must render faithfully. Characters whose EE<lt>E<gt> numbers
	861	are in the ranges 0-31 and 127-159 should not be used (neither as
	862	literals,
	863	nor as EE<lt>number> codes), except for the literal byte-sequences for
	864	newline (ASCII 13, ASCII 13 10, or ASCII 10), and tab (ASCII 9).
bd940430 KW	865
	866	Numbers in the range 160-255 refer to Latin-1 characters (also
	867	defined there by Unicode, with the same meaning). Numbers above
8a93676d SB	868	255 should be understood to refer to Unicode characters.
	869
	870	=item *
	871
	872	Be warned
	873	that some formatters cannot reliably render characters outside 32-126;
	874	and many are able to handle 32-126 and 160-255, but nothing above
	875	255.
	876
	877	=item *
	878
	879	Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for
	880	less-than and greater-than, Pod parsers must understand "EE<lt>sol>"
	881	for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar,
	882	pipe). Pod parsers should also understand "EE<lt>lchevron>" and
	883	"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e.,
	884	"left-pointing double angle quotation mark" = "left pointing
	885	guillemet" and "right-pointing double angle quotation mark" = "right
	886	pointing guillemet". (These look like little "<<" and ">>", and they
	887	are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>"
	888	and "EE<lt>raquo>".)
	889
	890	=item *
	891
	892	Pod parsers should understand all "EE<lt>html>" codes as defined
	893	in the entity declarations in the most recent XHTML specification at
	894	C<www.W3.org>. Pod parsers must understand at least the entities
	895	that define characters in the range 160-255 (Latin-1). Pod parsers,
	896	when faced with some unknown "EE<lt>I<identifier>>" code,
	897	shouldn't simply replace it with nullstring (by default, at least),
	898	but may pass it through as a string consisting of the literal characters
	899	E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the
	900	alternative option of processing such unknown
	901	"EE<lt>I<identifier>>" codes by firing an event especially
	902	for such codes, or by adding a special node-type to the in-memory
	903	document tree. Such "EE<lt>I<identifier>>" may have special meaning
	904	to some processors, or some processors may choose to add them to
	905	a special error report.
	906
	907	=item *
	908
	909	Pod parsers must also support the XHTML codes "EE<lt>quot>" for
	910	character 34 (doublequote, "), "EE<lt>amp>" for character 38
	911	(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, ').
	912
	913	=item *
	914
1bca558f	915	Note that in all cases of "EE<lt>whateverE<gt>", I<whatever> (whether
8a93676d	916	an htmlname, or a number in any base) must consist only of
817141f8	917	alphanumeric characters -- that is, I<whatever> must match
1bca558f	918	C<m/\A\w+\z/>. So S<"EE<lt> 0 1 2 3 E<gt>"> is invalid, because
8a93676d SB	919	it contains spaces, which aren't alphanumeric characters. This
8a93676d SB	920	presumably does not I<need> special treatment by a Pod processor;
1bca558f	921	S<" 0 1 2 3 "> doesn't look like a number in any base, so it would
8a93676d	922	presumably be looked up in the table of HTML-like names. Since
1bca558f	923	there isn't (and cannot be) an HTML-like entity called S<" 0 1 2 3 ">,
8a93676d	924	this will be treated as an error. However, Pod processors may
1bca558f	925	treat S<"EE<lt> 0 1 2 3 E<gt>"> or "EE<lt>e-acute>" as I<syntactically>
8a93676d SB	926	invalid, potentially earning a different error message than the
	927	error message (or warning, or event) generated by a merely unknown
	928	(but theoretically valid) htmlname, as in "EE<lt>qacute>"
	929	[sic]. However, Pod parsers are not required to make this
	930	distinction.
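
The distinction between syntactically invalid and merely unknown codes can be sketched as below. Everything here is illustrative: the function name and status strings are invented, the name table is deliberately tiny, and the number syntax (decimal, a leading "0x" for hexadecimal, a leading "0" for octal) is assumed from the conventions of perlpod rather than taken from this item.

```perl
use strict;
use warnings;

# A tiny, deliberately incomplete name table, for illustration only.
my %NAME_TO_CODEPOINT = (
    lt => 60, gt => 62, sol => 47, verbar => 124,
    quot => 34, amp => 38, apos => 39,
    eacute => 233, nbsp => 160, shy => 173,
    lchevron => 171, rchevron => 187, laquo => 171, raquo => 187,
);

# Returns (status, codepoint), where status is 'ok', 'unknown'
# (well-formed but not in the table), or 'invalid' (fails \A\w+\z).
sub resolve_e_code {
    my ($content) = @_;
    return ('invalid') unless $content =~ m/\A\w+\z/;
    return ('ok', hex $1)       if $content =~ m/\A0[xX]([0-9a-fA-F]+)\z/;
    return ('ok', oct $content) if $content =~ m/\A0[0-7]+\z/;
    return ('ok', $content + 0) if $content =~ m/\A[0-9]+\z/;
    return ('ok', $NAME_TO_CODEPOINT{$content})
        if exists $NAME_TO_CODEPOINT{$content};
    return ('unknown');
}

for my $code ('eacute', '0xE9', ' 0 1 2 3 ', 'e-acute', 'qacute') {
    my ($status) = resolve_e_code($code);
    print "E<$code>: $status\n";
}
```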
	931
	932	=item *
	933
	934	Note that EE<lt>number> I<must not> be interpreted as simply
	935	"codepoint I<number> in the current/native character set". It always
	936	means only "the character represented by codepoint I<number> in
	937	Unicode." (This is identical to the semantics of &#I<number>; in XML.)
	938
	939	This will likely require many formatters to have tables mapping from
	940	treatable Unicode codepoints (such as the "\xE9" for the e-acute
	941	character) to the escape sequences or codes necessary for conveying
	942	such sequences in the target output format. A converter to *roff
	943	would, for example know that "\xE9" (whether conveyed literally, or via
	944	a EE<lt>...> sequence) is to be conveyed as "e\\*'".
8939ba94	945	Similarly, a program rendering Pod in a Mac OS application window, would
8a93676d	946	presumably need to know that "\xE9" maps to codepoint 142 in MacRoman
8939ba94	947	encoding that (at time of writing) is native for Mac OS. Such
8a93676d SB	948	Unicode2whatever mappings are presumably already widely available for
	949	common output formats. (Such mappings may be incomplete! Implementers
	950	are not expected to bend over backwards in an attempt to render
	951	Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
	952	of the other weird things that Unicode can encode.) And
	953	if a Pod document uses a character not found in such a mapping, the
	954	formatter should consider it an unrenderable character.
	955
	956	=item *
	957
	958	If, surprisingly, the implementor of a Pod formatter can't find a
	959	satisfactory pre-existing table mapping from Unicode characters to
	960	escapes in the target format (e.g., a decent table of Unicode
	961	characters to *roff escapes), it will be necessary to build such a
	962	table. If you are in this circumstance, you should begin with the
	963	characters in the range 0x00A0 - 0x00FF, which is mostly the heavily
	964	used accented characters. Then proceed (as patience permits and
	965	fastidiousness compels) through the characters that the (X)HTML
	966	standards groups judged important enough to merit mnemonics
	967	for. These are declared in the (X)HTML specifications at the
	968	www.W3.org site. At time of writing (September 2001), the most recent
	969	entity declaration files are:
	970
	971	  http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
	972	  http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
	973	  http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
	974
	975	Then you can progress through any remaining notable Unicode characters
	976	in the range 0x2000-0x204D (consult the character tables at
	977	www.unicode.org), and whatever else strikes your fancy. For example,
	978	in F<xhtml-symbol.ent>, there is the entry:
	979
	980	  <!ENTITY infin    "&#8734;"> <!-- infinity, U+221E ISOtech -->
	981
	982	While the mapping "infin" to the character "\x{221E}" will (hopefully)
	983	have been already handled by the Pod parser, the presence of the
	984	character in this file means that it's reasonably important enough to
	985	include in a formatter's table that maps from notable Unicode characters
	986	to the codes necessary for rendering them. So for a Unicode-to-*roff
	987	mapping, for example, this would merit the entry:
	988
	989	  "\x{221E}" => '\(if',
	990
	991	It is eagerly hoped that in the future, increasing numbers of formats
	992	(and formatters) will support Unicode characters directly (as (X)HTML
	993	does with C<&infin;>, C<&#8734;>, or C<&#x221E;>), reducing the need
	994	for idiosyncratic mappings of Unicode-to-I<my_escapes>.
	995
	996	=item *
	997
353c6505	998	It is up to individual Pod formatters to display good judgement when
8a93676d SB	999	confronted with an unrenderable character (which is distinct from an
	1000	unknown EE<lt>thing> sequence that the parser couldn't resolve to
	1001	anything, renderable or not). It is good practice to map Latin letters
	1002	with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
	1003	unaccented US-ASCII letters (like a simple character 101, "e"), but
210b36aa	1004	clearly this is often not feasible, and an unrenderable character may
8a93676d SB	1005	be represented as "?", or the like. In attempting a sane fallback
	1006	(as from EE<lt>233> to "e"), Pod formatters may use the
	1007	%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or
	1008	L<Text::Unidecode|Text::Unidecode>, if available.
	1009
	1010	For example, this Pod text:
	1011
	1012	  magic is enabled if you set C<$Currency> to 'E<euro>'.
	1013
	1014	may be rendered as:
	1015	"magic is enabled if you set C<$Currency> to 'I<?>'" or as
	1016	"magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as
	1017	"magic is enabled if you set C<$Currency> to '[x20AC]'", etc.
	1018
	1019	A Pod formatter may also note, in a comment or warning, a list of what
	1020	unrenderable characters were encountered.
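
One possible shape for this fallback logic, sketched for a *roff-like target. The two tables are tiny stand-ins written for this example (a real formatter would need far larger ones, or could draw on the modules named above), and the function and variable names are invented.

```perl
use strict;
use warnings;

# Illustrative tables only, keyed by Unicode codepoint.
my %TO_TARGET   = ( 233 => "e\\*'" );                  # e-acute as a *roff escape
my %TO_FALLBACK = ( 232 => 'e', 233 => 'e', 241 => 'n' );

my @unrenderable;    # codepoints we had to give up on

sub render_char {
    my ($char) = @_;
    my $cp = ord $char;
    return $char             if $cp >= 32 && $cp <= 126;    # plain US-ASCII
    return $TO_TARGET{$cp}   if exists $TO_TARGET{$cp};     # real escape
    return $TO_FALLBACK{$cp} if exists $TO_FALLBACK{$cp};   # unaccented letter
    push @unrenderable, $cp;                                # note it for a report
    return '?';
}

print join('', map { render_char($_) } split //, "caf\x{E9} \x{20AC}5"), "\n";
printf "unrenderable: U+%04X\n", $_ for @unrenderable;
```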
	1021
	1022	=item *
	1023
	1024	EE<lt>...> may freely appear in any formatting code (other than
	1025	in another EE<lt>...> or in an ZE<lt>>). That is, "XE<lt>The
	1026	EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The
	1027	EE<lt>euro>1,000,000 Solution|Million::Euros>".
	1028
	1029	=item *
	1030
3e666715	1031	Some Pod formatters output to formats that implement non-breaking
8a93676d	1032	spaces as an individual character (which I'll call "NBSP"), and
3e666715	1033	others output to formats that implement non-breaking spaces just as
8a93676d SB	1034	spaces wrapped in a "don't break this across lines" code. Note that
	1035	at the level of Pod, both sorts of codes can occur: Pod can contain a
	1036	NBSP character (whether as a literal, or as a "EE<lt>160>" or
	1037	"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
	1038	IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
3e666715	1039	such codes are taken to represent non-breaking spaces. Pod
8a93676d SB	1040	parsers should consider supporting the optional parsing of "SE<lt>foo
	1041	IE<lt>barE<gt> baz>" as if it were
	1042	"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
	1043	optional parsing of groups of words joined by NBSP's as if each group
	1044	were in a SE<lt>...> code, so that formatters may use the
	1045	representation that maps best to what the output format demands.
	1046
	1047	=item *
	1048
210b36aa	1049	Some processors may find that the C<SE<lt>...E<gt>> code is easiest to
8a93676d SB	1050	implement by replacing each space in the parse tree under the content
	1051	of the S, with an NBSP. But note: the replacement should apply I<not> to
	1052	spaces in I<all> text, but I<only> to spaces in I<printable> text. (This
	1053	distinction may or may not be evident in the particular tree/event
	1054	model implemented by the Pod parser.) For example, consider this
	1055	unusual case:
	1056
	1057	  S<L</Autoloaded Functions>>
	1058
	1059	This means that the space in the middle of the visible link text must
	1060	not be broken across lines. In other words, it's the same as this:
	1061
	1062	  L<"AutoloadedE<160>Functions"|/Autoloaded Functions>
	1063
	1064	However, a misapplied space-to-NBSP replacement could (wrongly)
	1065	produce something equivalent to this:
	1066
	1067	  L<"AutoloadedE<160>Functions"|/AutoloadedE<160>Functions>
	1068
	1069	...which is almost definitely not going to work as a hyperlink (assuming
	1070	this formatter outputs a format supporting hypertext).
	1071
	1072	Formatters may choose to just not support the S format code,
	1073	especially in cases where the output format simply has no NBSP
	1074	character/code and no code for "don't break this stuff across lines".
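
A sketch of the replacement done correctly, on a tree model assumed purely for this example: each node is C<[ $code, \%attributes, @children ]>, with plain strings as text leaves. Only text leaves below an S node are changed; attribute values, such as a link's target section, are never touched.

```perl
use strict;
use warnings;

# Tree nodes follow a shape assumed for this example:
#   [ $code, \%attributes, @children ], with plain strings as text leaves.
sub nbsp_under_s {
    my ($node, $inside_s) = @_;
    my ($code, $attr, @children) = @$node;
    $inside_s ||= ($code eq 'S');
    for my $child (@children) {
        if (ref $child) {
            $child = nbsp_under_s($child, $inside_s);
        }
        elsif ($inside_s) {
            $child =~ s/ /\x{A0}/g;    # printable text only
        }
    }
    return [ $code, $attr, @children ];
}

my $tree = [ 'S', {},
    [ 'L', { section => 'Autoloaded Functions' }, 'Autoloaded Functions' ] ];
my $fixed = nbsp_under_s($tree);
print "link target still: $fixed->[2][1]{section}\n";
```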
	1075
	1076	=item *
	1077
	1078	Besides the NBSP character discussed above, implementors are reminded
	1079	of the existence of the other "special" character in Latin-1, the
210b36aa	1080	"soft hyphen" character, also known as "discretionary hyphen"
8a93676d SB	1081	(i.e., C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> =
	1082	C<EE<lt>shyE<gt>>). This character expresses an optional hyphenation
	1083	point. That is, it normally renders as nothing, but may render as a
	1084	"-" if a formatter breaks the word at that point. Pod formatters
	1085	should, as appropriate, do one of the following: 1) render this with
	1086	a code with the same meaning (e.g., "\-" in RTF), 2) pass it through
	1087	in the expectation that the formatter understands this character as
	1088	such, or 3) delete it.
	1089
	1090	For example:
	1091
	1092	  sigE<shy>action
	1093	  manuE<shy>script
	1094	  JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi
	1095
	1096	These signal to a formatter that if it is to hyphenate "sigaction"
	1097	or "manuscript", then it should be done as
	1098	"sig-I<[linebreak]>action" or "manu-I<[linebreak]>script"
	1099	(and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't
	1100	show up at all). And if it is
	1101	to hyphenate "Jarkko" and/or "Hietaniemi", it can do
	1102	so only at the points where there is a C<EE<lt>shyE<gt>> code.
	1103
	1104	In practice, it is anticipated that this character will not be used
	1105	often, but formatters should either support it, or delete it.
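
Options 1 and 3 above differ only in what the character is replaced with, as this small sketch shows (the function name is invented for the example; passing no replacement means "delete").

```perl
use strict;
use warnings;

# Replace every soft hyphen (character 173) with the target format's own
# optional-hyphen code, or delete it when no replacement is given.
sub handle_soft_hyphens {
    my ($text, $replacement) = @_;
    $replacement = '' unless defined $replacement;
    $text =~ s/\x{AD}/$replacement/g;
    return $text;
}

print handle_soft_hyphens("sig\x{AD}action"), "\n";            # deleted
print handle_soft_hyphens("manu\x{AD}script", '\-'), "\n";     # RTF-style code
```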
	1106
	1107	=item *
	1108
	1109	If you think that you want to add a new command to Pod (like, say, a
	1110	"=biblio" command), consider whether you could get the same
	1111	effect with a for or begin/end sequence: "=for biblio ..." or "=begin
	1112	biblio" ... "=end biblio". Pod processors that don't understand
	1113	"=for biblio", etc, will simply ignore it, whereas they may complain
	1114	loudly if they see "=biblio".
	1115
	1116	=item *
	1117
	1118	Throughout this document, "Pod" has been the preferred spelling for
	1119	the name of the documentation format. One may also use "POD" or
da75cd15	1120	"pod". For the documentation that is (typically) in the Pod
8a93676d SB	1121	format, you may use "pod", or "Pod", or "POD". Understanding these
	1122	distinctions is useful; but obsessing over how to spell them, usually
	1123	is not.
	1124
	1125	=back
	1126
	1127
	1128
	1129
	1130
	1131	=head1 About LE<lt>...E<gt> Codes
	1132
	1133	As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...>
	1134	code is the most complex of the Pod formatting codes. The points below
	1135	will hopefully clarify what it means and how processors should deal
	1136	with it.
	1137
	1138	=over
	1139
	1140	=item *
	1141
	1142	In parsing an LE<lt>...> code, Pod parsers must distinguish at least
	1143	four attributes:
	1144
	1145	=over
	1146
	1147	=item First:
	1148
1bca558f	1149	The link-text. If there is none, this must be C<undef>. (E.g., in
8a93676d SB	1150	"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
	1151	In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
	1152	link text. Note that link text may contain formatting.)
	1153
	1154	=item Second:
	1155
ac036724	1156	The possibly inferred link-text; i.e., if there was no real link
8a93676d SB	1157	text, then this is the text that we'll infer in its place. (E.g., for
	1158	"LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".)
	1159
	1160	=item Third:
	1161
1bca558f	1162	The name or URL, or C<undef> if none. (E.g., in "LE<lt>Perl
ac036724	1163	Functions|perlfunc>", the name (also sometimes called the page)
1bca558f	1164	is "perlfunc". In "LE<lt>/CAVEATS>", the name is C<undef>.)
8a93676d SB	1165
	1166	=item Fourth:
	1167
1bca558f	1168	The section (AKA "item" in older perlpods), or C<undef> if none. E.g.,
f41e638c	1169	in "LE<lt>Getopt::Std/DESCRIPTIONE<gt>", "DESCRIPTION" is the section. (Note
8a93676d SB	1170	that this is not the same as a manpage section like the "5" in "man 5
8a93676d SB	1171	crontab". "Section Foo" in the Pod sense means the part of the text
6edf2346	1172	that's introduced by the heading or item whose text is "Foo".)
8a93676d SB	1173
	1174	=back
	1175
	1176	Pod parsers may also note additional attributes including:
	1177
	1178	=over
	1179
	1180	=item Fifth:
	1181
	1182	A flag for whether item 3 (if present) is a URL (like
	1183	"http://lists.perl.org" is), in which case there should be no section
	1184	attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or
	1185	possibly a man page name (like "crontab(5)" is).
	1186
	1187	=item Sixth:
	1188
	1189	The raw original LE<lt>...> content, before text is split on
	1190	"|", "/", etc, and before EE<lt>...> codes are expanded.
	1191
	1192	=back
	1193
	1194	(The above were numbered only for concise reference below. It is not
	1195	a requirement that these be passed as an actual list or array.)
	1196
	1197	For example:
	1198
	1199	  L<Foo::Bar>
555bd962 BG	1200	    =>  undef,  # link text
	1201	        "Foo::Bar",  # possibly inferred link text
	1202	        "Foo::Bar",  # name
	1203	        undef,  # section
	1204	        'pod',  # what sort of link
	1205	        "Foo::Bar"  # original content
8a93676d SB	1206
8a93676d SB	1207	  L<Perlport's section on NL's|perlport/Newlines>
555bd962 BG	1208	    =>  "Perlport's section on NL's",  # link text
	1209	        "Perlport's section on NL's",  # possibly inferred link text
	1210	        "perlport",  # name
	1211	        "Newlines",  # section
	1212	        'pod',  # what sort of link
	1213	        "Perlport's section on NL's|perlport/Newlines"
	1214	          # original content
8a93676d SB	1215
8a93676d SB	1216	  L<perlport/Newlines>
555bd962 BG	1217	    =>  undef,  # link text
	1218	        '"Newlines" in perlport',  # possibly inferred link text
	1219	        "perlport",  # name
	1220	        "Newlines",  # section
	1221	        'pod',  # what sort of link
	1222	        "perlport/Newlines"  # original content
8a93676d SB	1223
8a93676d SB	1224	  L<crontab(5)/"DESCRIPTION">
555bd962 BG	1225	    =>  undef,  # link text
	1226	        '"DESCRIPTION" in crontab(5)',  # possibly inferred link text
	1227	        "crontab(5)",  # name
	1228	        "DESCRIPTION",  # section
	1229	        'man',  # what sort of link
	1230	        'crontab(5)/"DESCRIPTION"'  # original content
8a93676d SB	1231
8a93676d SB	1232	  L</Object Attributes>
555bd962 BG	1233	    =>  undef,  # link text
	1234	        '"Object Attributes"',  # possibly inferred link text
	1235	        undef,  # name
	1236	        "Object Attributes",  # section
	1237	        'pod',  # what sort of link
	1238	        "/Object Attributes"  # original content
8a93676d	1239
71c89d21	1240	  L<https://www.perl.org/>
555bd962	1241	    =>  undef,  # link text
a7b1b289 MM	1242	        "https://www.perl.org/",  # possibly inferred link text
a7b1b289 MM	1243	        "https://www.perl.org/",  # name
555bd962 BG	1244	        undef,  # section
555bd962 BG	1245	        'url',  # what sort of link
71c89d21	1246	        "https://www.perl.org/"  # original content
8a93676d	1247
71c89d21	1248	  L<Perl.org|https://www.perl.org/>
555bd962	1249	    =>  "Perl.org",  # link text
a7b1b289 MM	1250	        "Perl.org",  # possibly inferred link text
a7b1b289 MM	1251	        "https://www.perl.org/",  # name
555bd962 BG	1252	        undef,  # section
555bd962 BG	1253	        'url',  # what sort of link
71c89d21	1254	        "Perl.org|https://www.perl.org/"  # original content
f6e963e4	1255
8a93676d SB	1256	Note that you can distinguish URL-links from anything else by the
	1257	fact that they match C<m/\A\w+:[^:\s]\S*\z/>. So
	1258	C<LE<lt>http://www.perl.comE<gt>> is a URL, but
	1259	C<LE<lt>HTTP::ResponseE<gt>> isn't.
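
The six attributes can be extracted along the following lines. This is a rough sketch, not a complete parser: the function name is invented, it works on the raw string only, it ignores the possibility of "|" or "/" occurring inside nested formatting codes, and it guesses "man" from a parenthesized suffix.

```perl
use strict;
use warnings;

# Returns (text, inferred text, name, section, type, raw) for the
# content of one L<...> code. Nested formatting codes are not handled.
sub parse_link {
    my ($raw) = @_;
    my ($text, $target) = (undef, $raw);
    if ($raw =~ m/\A([^|]*)\|(.*)\z/s) { ($text, $target) = ($1, $2) }
    $text = undef if defined $text && $text eq '';    # as in L<|Time::HiRes>

    my ($name, $section, $type);
    if ($target =~ m/\A\w+:[^:\s]\S*\z/) {            # the URL test given above
        ($name, $type) = ($target, 'url');
    }
    else {
        ($name, $section) = split m{/}, $target, 2;
        $name = undef if defined $name && $name eq '';
        $section =~ s/\A"(.*)"\z/$1/s if defined $section;
        $type = (defined $name && $name =~ m/\(\S*\)/) ? 'man' : 'pod';
    }

    my $inferred =
          defined $text                     ? $text
        : $type eq 'url'                    ? $name
        : defined $name && defined $section ? qq{"$section" in $name}
        : defined $section                  ? qq{"$section"}
        :                                     $name;

    return ($text, $inferred, $name, $section, $type, $raw);
}

my @attr = parse_link('crontab(5)/"DESCRIPTION"');
print join(' | ', map { defined $_ ? $_ : 'undef' } @attr), "\n";
```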
	1260
	1261	=item *
	1262
	1263	In case of LE<lt>...> codes with no "text|" part in them,
	1264	older formatters have exhibited great variation in actually displaying
	1265	the link or cross reference. For example, LE<lt>crontab(5)> would render
	1266	as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage"
	1267	or just "C<crontab(5)>".
	1268
	1269	Pod processors must now treat "text|"-less links as follows:
	1270
	1271	  L<name>         =>  L<name|name>
	1272	  L</section>     =>  L<"section"|/section>
	1273	  L<name/section> =>  L<"section" in name|name/section>
	1274
	1275	=item *
	1276
	1277	Note that section names might contain markup. I.e., if a section
	1278	starts with:
	1279
	1280	  =head2 About the C<-M> Operator
	1281
	1282	or with:
	1283
	1284	  =item About the C<-M> Operator
	1285
	1286	then a link to it would look like this:
	1287
	1288	  L<somedoc/About the C<-M> Operator>
	1289
	1290	Formatters may choose to ignore the markup for purposes of resolving
	1291	the link and use only the renderable characters in the section name,
	1292	as in:
	1293
	1294	  <h1><a name="About_the_-M_Operator">About the <code>-M</code>
	1295	  Operator</a></h1>
210b36aa	1296
8a93676d	1297	  ...
210b36aa	1298
8a93676d SB	1299	  <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
	1300	  Operator" in somedoc</a>
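
One way to derive such an anchor name, shown only as a sketch: the function name is invented, and the code handles just the simple single-bracket form of formatting codes (it does not expand EE<lt>...> escapes or understand doubled delimiters).

```perl
use strict;
use warnings;

# Derive an anchor name from a section title by keeping only the
# renderable characters. Simple X<...> codes only, by assumption.
sub section_anchor {
    my ($section) = @_;
    # Peel formatting codes from the inside out, keeping their content.
    1 while $section =~ s/[A-Z]<([^<>]*)>/$1/;
    $section =~ s/\s+/_/g;
    return $section;
}

print section_anchor('About the C<-M> Operator'), "\n";
```

Using the same function for both the heading and the link guarantees that the two sides agree on the anchor.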
	1301
	1302	=item *
	1303
	1304	Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>>
	1305	links from C<LE<lt>name/itemE<gt>> links (and their targets). These
	1306	have been merged syntactically and semantically in the current
	1307	specification, and I<section> can refer either to a "=headI<n> Heading
	1308	Content" command or to a "=item Item Content" command. This
	1309	specification does not specify what behavior should be in the case
	1310	of a given document having several things all seeming to produce the
	1311	same I<section> identifier (e.g., in HTML, several things all producing
	1312	the same I<anchorname> in <a name="I<anchorname>">...</a>
	1313	elements). Where Pod processors can control this behavior, they should
	1314	use the first such anchor. That is, C<LE<lt>Foo/BarE<gt>> refers to the
	1315	I<first> "Bar" section in Foo.
	1316
	1317	But for some processors/formats this cannot be easily controlled; as
	1318	with the HTML example, the behavior of multiple ambiguous
	1319	<a name="I<anchorname>">...</a> is most easily just left up to
	1320	browsers to decide.
	1321
	1322	=item *
	1323
8a93676d SB	1324	In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes
	1325	for formatting or for EE<lt>...> escapes, as in:
	1326
	1327	  L<B<ummE<234>stuff>|...>
	1328
	1329	For C<LE<lt>...E<gt>> codes without a "text|" part, only
ac036724	1330	C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur. That is,
ac036724	1331	authors should not use "C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>".
8a93676d SB	1332
	1333	Note, however, that formatting codes and ZE<lt>>'s can occur in any
	1334	and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>,
	1335	and I<url>).
	1336
	1337	Authors must not nest LE<lt>...> codes. For example, "LE<lt>The
	1338	LE<lt>Foo::Bar> man page>" should be treated as an error.
	1339
	1340	=item *
	1341
	1342	Note that Pod authors may use formatting codes inside the "text"
	1343	part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">).
	1344
	1345	In other words, this is valid:
	1346
	1347	  Go read L<the docs on C<$.>|perlvar/"$.">
	1348
	1349	Some output formats that do allow rendering "LE<lt>...>" codes as
	1350	hypertext, might not allow the link-text to be formatted; in
	1351	that case, formatters will have to just ignore that formatting.
	1352
	1353	=item *
	1354
	1355	At time of writing, C<LE<lt>nameE<gt>> values are of two types:
	1356	either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which
	1357	might be a real Perl module or program in an @INC / PATH
e1020413	1358	directory, or a .pod file in those places); or the name of a Unix
8a93676d	1359	man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>>
62a78fcb	1360	is ambiguous between a Pod page called "chmod", or the Unix man page
8a93676d SB	1361	"chmod" (in whatever man-section). However, the presence of a string
	1362	in parens, as in "crontab(5)", is sufficient to signal that what
	1363	is being discussed is not a Pod page, and so is presumably a
e1020413	1364	Unix man page. The distinction is of no importance to many
8a93676d SB	1365	Pod processors, but some processors that render to hypertext formats
	1366	may need to distinguish them in order to know how to render a
	1367	given C<LE<lt>fooE<gt>> code.
	1368
	1369	=item *
	1370
b41aadf2 RS	1371	Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax (as in
	1372	C<LE<lt>Object AttributesE<gt>>), which was not easily distinguishable from
	1373	C<LE<lt>nameE<gt>> syntax and for C<LE<lt>"section"E<gt>> which was only
	1374	slightly less ambiguous. This syntax is no longer in the specification, and
	1375	has been replaced by the C<LE<lt>/sectionE<gt>> syntax (where the slash was
	1376	formerly optional). Pod parsers should tolerate the C<LE<lt>"section"E<gt>>
	1377	syntax, for a while at least. The suggested heuristic for distinguishing
	1378	C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> is that if it contains any
	1379	whitespace, it's a I<section>. Pod processors should warn about this being
	1380	deprecated syntax.
8a93676d SB	1381
	1382	=back
	1383
	1384	=head1 About =over...=back Regions
	1385
	1386	"=over"..."=back" regions are used for various kinds of list-like
	1387	structures. (I use the term "region" here simply as a collective
	1388	term for everything from the "=over" to the matching "=back".)
	1389
	1390	=over
	1391
	1392	=item *
	1393
	1394	The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ...
	1395	"=back" is used for giving the formatter a clue as to how many
	1396	"spaces" (ems, or roughly equivalent units) it should tab over,
	1397	although many formatters will have to convert this to an absolute
	1398	measurement that may not exactly match with the size of spaces (or M's)
	1399	in the document's base font. Other formatters may have to completely
	1400	ignore the number. The lack of any explicit I<indentlevel> parameter is
	1401	equivalent to an I<indentlevel> value of 4. Pod processors may
	1402	complain if I<indentlevel> is present but is not a positive number
	1403	matching C<m/\A(\d*\.)?\d+\z/>.
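
A small sketch of how a processor might handle the parameter. The function name is invented, and falling back to 4 when the value is malformed is an assumption of this example; the specification only says that processors may complain.

```perl
use strict;
use warnings;

# Returns (indentlevel, complaint). The complaint is undef when the
# argument is absent or valid. Assumes $arg has been trimmed already.
sub over_indent {
    my ($arg) = @_;
    return (4, undef) if !defined $arg || $arg !~ /\S/;    # no parameter
    return ($arg + 0, undef)
        if $arg =~ m/\A(\d*\.)?\d+\z/ && $arg > 0;
    return (4, qq{"=over $arg" should give a positive number; using 4});
}

for my $arg (undef, '8', '2.5', 'lots') {
    my ($indent, $complaint) = over_indent($arg);
    print "indent $indent", (defined $complaint ? " ($complaint)" : ''), "\n";
}
```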
	1404
	1405	=item *
	1406
	1407	Authors of Pod formatters are reminded that "=over" ... "=back" may
	1408	map to several different constructs in your output format. For
	1409	example, in converting Pod to (X)HTML, it can map to any of
	1410	<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
	1411	<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or
	1412	<dt>.
	1413
	1414	=item *
	1415
	1416	Each "=over" ... "=back" region should be one of the following:
	1417
	1418	=over
	1419
	1420	=item *
	1421
	1422	An "=over" ... "=back" region containing only "=item *" commands,
	1423	each followed by some number of ordinary/verbatim paragraphs, other
	1424	nested "=over" ... "=back" regions, "=for..." paragraphs, and
	1425	"=begin"..."=end" regions.
	1426
	1427	(Pod processors must tolerate a bare "=item" as if it were "=item
	1428	*".) Whether "*" is rendered as a literal asterisk, an "o", or as
	1429	some kind of real bullet character, is left up to the Pod formatter,
	1430	and may depend on the level of nesting.
	1431
	1432	=item *
	1433
	1434	An "=over" ... "=back" region containing only
	1435	C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them)
	1436	followed by some number of ordinary/verbatim paragraphs, other nested
	1437	"=over" ... "=back" regions, "=for..." paragraphs, and/or
	1438	"=begin"..."=end" codes. Note that the numbers must start at 1
	1439	in each section, and must proceed in order and without skipping
	1440	numbers.
	1441
	1442	(Pod processors must tolerate lines like "=item 1" as if they were
	1443	"=item 1.", with the period.)
	1444
1445	=item *
1446
1447	An "=over" ... "=back" region containing only "=item [text]"
1448	commands, each one (or each group of them) followed by some number of
1449	ordinary/verbatim paragraphs, other nested "=over" ... "=back"
1450	regions, or "=for..." paragraphs, and "=begin"..."=end" regions.
1451
1452	The "=item [text]" paragraph should not match
1453	C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it
1454	match just C<m/\A=item\s*\z/>.
1455
1456	=item *
1457
1458	An "=over" ... "=back" region containing no "=item" paragraphs at
1459	all, and containing only some number of
1460	ordinary/verbatim paragraphs, and possibly also some nested "=over"
1461	... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
1462	regions. Such an itemless "=over" ... "=back" region in Pod is
1463	equivalent in meaning to a "<blockquote>...</blockquote>" element in
1464	HTML.
1465
1466	=back
1467
1468	Note that with all the above cases, you can determine which type of
1469	"=over" ... "=back" you have, by examining the first (non-"=cut",
1470	non-"=pod") Pod paragraph after the "=over" command.
1471
1472	=item *
1473
1474	Pod formatters I<must> tolerate arbitrarily large amounts of text
1475	in the "=item I<text...>" paragraph. In practice, most such
1476	paragraphs are short, as in:
1477
1478	=item For cutting off our trade with all parts of the world
1479
1480	But they may be arbitrarily long:
1481
1482	=item For transporting us beyond seas to be tried for pretended
1483	offenses
1484
1485	=item He is at this time transporting large armies of foreign
1486	mercenaries to complete the works of death, desolation and
1487	tyranny, already begun with circumstances of cruelty and perfidy
1488	scarcely paralleled in the most barbarous ages, and totally
1489	unworthy the head of a civilized nation.
1490
1491	=item *
1492
1493	Pod processors should tolerate "=item *" / "=item I<number>" commands
1494	with no accompanying paragraph. The middle item is an example:
1495
1496	=over
210b36aa	1497
8a93676d	1498	=item 1
210b36aa	1499
8a93676d	1500	Pick up dry cleaning.
210b36aa	1501
8a93676d	1502	=item 2
210b36aa	1503
8a93676d	1504	=item 3
210b36aa	1505
8a93676d	1506	Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.
210b36aa	1507
8a93676d SB	1508	=back
	1509
	1510	=item *
	1511
	1512	No "=over" ... "=back" region can contain headings. Processors may
	1513	treat such a heading as an error.
	1514
	1515	=item *
	1516
	1517	Note that an "=over" ... "=back" region should have some
	1518	content. That is, authors should not have an empty region like this:
	1519
	1520	=over
210b36aa	1521
8a93676d SB	1522	=back
	1523
	1524	Pod processors seeing such a contentless "=over" ... "=back" region
	1525	may ignore it, or may report it as an error.
	1526
	1527	=item *
	1528
	1529	Processors must tolerate an "=over" list that goes off the end of the
	1530	document (i.e., which has no matching "=back"), but they may warn
	1531	about such a list.
	1532
	1533	=item *
	1534
	1535	Authors of Pod formatters should note that this construct:
	1536
	1537	=item Neque
	1538
	1539	=item Porro
	1540
	1541	=item Quisquam Est
210b36aa	1542
8a93676d SB	1543	Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
	1544	velit, sed quia non numquam eius modi tempora incidunt ut
	1545	labore et dolore magnam aliquam quaerat voluptatem.
	1546
	1547	=item Ut Enim
	1548
	1549	is semantically ambiguous, in a way that makes formatting decisions
	1550	a bit difficult. On the one hand, it could be a mention of an item
	1551	"Neque", a mention of another item "Porro", and a mention of another
	1552	item "Quisquam Est", with just the last one requiring the explanatory
	1553	paragraph "Qui dolorem ipsum quia dolor..."; and then an item
	1554	"Ut Enim". In that case, you'd want to format it like so:
	1555
	1556	Neque
210b36aa	1557
8a93676d	1558	Porro
210b36aa	1559
8a93676d SB	1560	Quisquam Est
	1561	  Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
	1562	  velit, sed quia non numquam eius modi tempora incidunt ut
	1563	  labore et dolore magnam aliquam quaerat voluptatem.
	1564
	1565	Ut Enim
	1566
	1567	But it could equally well be a discussion of three (related or equivalent)
	1568	items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph
	1569	explaining them all, and then a new item "Ut Enim". In that case, you'd
	1570	probably want to format it like so:
	1571
	1572	Neque
	1573	Porro
	1574	Quisquam Est
	1575	  Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
	1576	  velit, sed quia non numquam eius modi tempora incidunt ut
	1577	  labore et dolore magnam aliquam quaerat voluptatem.
	1578
	1579	Ut Enim
	1580
353c6505	1581	But (for the foreseeable future), Pod does not provide any way for Pod
8a93676d SB	1582	authors to distinguish which grouping is meant by the above
	1583	"=item"-cluster structure. So formatters should format it like so:
	1584
	1585	Neque
	1586
	1587	Porro
	1588
	1589	Quisquam Est
	1590
	1591	Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
	1592	velit, sed quia non numquam eius modi tempora incidunt ut
	1593	labore et dolore magnam aliquam quaerat voluptatem.
	1594
	1595	Ut Enim
	1596
210b36aa	1597	That is, there should be (at least roughly) equal spacing between
8a93676d SB	1598	items as between paragraphs (although that spacing may well be less
	1599	than the full height of a line of text). This leaves it to the reader
	1600	to use (con)textual cues to figure out whether the "Qui dolorem
	1601	ipsum..." paragraph applies to the "Quisquam Est" item or to all three
	1602	items "Neque", "Porro", and "Quisquam Est". While not an ideal
	1603	situation, this is preferable to providing formatting cues that may
	1604	be actually contrary to the author's intent.
	1605
	1606	=back
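
As an illustration only, the rule above (that the type of an "=over" ... "=back" region can be determined from its first Pod paragraph) and the numbering rule for numbered lists might be sketched in Python as follows. The names "classify_over_region" and "check_numbering" are hypothetical, not part of any existing Pod module, and the patterns are translations of the Perl regexps given above, with Python's \Z standing in for \z.

```python
import re

# Python translations of the patterns in the text; \Z plays the role of Perl's \z.
BULLET_ITEM = re.compile(r"\A=item\s+\*\s*\Z")
NUMBER_ITEM = re.compile(r"\A=item\s+(\d+)\.?\s*\Z")
BARE_ITEM = re.compile(r"\A=item\s*\Z")
ANY_ITEM = re.compile(r"\A=item(\s|\Z)")


def classify_over_region(first_paragraph):
    """Guess the region type from the first paragraph after "=over".

    The caller is assumed to have skipped any "=cut" and "=pod" already.
    """
    if BULLET_ITEM.match(first_paragraph) or BARE_ITEM.match(first_paragraph):
        return "bullet"   # a bare "=item" is tolerated as "=item *"
    if NUMBER_ITEM.match(first_paragraph):
        return "number"   # "=item 1" is tolerated as "=item 1."
    if ANY_ITEM.match(first_paragraph):
        return "text"     # "=item [text]"
    return "block"        # no "=item" at all: blockquote-like region


def check_numbering(item_paragraphs):
    """Return True if the numbered items run 1, 2, 3, ... with no gaps."""
    expected = 1
    for paragraph in item_paragraphs:
        found = NUMBER_ITEM.match(paragraph)
        if not found or int(found.group(1)) != expected:
            return False
        expected += 1
    return True
```

Under these assumptions, classify_over_region("=item 1") gives "number", and a first paragraph that is not an "=item" at all gives "block", which a formatter might then render as a blockquote.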
	1607
	1608
	1609
	1610	=head1 About Data Paragraphs and "=begin/=end" Regions
	1611
	1612	Data paragraphs are typically used for inlining non-Pod data that is
	1613	to be used (typically passed through) when rendering the document to
	1614	a specific format:
	1615
	1616	=begin rtf
210b36aa	1617
8a93676d	1618	\par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
210b36aa	1619
8a93676d SB	1620	=end rtf
	1621
	1622	The exact same effect could, incidentally, be achieved with a single
	1623	"=for" paragraph:
	1624
	1625	=for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
	1626
	1627	(Although that is not formally a data paragraph, it has the same
	1628	meaning as one, and Pod parsers may parse it as one.)
	1629
	1630	Another example of a data paragraph:
	1631
	1632	=begin html
210b36aa	1633
8a93676d	1634	I like <em>PIE</em>!
210b36aa	1635
8a93676d	1636	<hr>Especially pecan pie!
210b36aa	1637
8a93676d SB	1638	=end html
	1639
	1640	If these were ordinary paragraphs, the Pod parser would try to
	1641	expand the "EE<lt>/em>" (in the first paragraph) as a formatting
	1642	code, just like "EE<lt>lt>" or "EE<lt>eacute>". But since this
	1643	is in a "=begin I<identifier>"..."=end I<identifier>" region I<and>
	1644	the identifier "html" doesn't have a ":" prefix, the contents
	1645	of this region are stored as data paragraphs, instead of being
	1646	processed as ordinary paragraphs (or if they began with spaces
	1647	and/or tabs, as verbatim paragraphs).
	1648
	1649	As a further example: At time of writing, no "biblio" identifier is
	1650	supported, but suppose some processor were written to recognize it as
	1651	a way of (say) denoting a bibliographic reference (necessarily
	1652	containing formatting codes in ordinary paragraphs). The fact that
	1653	"biblio" paragraphs were meant for ordinary processing would be
	1654	indicated by prefacing each "biblio" identifier with a colon:
	1655
	1656	=begin :biblio
	1657
	1658	Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
	1659	Programs.> Prentice-Hall, Englewood Cliffs, NJ.
	1660
	1661	=end :biblio
	1662
	1663	This would signal to the parser that paragraphs in this begin...end
	1664	region are subject to normal handling as ordinary/verbatim paragraphs
	1665	(while still tagged as meant only for processors that understand the
	1666	"biblio" identifier). The same effect could be had with:
	1667
	1668	=for :biblio
	1669	Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
	1670	Programs.> Prentice-Hall, Englewood Cliffs, NJ.
	1671
	1672	The ":" on these identifiers means simply "process this stuff
	1673	normally, even though the result will be for some special target".
	1674	I suggest that parser APIs report "biblio" as the target identifier,
	1675	but also report that it had a ":" prefix. (And similarly, with the
	1676	above "html", report "html" as the target identifier, and note the
	1677	I<lack> of a ":" prefix.)
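
The suggestion above, that a parser API report the target identifier separately from its ":" prefix, might look something like this minimal Python sketch. The function name "parse_target" and the shape of its return value are assumptions made for illustration; the sketch also assumes that an identifier is simply a run of non-whitespace characters.

```python
import re

# "=begin", "=for" or "=end", then an optional ":", then the identifier.
# Assumption: the identifier is a run of non-whitespace characters.
TARGET = re.compile(r"\A=(begin|for|end)[ \t]+(:?)(\S+)")


def parse_target(command_line):
    """Return (command, identifier, has_colon), or None if no target is found."""
    found = TARGET.match(command_line)
    if not found:
        return None
    command, colon, identifier = found.groups()
    # The colon is reported as a flag, not as part of the identifier.
    return command, identifier, colon == ":"
```

Here parse_target("=begin :biblio") yields ("begin", "biblio", True), while parse_target("=begin html") yields ("begin", "html", False).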
	1678
	1679	Note that a "=begin I<identifier>"..."=end I<identifier>" region where
	1680	I<identifier> begins with a colon I<can> contain commands. For example:
	1681
	1682	=begin :biblio
210b36aa	1683
8a93676d	1684	Wirth's classic is available in several editions, including:
210b36aa	1685
8a93676d SB	1686	=for comment
8a93676d SB	1687	hm, check abebooks.com for how much used copies cost.
210b36aa	1688
8a93676d	1689	=over
210b36aa	1690
8a93676d	1691	=item
210b36aa	1692
8a93676d SB	1693	Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
8a93676d SB	1694	Teubner, Stuttgart. [Yes, it's in German.]
210b36aa	1695
8a93676d	1696	=item
210b36aa	1697
8a93676d SB	1698	Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
8a93676d SB	1699	Programs.> Prentice-Hall, Englewood Cliffs, NJ.
210b36aa	1700
8a93676d	1701	=back
210b36aa	1702
8a93676d SB	1703	=end :biblio
	1704
	1705	Note, however, that a "=begin I<identifier>"..."=end I<identifier>"
	1706	region where I<identifier> does I<not> begin with a colon should not
	1707	directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
	1708	nor "=item". For example, this may be considered invalid:
	1709
	1710	=begin somedata
210b36aa	1711
8a93676d	1712	This is a data paragraph.
210b36aa	1713
8a93676d	1714	=head1 Don't do this!
210b36aa	1715
8a93676d	1716	This is a data paragraph too.
210b36aa	1717
8a93676d SB	1718	=end somedata
	1719
	1720	A Pod processor may signal that the above (specifically the "=head1"
	1721	paragraph) is an error. Note, however, that the following should
	1722	I<not> be treated as an error:
	1723
	1724	=begin somedata
210b36aa	1725
8a93676d	1726	This is a data paragraph.
210b36aa	1727
8a93676d	1728	=cut
210b36aa	1729
8a93676d SB	1730	# Yup, this isn't Pod anymore.
8a93676d SB	1731	sub excl { (rand() > .5) ? "hoo!" : "hah!" }
210b36aa	1732
8a93676d	1733	=pod
210b36aa	1734
8a93676d	1735	This is a data paragraph too.
210b36aa	1736
8a93676d SB	1737	=end somedata
	1738
	1739	And this too is valid:
	1740
	1741	=begin someformat
210b36aa	1742
8a93676d	1743	This is a data paragraph.
210b36aa	1744
8a93676d	1745	And this is a data paragraph.
210b36aa	1746
8a93676d	1747	=begin someotherformat
210b36aa	1748
8a93676d	1749	This is a data paragraph too.
210b36aa	1750
8a93676d	1751	And this is a data paragraph too.
210b36aa	1752
8a93676d SB	1753	=begin :yetanotherformat
	1754
	1755	=head2 This is a command paragraph!
	1756
	1757	This is an ordinary paragraph!
210b36aa	1758
8a93676d	1759	    And this is a verbatim paragraph!
210b36aa	1760
8a93676d	1761	=end :yetanotherformat
210b36aa	1762
8a93676d	1763	=end someotherformat
210b36aa	1764
8a93676d	1765	Another data paragraph!
210b36aa	1766
8a93676d SB	1767	=end someformat
	1768
	1769	The contents of the above "=begin :yetanotherformat" ...
	1770	"=end :yetanotherformat" region I<aren't> data paragraphs, because
	1771	the immediately containing region's identifier (":yetanotherformat")
	1772	begins with a colon. In practice, most regions that contain
	1773	data paragraphs will contain I<only> data paragraphs; however,
	1774	the above nesting is syntactically valid as Pod, even if it is
	1775	rare. However, the handlers for some formats, like "html",
	1776	will accept only data paragraphs, not nested regions; and they may
	1777	complain if they see (targeted for them) nested regions, or commands,
	1778	other than "=end", "=pod", and "=cut".
	1779
	1780	Also consider this valid structure:
	1781
	1782	=begin :biblio
210b36aa	1783
8a93676d	1784	Wirth's classic is available in several editions, including:
210b36aa	1785
8a93676d	1786	=over
210b36aa	1787
8a93676d	1788	=item
210b36aa	1789
8a93676d SB	1790	Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
8a93676d SB	1791	Teubner, Stuttgart. [Yes, it's in German.]
210b36aa	1792
8a93676d	1793	=item
210b36aa	1794
8a93676d SB	1795	Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
	1796	Programs.> Prentice-Hall, Englewood Cliffs, NJ.
	1797
	1798	=back
210b36aa	1799
8a93676d	1800	Buy buy buy!
210b36aa	1801
8a93676d	1802	=begin html
210b36aa	1803
8a93676d	1804	<img src='wirth_spokesmodeling_book.png'>
210b36aa	1805
8a93676d	1806	<hr>
210b36aa	1807
8a93676d	1808	=end html
210b36aa	1809
8a93676d	1810	Now now now!
210b36aa	1811
8a93676d SB	1812	=end :biblio
	1813
	1814	There, the "=begin html"..."=end html" region is nested inside
	1815	the larger "=begin :biblio"..."=end :biblio" region. Note that the
	1816	content of the "=begin html"..."=end html" region is data
	1817	paragraph(s), because the immediately containing region's identifier
	1818	("html") I<doesn't> begin with a colon.
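
To make the "immediately containing region" rule concrete, here is a minimal sketch in Python. It assumes the input has already been split into paragraphs, uses a hypothetical function name ("classify_paragraphs"), treats any paragraph beginning with "=" and a letter as a command, and does not check that "=end" commands nest properly.

```python
def classify_paragraphs(paragraphs):
    """Label each paragraph as "command", "data", "verbatim", or "ordinary"."""
    open_regions = []   # one bool per open "=begin": True if it had a ":" prefix
    labels = []
    for paragraph in paragraphs:
        if paragraph.startswith("=") and paragraph[1:2].isalpha():
            labels.append("command")
            words = paragraph.split()
            if words[0] == "=begin" and len(words) > 1:
                open_regions.append(words[1].startswith(":"))
            elif words[0] == "=end" and open_regions:
                open_regions.pop()   # nesting errors are not checked here
        elif open_regions and not open_regions[-1]:
            labels.append("data")    # innermost open region lacks a colon
        elif paragraph[:1] in (" ", "\t"):
            labels.append("verbatim")
        else:
            labels.append("ordinary")
    return labels
```

Only the innermost open region is consulted, so a paragraph inside "=begin html" is labeled as data even when that region itself sits inside "=begin :biblio".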
	1819
	1820	Pod parsers, when processing a series of data paragraphs one
	1821	after another (within a single region), should consider them to
	1822	be one large data paragraph that happens to contain blank lines. So
	1823	the content of the above "=begin html"..."=end html" I<may> be stored
	1824	as two data paragraphs (one consisting of
	1825	"<img src='wirth_spokesmodeling_book.png'>\n"
	1826	and another consisting of "<hr>\n"), but I<should> be stored as
	1827	a single data paragraph (consisting of
	1828	"<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").
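
One way to perform that merging is sketched below in Python, under the assumption that a parser holds the content of a single region as a list of (kind, text) pairs and that each data paragraph's text ends with a single newline. The name "merge_data_paragraphs" is hypothetical.

```python
def merge_data_paragraphs(paragraphs):
    """Merge runs of adjacent ("data", text) entries into single entries.

    Each text is assumed to end with one newline; merged texts are joined
    with one extra newline, which restores the blank line between them.
    Any non-data entry (such as a command) ends the current run.
    """
    merged = []
    for kind, text in paragraphs:
        if kind == "data" and merged and merged[-1][0] == "data":
            merged[-1] = ("data", merged[-1][1] + "\n" + text)
        else:
            merged.append((kind, text))
    return merged
```

Given the two data paragraphs from the "=begin html" example above, this produces the single data paragraph "<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n".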
	1829
	1830	Pod processors should tolerate empty
	1831	"=begin I<something>"..."=end I<something>" regions,
	1832	empty "=begin :I<something>"..."=end :I<something>" regions, and
	1833	contentless "=for I<something>" and "=for :I<something>"
	1834	paragraphs. I.e., these should be tolerated:
	1835
	1836	=for html
210b36aa	1837
8a93676d	1838	=begin html
210b36aa	1839
8a93676d	1840	=end html
210b36aa	1841
8a93676d	1842	=begin :biblio
210b36aa	1843
8a93676d SB	1844	=end :biblio
	1845
	1846	Incidentally, note that there's no easy way to express a data
	1847	paragraph starting with something that looks like a command. Consider:
	1848
	1849	=begin stuff
210b36aa	1850
8a93676d	1851	=shazbot
210b36aa	1852
8a93676d SB	1853	=end stuff
	1854
	1855	There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
	1856	paragraph "=shazbot\n". However, you can express a data paragraph consisting
	1857	of "=shazbot\n" using this code:
	1858
	1859	=for stuff =shazbot
	1860
	1861	Situations where this is necessary are presumably quite rare.
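
For what it's worth, pulling the target and the data out of such a "=for" paragraph can be sketched in Python as below. The function name "split_for_paragraph" and its return shape are hypothetical, and the sketch assumes (as in the "=shazbot" example above) that the resulting paragraph text ends with one newline.

```python
import re

# "=for", an optional ":", the identifier, then whatever text follows.
# Assumption: the identifier is a run of non-whitespace characters.
FOR_PARAGRAPH = re.compile(r"\A=for[ \t]+(:?)(\S+)[ \t\n]*(.*)\Z", re.DOTALL)


def split_for_paragraph(paragraph):
    """Return (identifier, has_colon, content), or None if not a "=for"."""
    found = FOR_PARAGRAPH.match(paragraph.rstrip("\n"))
    if not found:
        return None
    colon, identifier, content = found.groups()
    # A contentless "=for" is tolerated and yields empty content.
    return identifier, colon == ":", (content + "\n" if content else "")
```

With this, split_for_paragraph("=for stuff =shazbot") gives ("stuff", False, "=shazbot\n"): the "=shazbot" text is kept as content and is not interpreted as a command.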
	1862
	1863	Note that "=end" commands must match the currently open "=begin" command. That
	1864	is, they must properly nest. For example, this is valid:
	1865
	1866	=begin outer
210b36aa	1867
8a93676d	1868	X
210b36aa	1869
8a93676d	1870	=begin inner
210b36aa	1871
8a93676d	1872	Y
210b36aa	1873
8a93676d	1874	=end inner
210b36aa	1875
8a93676d	1876	Z
210b36aa	1877
8a93676d SB	1878	=end outer
	1879
	1880	while this is invalid:
	1881
	1882	=begin outer
210b36aa	1883
8a93676d	1884	X
210b36aa	1885
8a93676d	1886	=begin inner
210b36aa	1887
8a93676d	1888	Y
210b36aa	1889
8a93676d	1890	=end outer
210b36aa	1891
8a93676d	1892	Z
210b36aa	1893
8a93676d	1894	=end inner
210b36aa	1895
8a93676d SB	1896	This latter is improper because when the "=end outer" command is seen, the
	1897	currently open region has the formatname "inner", not "outer". (It just
	1898	happens that "outer" is the format name of a higher-up region.) This is
	1899	an error. Processors must by default report this as an error, and may halt
210b36aa	1900	processing the document containing that error. A corollary of this is that
ac036724	1901	regions cannot "overlap". That is, the latter block above does not represent
8a93676d SB	1902	a region called "outer" which contains X and Y, overlapping a region called
	1903	"inner" which contains Y and Z. Because it is invalid (as all
	1904	apparently overlapping regions would be), it doesn't represent that, or
	1905	anything at all.
	1906
	1907	Similarly, this is invalid:
	1908
	1909	=begin thing
210b36aa	1910
8a93676d SB	1911	=end hting
	1912
	1913	This is an error because the region is opened by "thing", and the "=end"
	1914	tries to close "hting" [sic].
	1915
	1916	This is also invalid:
	1917
	1918	=begin thing
210b36aa	1919
8a93676d SB	1920	=end
	1921
	1922	This is invalid because every "=end" command must have a formatname
	1923	parameter.
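
The nesting rules above amount to keeping a stack of open regions. Here is a minimal Python sketch, assuming the "=begin" and "=end" command lines have already been extracted in document order, and using a hypothetical function name ("check_begin_end_nesting") and made-up error messages.

```python
def check_begin_end_nesting(commands):
    """Return a list of error strings for "=begin"/"=end" command lines.

    Regions still open at the end of the input are not reported here.
    """
    open_targets = []   # formatnames of the currently open regions
    errors = []
    for line in commands:
        words = line.split()
        if not words:
            continue
        if words[0] == "=begin" and len(words) > 1:
            open_targets.append(words[1])
        elif words[0] == "=end":
            if len(words) < 2:
                errors.append('"=end" without a formatname')
            elif not open_targets:
                errors.append('"=end %s" without an open "=begin"' % words[1])
            elif words[1] != open_targets[-1]:
                errors.append('"=end %s" does not match open "=begin %s"'
                              % (words[1], open_targets[-1]))
            else:
                open_targets.pop()
    return errors
```

For the invalid "outer"/"inner" example above, this returns a single error for the "=end outer" line; whether a real processor then halts or carries on is up to that processor.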
	1924
	1925	=head1 SEE ALSO
	1926
	1927	L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">,
	1928	L<podchecker>
	1929
	1930	=head1 AUTHOR
	1931
	1932	Sean M. Burke
	1933
	1934	=cut
	1935
	1936