perl5.git.perl.org Git - perl5.git/blame_incremental

... / ...

Commit	Line	Data
	1	#!/usr/bin/perl -w
	2
	3	BEGIN {
	4	chdir 't';
	5	unshift @INC, "../lib";
	6	}
	7
	8	use strict;
	9	use warnings;
	10	use feature 'unicode_strings';
	11
	12	use Carp;
	13	use Config;
	14	use Digest;
	15	use File::Find;
	16	use File::Spec;
	17	use Scalar::Util;
	18	use Text::Tabs;
	19
	20	BEGIN {
	21	require '../regen/regen_lib.pl';
	22	}
	23
	24	sub DEBUG { 0 };
	25
	26	=pod
	27
	28	=head1 NAME
	29
	30	podcheck.t - Look for possible problems in the Perl pods
	31
	32	=head1 SYNOPSIS
	33
	34	cd t
	35	./perl -I../lib porting/podcheck.t [--show_all] [--cpan] [--deltas]
	36	[--counts] [ FILE ...]
	37	./perl -I../lib porting/podcheck.t --add_link MODULE ...
	38
	39	./perl -I../lib porting/podcheck.t --regen
	40
	41	=head1 DESCRIPTION
	42
	43	podcheck.t is an extension of Pod::Checker. It looks for pod errors and
	44	potential errors in the files given as arguments, or if none specified, in all
	45	pods in the distribution workspace, except certain known special ones
	46	(specified below). It does additional checking beyond that done by
	47	Pod::Checker, and keeps a database of known potential problems, and will
	48	fail a pod only if the number of such problems differs from that given in the
	49	database. It also suppresses the C<(section) deprecated> message from
	50	Pod::Checker, since specifying the man page section number is quite proper to do.
	51
	52	The additional checks it makes are:
	53
	54	=over
	55
	56	=item Cross-pod link checking
	57
	58	Pod::Checker verifies that links to an internal target in a pod are not
	59	broken. podcheck.t extends that (when called without FILE arguments) to
	60	external links. It does this by gathering up all the possible targets in the
	61	workspace, and cross-checking them. It also checks that a non-broken link
	62	points to just one target. (The destination pod could have two targets with
	63	the same name.)
	64
	65	The way that the C<LE<lt>E<gt>> pod command works (for links outside the pod)
	66	is to actually create a link to C<search.cpan.org> with an embedded query for
	67	the desired pod or man page. That means that links outside the distribution
	68	are valid. podcheck.t doesn't verify the validity of such links, but instead
	69	keeps a data base of those known to be valid. This means that if a link to a
	70	target not on the list is created, the target needs to be added to the data
	71	base. This is accomplished via the L<--add_link\|/--add_link MODULE ...>
	72	option to podcheck.t, described below.
	73
	74	=item An internal link that isn't so specified
	75
	76	If a link is broken, but there is an existing internal target of the same
	77	name, it is likely that the internal target was meant, and the C<"/"> is
	78	missing from the C<LE<lt>E<gt>> pod command.
	79
	80	=item Verbatim paragraphs that wrap in an 80 (including 1 spare) column window
	81
	82	It's annoying to have lines wrap when displaying pod documentation in a
	83	terminal window. This checks that all verbatim lines fit in a standard 80
	84	column window, even when using a pager that reserves a column for its own use.
	85	(Thus the check is for a net of 79 columns.)
	86	For those lines that don't fit, it tells you how much needs to be cut in
	87	order to fit.
	88
	89	Often, the easiest thing to do to gain space for these is to lower the indent
	90	to just one space.
	91
	92	=item Missing or duplicate NAME or missing NAME short description
	93
	94	A pod can't be linked to unless it has a unique name.
	95	And a NAME should have a dash and short description after it.
	96
	97	=item =encoding statement issues
	98
	99	This indicates if an C<=encoding> statement should be present, or moved to the
	100	front of the pod.
	101
	102	=item Items that perhaps should be links
	103
	104	There are mentions of apparent files in the pods that perhaps should be links
	105	instead, using C<LE<lt>...E<gt>>
	106
	107	=item Items that perhaps should be C<FE<lt>...E<gt>>
	108
	109	What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
	110	C<FE<lt>...E<gt>> mark-up instead.
	111
	112	=back
	113
	114	A number of issues raised by podcheck.t and by the base Pod::Checker are not
	115	really problems, but merely potential problems, that is, false positives.
	116	After inspecting them and
	117	deciding that they aren't real problems, it is possible to shut up this program
	118	about them, unlike base Pod::Checker. For a valid link to an outside module
	119	or man page, call podcheck.t with the C<--add_link> option to add it to the
	120	the database of known links; for other causes, call podcheck.t with the C<--regen>
	121	option to regenerate the entire database. This tells it that all existing
	122	issues are to not be mentioned again.
	123
	124	C<--regen> isn't fool-proof. The database merely keeps track of the number of these
	125	potential problems of each type for each pod. If a new problem of a given
	126	type is introduced into the pod, podcheck.t will spit out all of them. You
	127	then have to figure out which is the new one, and should it be changed or not.
	128	But doing it this way insulates the database from having to keep track of line
	129	numbers of problems, which may change, or the exact wording of each problem
	130	which might also change without affecting whether it is a problem or not.
	131
	132	Also, if the count of potential problems of a given type for a pod decreases,
	133	the database must be regenerated so that it knows the new number. The program
	134	gives instructions when this happens.
	135
	136	Some pods will have varying numbers of problems of a given type. This can
	137	be handled by manually editing the database file (see L</FILES>), and setting
	138	the number of those problems for that pod to a negative number. This will
	139	cause the corresponding error to always be suppressed no matter how many there
	140	actually are.
	141
	142	Another problem is that there is currently no check that modules listed as
	143	valid in the data base
	144	actually are. Thus any errors introduced there will remain there.
	145
	146	=head2 Specially handled pods
	147
	148	=over
	149
	150	=item perltoc
	151
	152	This pod is generated by pasting bits from other pods. Errors in those bits
	153	will show up as errors here, as well as for those other pods. Therefore
	154	errors here are suppressed, and the pod is checked only to verify that nodes
	155	within it actually exist that are externally linked to.
	156
	157	=item perldelta
	158
	159	The current perldelta pod is initialized from a template that contains
	160	placeholder text. Some of this text is in the form of links that don't really
	161	exist. Any such links that are listed in C<@perldelta_ignore_links> will not
	162	generate messages. It is presumed that these links will be cleaned up when
	163	the perldelta is cleaned up for release since they should be marked with
	164	C<XXX>.
	165
	166	=item Porting/perldelta_template.pod
	167
	168	This is not a pod, but a template for C<perldelta>. Any errors introduced
	169	here will show up when C<perldelta> is created from it.
	170
	171	=item cpan-upstream pods
	172
	173	See the L</--cpan> option documentation
	174
	175	=item old perldeltas
	176
	177	See the L</--deltas> option documentation
	178
	179	=back
	180
	181	=head1 OPTIONS
	182
	183	=over
	184
	185	=item --add_link MODULE ...
	186
	187	Use this option to teach podcheck.t that the C<MODULE>s or man pages actually
	188	exist, and to silence any messages that links to them are broken.
	189
	190	podcheck.t checks that links within the Perl core distribution are valid, but
	191	it doesn't check links to man pages or external modules. When it finds
	192	a broken link, it checks its data base of external modules and man pages,
	193	and only if not found there does it raise a message. This option just adds
	194	the list of modules and man page references that follow it on the command line
	195	to that data base.
	196
	197	For example,
	198
	199	cd t
	200	./perl -I../lib porting/podcheck.t --add_link Unicode::Casing
	201
	202	causes the external module "Unicode::Casing" to be added to the data base, so
	203	C<LE<lt>Unicode::CasingE<gt>> will be considered valid.
	204
	205	=item --regen
	206
	207	Regenerate the data base used by podcheck.t to include all the existing
	208	potential problems. Future runs of the program will not then flag any of
	209	these.
	210
	211	=item --cpan
	212
	213	Normally, all pods in the cpan directory are skipped, except to make sure that
	214	any blead-upstream links to such pods are valid.
	215	This option will cause cpan upstream pods to be fully checked.
	216
	217	=item --deltas
	218
	219	Normally, all old perldelta pods are skipped, except to make sure that
	220	any links to such pods are valid. This is because they are considered
	221	stable, and perhaps trying to fix them will cause changes that will
	222	misrepresent Perl's history. But, this option will cause them to be fully
	223	checked.
	224
	225	=item --show_all
	226
	227	Normally, if the number of potential problems of a given type found for a
	228	pod matches the expected value in the database, they will not be displayed.
	229	This option forces the database to be ignored during the run, so all potential
	230	problems are displayed and will fail their respective pod test. Specifying
	231	any particular FILES to operate on automatically selects this option.
	232
	233	=item --counts
	234
	235	Instead of testing, this just dumps the counts of the occurrences of the
	236	various types of potential problems in the data base.
	237
	238	=back
	239
	240	=head1 FILES
	241
	242	The database is stored in F<t/porting/known_pod_issues.dat>
	243
	244	=head1 SEE ALSO
	245
	246	L<Pod::Checker>
	247
	248	=cut
	249
	250	# VMS builds have a '.com' appended to utility and script names, and it adds a
	251	# trailing dot for any other file name that doesn't have a dot in it. The db
	252	# is stored without those things. This regex allows for these special file
	253	# names to be dealt with. It needs to be interpolated into a larger regex
	254	# that furnishes the closing boundary.
	255	my $vms_re = qr/ \. (?: com )? /x;
	256
	257	# Some filenames in the MANIFEST match $vms_re, and so must not be handled the
	258	# same way that that the special vms ones are. This hash lists those.
	259	my %special_vms_files;
	260
	261	# This is to get this to work across multiple file systems, including those
	262	# that are not case sensitive. The db is stored in lower case, Un*x style,
	263	# and all file name comparisons are done that way.
	264	sub canonicalize($) {
	265	my $input = shift;
	266	my ($volume, $directories, $file)
	267	= File::Spec->splitpath(File::Spec->canonpath($input));
	268	# Assumes $volume is constant for everything in this directory structure
	269	$directories = "" if ! $directories;
	270	$file = "" if ! $file;
	271	$file = lc join '/', File::Spec->splitdir($directories), $file;
	272	$file =~ s! / /+ !/!gx; # Multiple slashes => single slash
	273
	274	# The db is stored without the special suffixes that are there in VMS, so
	275	# strip them off to get the comparable name. But some files on all
	276	# platforms have these suffixes, so this shouldn't happen for them, as any
	277	# of their db entries will have the suffixes in them. The hash has been
	278	# populated with these files.
	279	if ($^O eq 'VMS'
	280	&& $file =~ / ( $vms_re ) $ /x
	281	&& ! exists $special_vms_files{$file})
	282	{
	283	$file =~ s/ $1 $ //x;
	284	}
	285	return $file;
	286	}
	287
	288	#####################################################
	289	# HOW IT WORKS (in general)
	290	#
	291	# If not called with specific files to check, the directory structure is
	292	# examined for files that have pods in them. Files that might not have to be
	293	# fully parsed (e.g. in cpan) are parsed enough at this time to find their
	294	# pod's NAME, and to get a checksum.
	295	#
	296	# Those kinds of files are sorted last, but otherwise the pods are parsed with
	297	# the package coded here, My::Pod::Checker, which is an extension to
	298	# Pod::Checker that adds some tests and suppresses others that aren't
	299	# appropriate. The latter module has no provision for capturing diagnostics,
	300	# so a package, Tie_Array_to_FH, is used to force them to be placed into an
	301	# array instead of printed.
	302	#
	303	# Parsing the files builds up a list of links. The files are gone through
	304	# again, doing cross-link checking and outputting all saved-up problems with
	305	# each pod.
	306	#
	307	# Sorting the files last that potentially don't need to be fully parsed allows
	308	# us to not parse them unless there is a link to an internal anchor in them
	309	# from something that we have already parsed. Keeping checksums allows us to
	310	# not parse copies of other pods.
	311	#
	312	#####################################################
	313
	314	# 1 => Exclude low priority messages that aren't likely to be problems, and
	315	# has many false positives; higher numbers give more messages.
	316	my $Warnings_Level = 200;
	317
	318	# perldelta during construction may have place holder links. N.B. This
	319	# variable is referred to by name in release_managers_guide.pod
	320	our @perldelta_ignore_links = ( "XXX", "perl5YYYdelta", "perldiag/message" );
	321
	322	# To see if two pods with the same NAME are actually copies of the same pod,
	323	# which is not an error, it uses a checksum to save work.
	324	my $digest_type = "SHA-1";
	325
	326	my $original_dir = File::Spec->rel2abs(File::Spec->curdir);
	327	my $data_dir = File::Spec->catdir($original_dir, 'porting');
	328	my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
	329	my $MANIFEST = File::Spec->catfile(File::Spec->updir($original_dir), 'MANIFEST');
	330	my $copy_fh;
	331
	332	my $MAX_LINE_LENGTH = 100; # 79 columns
	333	my $INDENT = 7; # default nroff indent
	334
	335	# Our warning messages. Better not have [('"] in them, as those are used as
	336	# delimiters for variable parts of the messages by poderror.
	337	my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
	338	my $broken_link = "Apparent broken link";
	339	my $broken_internal_link = "Apparent internal link is missing its forward slash";
	340	my $multiple_targets = "There is more than one target";
	341	my $duplicate_name = "Pod NAME already used";
	342	my $need_encoding = "Should have =encoding statement because have non-ASCII";
	343	my $encoding_first = "=encoding must be first command (if present)";
	344	my $no_name = "There is no NAME";
	345	my $missing_name_description = "The NAME should have a dash and short description after it";
	346
	347	# objects, tests, etc can't be pods, so don't look for them. Also skip
	348	# files output by the patch program. Could also ignore most of .gitignore
	349	# files, but not all, so don't.
	350
	351	my $obj_ext = $Config{'obj_ext'}; $obj_ext =~ tr/.//d; # dot will be added back
	352	my $lib_ext = $Config{'lib_ext'}; $lib_ext =~ tr/.//d;
	353	my $lib_so = $Config{'so'}; $lib_so =~ tr/.//d;
	354	my $dl_ext = $Config{'dlext'}; $dl_ext =~ tr/.//d;
	355
	356	# Not really pods, but can look like them.
	357	my %excluded_files = (
	358	canonicalize("lib/unicore/mktables") => 1,
	359	canonicalize("Porting/make-rmg-checklist") => 1,
	360	canonicalize("Porting/perldelta_template.pod") => 1,
	361	canonicalize("regen/feature.pl") => 1,
	362	canonicalize("autodoc.pl") => 1,
	363	canonicalize("configpm") => 1,
	364	canonicalize("miniperl") => 1,
	365	canonicalize("perl") => 1,
	366	canonicalize('cpan/Pod-Perldoc/corpus/no-head.pod') => 1,
	367	canonicalize('cpan/Pod-Perldoc/corpus/perlfunc.pod') => 1,
	368	canonicalize('cpan/Pod-Perldoc/corpus/utf8.pod') => 1,
	369	canonicalize("lib/unicore/mktables") => 1,
	370	);
	371
	372	# This list should not include anything for which case sensitivity is
	373	# important, as it won't work on VMS, and won't show up until tested on VMS.
	374	# All or almost all such files should be listed in the MANIFEST, so that can
	375	# be examined for them, and each such file explicitly excluded, as is done for
	376	# .PL files in the loop just below this. For files not catchable this way,
	377	# is_pod_file() can be used to exclude these at a finer grained level.
	378	my $non_pods = qr/ (?: \.
	379	(?: [achot] \| zip \| gz \| bz2 \| jar \| tar \| tgz
	380	\| orig \| rej \| patch # Patch program output
	381	\| sw[op] \| \#.* # Editor droppings
	382	\| old # buildtoc output
	383	\| xs # pod should be in the .pm file
	384	\| al # autosplit files
	385	\| bs # bootstrap files
	386	\| (?i:sh) # shell scripts, hints, templates
	387	\| lst # assorted listing files
	388	\| bat # Windows,Netware,OS2 batch files
	389	\| cmd # Windows,Netware,OS2 command files
	390	\| lis # VMS compiler listings
	391	\| map # VMS linker maps
	392	\| opt # VMS linker options files
	393	\| mms # MM(K\|S) description files
	394	\| ts # timestamp files generated during build
	395	\| $obj_ext # object files
	396	\| exe # $Config{'exe_ext'} might be empty string
	397	\| $lib_ext # object libraries
	398	\| $lib_so # shared libraries
	399	\| $dl_ext # dynamic libraries
	400	\| gif # GIF images (example files from CGI.pm)
	401	\| eg # examples from libnet
	402	)
	403	$
	404	) \| ~$ \| \ $Autosaved$\.txt$ # Other editor droppings
	405	\| ^cxx\$demangler_db\.$ # VMS name mangler database
	406	\| ^typemap\.?$ # typemap files
	407	\| ^(?i:Makefile\.PL)$
	408	/x;
	409
	410	# '.PL' files should be excluded, as they aren't final pods, but often contain
	411	# material used in generating pods, and so can look like a pod. We can't use
	412	# the regexp above because case sensisitivity is important for these, as some
	413	# '.pl' files should be examined for pods. Instead look through the MANIFEST
	414	# for .PL files and get their full path names, so we can exclude each such
	415	# file explicitly. This works because other porting tests prohibit having two
	416	# files with the same names except for case.
	417	open my $manifest_fh, '<:bytes', $MANIFEST or die "Can't open $MANIFEST";
	418	while (<$manifest_fh>) {
	419
	420	# While we have MANIFEST open, on VMS platforms, look for files that match
	421	# the magic VMS file names that have to be handled specially. Add these
	422	# to the list of them.
	423	if ($^O eq 'VMS' && / ^ ( [^\t]* $vms_re ) \t /x) {
	424	$special_vms_files{$1} = 1;
	425	}
	426	if (/ ^ ( [^\t]* \. PL ) \t /x) {
	427	$excluded_files{canonicalize($1)} = 1;
	428	}
	429	}
	430	close $manifest_fh, or die "Can't close $MANIFEST";
	431
	432
	433	# Pod::Checker messages to suppress
	434	my @suppressed_messages = (
	435	"(section) in", # Checker is wrong to flag this
	436	"multiple occurrence of link target", # We catch independently the ones
	437	# that are real problems.
	438	"unescaped <>",
	439	"Entity number out of range", # Checker outputs this for anything above
	440	# 255, but in fact all Unicode is valid
	441	"No items in =over", # ie a blockquote
	442	);
	443
	444	sub suppressed {
	445	# Returns bool as to if input message is one that is to be suppressed
	446
	447	my $message = shift;
	448	return grep { $message =~ /^\Q$_/i } @suppressed_messages;
	449	}
	450
	451	{ # Closure to contain a simple subset of test.pl. This is to get rid of the
	452	# unnecessary 'failed at' messages that would otherwise be output pointing
	453	# to a particular line in this file.
	454
	455	my $current_test = 0;
	456	my $planned;
	457
	458	sub plan {
	459	my %plan = @_;
	460	$planned = $plan{tests} + 1; # +1 for final test that files haven't
	461	# been removed
	462	print "1..$planned\n";
	463	return;
	464	}
	465
	466	sub ok {
	467	my $success = shift;
	468	my $message = shift;
	469
	470	chomp $message;
	471
	472	$current_test++;
	473	print "not " unless $success;
	474	print "ok $current_test - $message\n";
	475	return $success;
	476	}
	477
	478	sub skip {
	479	my $why = shift;
	480	my $n = @_ ? shift : 1;
	481	for (1..$n) {
	482	$current_test++;
	483	print "ok $current_test # skip $why\n";
	484	}
	485	no warnings 'exiting';
	486	last SKIP;
	487	}
	488
	489	sub note {
	490	my $message = shift;
	491
	492	chomp $message;
	493
	494	print $message =~ s/^/# /mgr;
	495	print "\n";
	496	return;
	497	}
	498
	499	END {
	500	if ($planned && $planned != $current_test) {
	501	print STDERR
	502	"# Looks like you planned $planned tests but ran $current_test.\n";
	503	}
	504	}
	505	}
	506
	507	# List of known potential problems by pod and type.
	508	my %known_problems;
	509
	510	# Pods given by the keys contain an interior node that is referred to from
	511	# outside it.
	512	my %has_referred_to_node;
	513
	514	my $show_counts = 0;
	515	my $regen = 0;
	516	my $add_link = 0;
	517	my $show_all = 0;
	518
	519	my $do_upstream_cpan = 0; # Assume that are to skip anything in /cpan
	520	my $do_deltas = 0; # And stable perldeltas
	521
	522	while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
	523	my $arg = shift @ARGV;
	524
	525	$arg =~ s/^--/-/; # Treat '--' the same as a single '-'
	526	if ($arg eq '-regen') {
	527	$regen = 1;
	528	}
	529	elsif ($arg eq '-add_link') {
	530	$add_link = 1;
	531	}
	532	elsif ($arg eq '-cpan') {
	533	$do_upstream_cpan = 1;
	534	}
	535	elsif ($arg eq '-deltas') {
	536	$do_deltas = 1;
	537	}
	538	elsif ($arg eq '-show_all') {
	539	$show_all = 1;
	540	}
	541	elsif ($arg eq '-counts') {
	542	$show_counts = 1;
	543	}
	544	else {
	545	die <<EOF;
	546	Unknown option '$arg'
	547
	548	Usage: $0 [ --regen \| --cpan \| --show_all \| FILE ... \| --add_link MODULE ... ]\n"
	549	--add_link -> Add the MODULE and man page references to the data base
	550	--regen -> Regenerate the data file for $0
	551	--cpan -> Include files in the cpan subdirectory.
	552	--deltas -> Include stable perldeltas
	553	--show_all -> Show all known potential problems
	554	--counts -> Don't test, but give summary counts of the currently
	555	existing database
	556	EOF
	557	}
	558	}
	559
	560	my @files = @ARGV;
	561
	562	my $cpan_or_deltas = $do_upstream_cpan \|\| $do_deltas;
	563	if (($regen + $show_all + $show_counts + $add_link + $cpan_or_deltas ) > 1) {
	564	croak "--regen, --show_all, --counts, and --add_link are mutually exclusive\n and none can be run with --cpan nor --deltas";
	565	}
	566
	567	my $has_input_files = @files;
	568
	569
	570	if ($add_link) {
	571	if (! $has_input_files) {
	572	croak "--add_link requires at least one module or man page reference";
	573	}
	574	}
	575	elsif ($has_input_files) {
	576	if ($regen \|\| $show_counts \|\| $do_upstream_cpan \|\| $do_deltas) {
	577	croak "--regen, --counts, --deltas, and --cpan can't be used since using specific files";
	578	}
	579	foreach my $file (@files) {
	580	croak "Can't read file '$file'" if ! -r $file;
	581	}
	582	}
	583
	584	our %problems; # potential problems found in this run
	585
	586	package My::Pod::Checker { # Extend Pod::Checker
	587	use parent 'Pod::Checker';
	588
	589	# Uses inside out hash to protect from typos
	590	# For new fields, remember to add to destructor DESTROY()
	591	my %indents; # Stack of indents from =over's in effect for
	592	# current line
	593	my %current_indent; # Current line's indent
	594	my %filename; # The pod is store in this file
	595	my %skip; # is SKIP set for this pod
	596	my %in_NAME; # true if within NAME section
	597	my %in_begin; # true if within =begin section
	598	my %linkable_item; # Bool: if the latest =item is linkable. It isn't
	599	# for bullet and number lists
	600	my %linkable_nodes; # Pod::Checker adds all =items to its node list,
	601	# but not all =items are linkable to
	602	my %seen_encoding_cmd; # true if have =encoding earlier
	603	my %command_count; # Number of commands seen
	604	my %seen_pod_cmd; # true if have =pod earlier
	605	my %warned_encoding; # true if already have warned about =encoding
	606	# problems
	607
	608	sub DESTROY {
	609	my $addr = Scalar::Util::refaddr $_[0];
	610	delete $command_count{$addr};
	611	delete $current_indent{$addr};
	612	delete $filename{$addr};
	613	delete $in_begin{$addr};
	614	delete $indents{$addr};
	615	delete $in_NAME{$addr};
	616	delete $linkable_item{$addr};
	617	delete $linkable_nodes{$addr};
	618	delete $seen_encoding_cmd{$addr};
	619	delete $seen_pod_cmd{$addr};
	620	delete $skip{$addr};
	621	delete $warned_encoding{$addr};
	622	return;
	623	}
	624
	625	sub new {
	626	my $class = shift;
	627	my $filename = shift;
	628
	629	my $self = $class->SUPER::new(-quiet => 1,
	630	-warnings => $Warnings_Level);
	631	my $addr = Scalar::Util::refaddr $self;
	632	$command_count{$addr} = 0;
	633	$current_indent{$addr} = 0;
	634	$filename{$addr} = $filename;
	635	$in_begin{$addr} = 0;
	636	$in_NAME{$addr} = 0;
	637	$linkable_item{$addr} = 0;
	638	$seen_encoding_cmd{$addr} = 0;
	639	$seen_pod_cmd{$addr} = 0;
	640	$warned_encoding{$addr} = 0;
	641	return $self;
	642	}
	643
	644	# re's for messages that Pod::Checker outputs
	645	my $location = qr/ \b (?:in\|at\|on\|near) \s+ /xi;
	646	my $optional_location = qr/ (?: $location )? /xi;
	647	my $line_reference = qr/ [('"]? $optional_location \b line \s+
	648	(?: \d+ \| EOF \| \Q???\E \| - )
	649	[)'"]? /xi;
	650
	651	sub poderror { # Called to register a potential problem
	652
	653	# This adds an extra field to the parent hash, 'parameter'. It is
	654	# used to extract the variable parts of a message leaving just the
	655	# constant skeleton. This in turn allows the message to be
	656	# categorized better, so that it shows up as a single type in our
	657	# database, with the specifics of each occurrence not being stored with
	658	# it.
	659
	660	my $self = shift;
	661	my $opts = shift;
	662
	663	my $addr = Scalar::Util::refaddr $self;
	664	return if $skip{$addr};
	665
	666	# Input can be a string or hash. If a string, parse it to separate
	667	# out the line number and convert to a hash for easier further
	668	# processing
	669	my $message;
	670	if (ref $opts ne 'HASH') {
	671	$message = join "", $opts, @_;
	672	my $line_number;
	673	if ($message =~ s/\s*($line_reference)//) {
	674	($line_number = $1) =~ s/\s*$optional_location//;
	675	}
	676	else {
	677	$line_number = '???';
	678	}
	679	$opts = { -msg => $message, -line => $line_number };
	680	} else {
	681	$message = $opts->{'-msg'};
	682
	683	}
	684
	685	$message =~ s/^\d+\s+//;
	686	return if main::suppressed($message);
	687
	688	$self->SUPER::poderror($opts, @_);
	689
	690	$opts->{parameter} = "" unless $opts->{parameter};
	691
	692	# The variable parts of the message tend to be enclosed in '...',
	693	# "....", or (...). Extract them and put them in an extra field,
	694	# 'parameter'. This is trickier because the matching delimiter to a
	695	# '(' is its mirror, and not itself. Text::Balanced could be used
	696	# instead.
	697	while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
	698	my $delimiter = $1;
	699	my $start = $-[0];
	700	$delimiter = ')' if $delimiter eq '(';
	701
	702	# If there is no ending delimiter, don't consider it to be a
	703	# variable part. Most likely it is a contraction like "Don't"
	704	last unless $message =~ m/\G .+? \Q$delimiter/xg;
	705
	706	my $length = $+[0] - $start;
	707
	708	# Get the part up through the closing delimiter
	709	my $special = substr($message, $start, $length);
	710	$special =~ s/^\s+//; # No leading whitespace
	711
	712	# And add that variable part to the parameter, while removing it
	713	# from the message. This isn't a foolproof way of finding the
	714	# variable part. For example '(s)' can occur in e.g.,
	715	# 'paragraph(s)'
	716	if ($special ne '(s)') {
	717	substr($message, $start, $length) = "";
	718	pos $message = $start;
	719	$opts->{-msg} = $message;
	720	$opts->{parameter} .= " " if $opts->{parameter};
	721	$opts->{parameter} .= $special;
	722	}
	723	}
	724
	725	# Extract any additional line number given. This is often the
	726	# beginning location of something whereas the main line number gives
	727	# the ending one.
	728	if ($message =~ /( $line_reference )/xi) {
	729	my $line_ref = $1;
	730	while ($message =~ s/\s*\Q$line_ref//) {
	731	$opts->{-msg} = $message;
	732	$opts->{parameter} .= " " if $opts->{parameter};
	733	$opts->{parameter} .= $line_ref;
	734	}
	735	}
	736
	737	Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
	738	push @{$problems{$filename{$addr}}{$message}}, $opts;
	739	#push @{$problems{$self->get_filename}{$message}}, $opts;
	740	}
	741
	742	sub check_encoding { # Does it need an =encoding statement?
	743	my ($self, $paragraph, $line_num, $pod_para) = @_;
	744
	745	# Do nothing if there is an =encoding in the file, or if the line
	746	# doesn't require an =encoding, or have already warned.
	747	my $addr = Scalar::Util::refaddr $self;
	748	return if $seen_encoding_cmd{$addr}
	749	\|\| $warned_encoding{$addr}
	750	\|\| $paragraph !~ /\P{ASCII}/;
	751
	752	$warned_encoding{$addr} = 1;
	753	my ($file, $line) = $pod_para->file_line;
	754	$self->poderror({ -line => $line, -file => $file,
	755	-msg => $need_encoding
	756	});
	757	return;
	758	}
	759
	760	sub verbatim {
	761	my ($self, $paragraph, $line_num, $pod_para) = @_;
	762	$self->check_encoding($paragraph, $line_num, $pod_para);
	763
	764	$self->SUPER::verbatim($paragraph, $line_num, $pod_para);
	765
	766	my $addr = Scalar::Util::refaddr $self;
	767
	768	# Pick up the name, since the parent class doesn't in verbatim
	769	# NAMEs; so treat as non-verbatim. The parent class only allows one
	770	# paragraph in a NAME section, so if there is an extra blank line, it
	771	# will trigger a message, but such a blank line is harmless, so skip
	772	# in that case.
	773	if ($in_NAME{$addr} && $paragraph =~ /\S/) {
	774	$self->textblock($paragraph, $line_num, $pod_para);
	775	}
	776
	777	my @lines = split /^/, $paragraph;
	778	for my $i (0 .. @lines - 1) {
	779	if ( my $encoding = $seen_encoding_cmd{$addr} ) {
	780	require Encode;
	781	$lines[$i] = Encode::decode($encoding, $lines[$i]);
	782	}
	783	$lines[$i] =~ s/\s+$//;
	784	my $indent = $self->get_current_indent;
	785	my $exceeds = length(Text::Tabs::expand($lines[$i]))
	786	+ $indent - $MAX_LINE_LENGTH;
	787	next unless $exceeds > 0;
	788	my ($file, $line) = $pod_para->file_line;
	789	$self->poderror({ -line => $line + $i, -file => $file,
	790	-msg => $line_length,
	791	parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)",
	792	});
	793	}
	794	}
	795
	796	sub textblock {
	797	my ($self, $paragraph, $line_num, $pod_para) = @_;
	798	$self->check_encoding($paragraph, $line_num, $pod_para);
	799
	800	$self->SUPER::textblock($paragraph, $line_num, $pod_para);
	801
	802	my ($file, $line) = $pod_para->file_line;
	803	my $addr = Scalar::Util::refaddr $self;
	804	if ($in_NAME{$addr}) {
	805	if (! $self->name) {
	806	my $text = $self->interpolate($paragraph, $line_num);
	807	if ($text =~ /^\s(\S+?)\s$/) {
	808	$self->name($1);
	809	$self->poderror({ -line => $line, -file => $file,
	810	-msg => $missing_name_description,
	811	parameter => $1});
	812	}
	813	}
	814	}
	815
	816	return;
	817	}
	818
	819	sub command {
	820	my ($self, $cmd, $paragraph, $line_num, $pod_para) = @_;
	821	my $addr = Scalar::Util::refaddr $self;
	822	if ($cmd eq "pod") {
	823	$seen_pod_cmd{$addr}++;
	824	}
	825	elsif ($cmd eq "encoding") {
	826	my ($file, $line) = $pod_para->file_line;
	827	$seen_encoding_cmd{$addr} = $paragraph; # for later decoding
	828	if ($command_count{$addr} != 1 && $seen_pod_cmd{$addr}) {
	829	$self->poderror({ -line => $line, -file => $file,
	830	-msg => $encoding_first
	831	});
	832	}
	833	}
	834	$self->check_encoding($paragraph, $line_num, $pod_para);
	835
	836	# Pod::Check treats all =items as linkable, but the bullet and
	837	# numbered lists really aren't. So keep our own list. This has to be
	838	# processed before SUPER is called so that the list is started before
	839	# the rest of it gets parsed.
	840	if ($cmd eq 'item') { # Not linkable if item begins with * or a digit
	841	$linkable_item{$addr} = ($paragraph !~ / ^ \s*
	842	(?: [*]
	843	\| \d+ \.? (?: \$ \| \s+ )
	844	)/x)
	845	? 1
	846	: 0;
	847
	848	}
	849	$self->SUPER::command($cmd, $paragraph, $line_num, $pod_para);
	850
	851	$command_count{$addr}++;
	852
	853	$in_NAME{$addr} = 0; # Will change to 1 below if necessary
	854	$in_begin{$addr} = 0; # ibid
	855	if ($cmd eq 'over') {
	856	my $text = $self->interpolate($paragraph, $line_num);
	857	my $indent = 4; # default
	858	$indent = $1 if $text && $text =~ /^\s(\d+)\s$/;
	859	push @{$indents{$addr}}, $indent;
	860	$current_indent{$addr} += $indent;
	861	}
	862	elsif ($cmd eq 'back') {
	863	if (@{$indents{$addr}}) {
	864	$current_indent{$addr} -= pop @{$indents{$addr}};
	865	}
	866	else {
	867	# =back without corresponding =over, but should have
	868	# warned already
	869	$current_indent{$addr} = 0;
	870	}
	871	}
	872	elsif ($cmd =~ /^head/) {
	873	if (! $in_begin{$addr}) {
	874
	875	# If a particular formatter, then this command doesn't really
	876	# apply
	877	$current_indent{$addr} = 0;
	878	undef @{$indents{$addr}};
	879	}
	880
	881	my $text = $self->interpolate($paragraph, $line_num);
	882	$in_NAME{$addr} = 1 if $cmd eq 'head1'
	883	&& $text && $text =~ /^NAME\b/;
	884	}
	885	elsif ($cmd eq 'begin') {
	886	$in_begin{$addr} = 1;
	887	}
	888
	889	return;
	890	}
	891
	892	sub hyperlink {
	893	my $self = shift;
	894
	895	my $page;
	896	if ($_[0] && ($page = $_[0][1]{'-page'})) {
	897	my $node = $_[0][1]{'-node'};
	898
	899	# If the hyperlink is to an interior node of another page, save it
	900	# so that we can see if we need to parse normally skipped files.
	901	$has_referred_to_node{$page} = 1 if $node;
	902
	903	# Ignore certain placeholder links in perldelta. Check if the
	904	# link is page-level, and also check if to a node within the page
	905	if ($self->name && $self->name eq "perldelta"
	906	&& ((grep { $page eq $_ } @perldelta_ignore_links)
	907	\|\| ($node
	908	&& (grep { "$page/$node" eq $_ } @perldelta_ignore_links)
	909	))) {
	910	return;
	911	}
	912	}
	913	return $self->SUPER::hyperlink($_[0]);
	914	}
	915
	916	sub node {
	917	my $self = shift;
	918	my $text = $_[0];
	919	if($text) {
	920	$text =~ s/\s+$//s; # strip trailing whitespace
	921	$text =~ s/\s+/ /gs; # collapse whitespace
	922	my $addr = Scalar::Util::refaddr $self;
	923	push(@{$linkable_nodes{$addr}}, $text) if
	924	! $current_indent{$addr}
	925	\|\| $linkable_item{$addr};
	926	}
	927	return $self->SUPER::node($_[0]);
	928	}
	929
	930	sub get_current_indent {
	931	return $INDENT + $current_indent{Scalar::Util::refaddr $_[0]};
	932	}
	933
	934	sub get_filename {
	935	return $filename{Scalar::Util::refaddr $_[0]};
	936	}
	937
	938	sub linkable_nodes {
	939	my $linkables = $linkable_nodes{Scalar::Util::refaddr $_[0]};
	940	return undef unless $linkables;
	941	return @$linkables;
	942	}
	943
	944	sub get_skip {
	945	return $skip{Scalar::Util::refaddr $_[0]} // 0;
	946	}
	947
	948	sub set_skip {
	949	my $self = shift;
	950	$skip{Scalar::Util::refaddr $self} = shift;
	951
	952	# If skipping, no need to keep the problems for it
	953	delete $problems{$self->get_filename};
	954	return;
	955	}
	956
	957	sub parse_from_file {
	958	# This overrides the super class method so that if an open fails on a
	959	# transitory file, it doesn't croak. It returns 1 if it did find the
	960	# file, 0 if it didn't
	961
	962	my $self = shift;
	963	my $filename = shift;
	964	# ignores 2nd param, which is output file. Always uses undef
	965
	966	if (open my $in_fh, '<:bytes', $filename) {
	967	$self->SUPER::parse_from_filehandle($in_fh, undef);
	968	close $in_fh;
	969	return 1;
	970	}
	971
	972	# If couldn't open file, perhaps it was transitory, and hence not an error
	973	return 0 unless -e $filename;
	974
	975	die "Can't open '$filename': $!\n";
	976	}
	977	}
	978
	979	package Tie_Array_to_FH { # So printing actually goes to an array
	980
	981	my %array;
	982
	983	sub TIEHANDLE {
	984	my $class = shift;
	985	my $array_ref = shift;
	986
	987	my $self = bless \do{ my $anonymous_scalar }, $class;
	988	$array{Scalar::Util::refaddr $self} = $array_ref;
	989
	990	return $self;
	991	}
	992
	993	sub PRINT {
	994	my $self = shift;
	995	push @{$array{Scalar::Util::refaddr $self}}, @_;
	996	return 1;
	997	}
	998	}
	999
	1000
	1001	my %filename_to_checker; # Map a filename to it's pod checker object
	1002	my %id_to_checker; # Map a checksum to it's pod checker object
	1003	my %nodes; # key is filename, values are nodes in that file.
	1004	my %nodes_first_word; # same, but value is first word of each node
	1005	my %valid_modules; # List of modules known to exist outside us.
	1006	my %digests; # checksums of files, whose names are the keys
	1007	my %filename_to_pod; # Map a filename to its pod NAME
	1008	my %files_with_unknown_issues;
	1009	my %files_with_fixes;
	1010
	1011	my $data_fh;
	1012	open $data_fh, '<:bytes', $known_issues or die "Can't open $known_issues";
	1013
	1014	my %counts; # For --counts param, count of each issue type
	1015	my %suppressed_files; # Files with at least one issue type to suppress
	1016	my $HEADER = <<END;
	1017	# This file is the data file for $0.
	1018	# There are three types of lines.
	1019	# Comment lines are white-space only or begin with a '#', like this one. Any
	1020	# changes you make to the comment lines will be lost when the file is
	1021	# regen'd.
	1022	# Lines without tab characters are simply NAMES of pods that the program knows
	1023	# will have links to them and the program does not check if those links are
	1024	# valid.
	1025	# All other lines should have three fields, each separated by a tab. The
	1026	# first field is the name of a pod; the second field is an error message
	1027	# generated by this program; and the third field is a count of how many
	1028	# known instances of that message there are in the pod. -1 means that the
	1029	# program can expect any number of this type of message.
	1030	END
	1031
	1032	my @existing_issues;
	1033
	1034
	1035	while (<$data_fh>) { # Read the data base
	1036	chomp;
	1037	next if /^\s*(?:#\|$)/; # Skip comment and empty lines
	1038	if (/\t/) {
	1039	next if $show_all;
	1040	if ($add_link) { # The issues are saved and later output unchanged
	1041	push @existing_issues, $_;
	1042	next;
	1043	}
	1044
	1045	# Keep track of counts of each issue type for each file
	1046	my ($filename, $message, $count) = split /\t/;
	1047	$known_problems{$filename}{$message} = $count;
	1048
	1049	if ($show_counts) {
	1050	if ($count < 0) { # -1 means to suppress this issue type
	1051	$suppressed_files{$filename} = $filename;
	1052	}
	1053	else {
	1054	$counts{$message} += $count;
	1055	}
	1056	}
	1057	}
	1058	else { # Lines without a tab are modules known to be valid
	1059	$valid_modules{$_} = 1
	1060	}
	1061	}
	1062	close $data_fh;
	1063
	1064	if ($add_link) {
	1065	$copy_fh = open_new($known_issues);
	1066
	1067	# Check for basic sanity, and add each command line argument
	1068	foreach my $module (@files) {
	1069	die "\"$module\" does not look like a module or man page"
	1070	# Must look like (A or A::B or A::B::C ..., or foo(3C)
	1071	if $module !~ /^ (?: \w+ (?: :: \w+ )* \| \w+ $ \d \w* $ ) $/x;
	1072	$valid_modules{$module} = 1
	1073	}
	1074	my_safer_print($copy_fh, $HEADER);
	1075	foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
	1076	my_safer_print($copy_fh, $_, "\n");
	1077	}
	1078
	1079	# The rest of the db file is output unchanged.
	1080	my_safer_print($copy_fh, join "\n", @existing_issues, "");
	1081
	1082	close_and_rename($copy_fh);
	1083	exit;
	1084	}
	1085
	1086	if ($show_counts) {
	1087	my $total = 0;
	1088	foreach my $message (sort keys %counts) {
	1089	$total += $counts{$message};
	1090	note(Text::Tabs::expand("$counts{$message}\t$message"));
	1091	}
	1092	note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
	1093	if (%suppressed_files) {
	1094	note("\nFiles that have all messages of at least one type suppressed:");
	1095	note(join ",", keys %suppressed_files);
	1096	}
	1097	exit 0;
	1098	}
	1099
	1100	# re to match files that are to be parsed only if there is an internal link
	1101	# to them. It does not include cpan, as whether those are parsed depends
	1102	# on a switch. Currently, only perltoc and the stable perldelta.pod's
	1103	# are included. The latter all have characters between 'perl' and
	1104	# 'delta'. (Actually the currently developed one matches as well, but
	1105	# is a duplicate of perldelta.pod, so can be skipped, so fine for it to
	1106	# match this.
	1107	my $only_for_interior_links_re = qr/ ^ pod\/perltoc.pod $
	1108	/x;
	1109	unless ($do_deltas) {
	1110	$only_for_interior_links_re = qr/$only_for_interior_links_re \|
	1111	\b perl \d+ delta \. pod \b
	1112	/x;
	1113	}
	1114
	1115	{ # Closure
	1116	my $first_time = 1;
	1117
	1118	sub output_thanks ($$$$) { # Called when an issue has been fixed
	1119	my $filename = shift;
	1120	my $original_count = shift;
	1121	my $current_count = shift;
	1122	my $message = shift;
	1123
	1124	$files_with_fixes{$filename} = 1;
	1125	my $return;
	1126	my $fixed_count = $original_count - $current_count;
	1127	my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
	1128	my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
	1129	my $diff;
	1130	if ($message) {
	1131	$diff = <<EOF;
	1132	There were $original_count occurrences (now $current_count) in this pod of type
	1133	"$message",
	1134	EOF
	1135	} else {
	1136	$diff = <<EOF;
	1137	There are no longer any problems found in this pod!
	1138	EOF
	1139	}
	1140
	1141	if ($first_time) {
	1142	$first_time = 0;
	1143	$return = <<EOF;
	1144	Thanks for fixing $a_problem!
	1145	$diff
	1146	Now you must teach $0 that this was fixed.
	1147	EOF
	1148	}
	1149	else {
	1150	$return = <<EOF
	1151	Thanks for fixing $another_problem.
	1152	$diff
	1153	EOF
	1154	}
	1155
	1156	return $return;
	1157	}
	1158	}
	1159
	1160	sub my_safer_print { # print, with error checking for outputting to db
	1161	my ($fh, @lines) = @_;
	1162
	1163	if (! print $fh @lines) {
	1164	my $save_error = $!;
	1165	close($fh);
	1166	die "Write failure: $save_error";
	1167	}
	1168	}
	1169
	1170	sub extract_pod { # Extracts just the pod from a file; returns undef if file
	1171	# doesn't exist
	1172	my $filename = shift;
	1173
	1174	my @pod;
	1175
	1176	# Arrange for the output of Pod::Parser to be collected in an array we can
	1177	# look at instead of being printed
	1178	tie *ALREADY_FH, 'Tie_Array_to_FH', \@pod;
	1179	if (open my $in_fh, '<:bytes', $filename) {
	1180	my $parser = Pod::Parser->new();
	1181	$parser->parse_from_filehandle($in_fh, *ALREADY_FH);
	1182	close $in_fh;
	1183
	1184	return join "", @pod
	1185	}
	1186
	1187	# The file should already have been opened once to get here, so if that
	1188	# fails, something is wrong. It's possible that a transitory file
	1189	# containing a pod would get here, so if the file no longer exists just
	1190	# return undef.
	1191	return unless -e $filename;
	1192	die "Can't open '$filename': $!\n";
	1193	}
	1194
	1195	my $digest = Digest->new($digest_type);
	1196
	1197	# This is used as a callback from File::Find::find(), which always constructs
	1198	# pathnames using Unix separators
	1199	sub is_pod_file {
	1200	# If $_ is a pod file, add it to the lists and do other prep work.
	1201
	1202	if (-d) {
	1203	# Don't look at files in directories that are for tests, nor those
	1204	# beginning with a dot
	1205	if (m!/t\z! \|\| m!/\.!) {
	1206	$File::Find::prune = 1;
	1207	}
	1208	return;
	1209	}
	1210
	1211	return unless -r && -s; # Can't check it if can't read it; no need to
	1212	# check if 0 length
	1213	return unless -f \|\| -l; # Weird file types won't be pods
	1214
	1215	my ($leaf) = m!([^/]+)\z!;
	1216	if (m!/\.! # No hidden Unix files
	1217	\|\| $leaf =~ $non_pods) {
	1218	note("Not considering $_") if DEBUG;
	1219	return;
	1220	}
	1221
	1222	my $filename = $File::Find::name;
	1223
	1224	# $filename is relative, like './path'. Strip that initial part away.
	1225	$filename =~ s!^\./!! or die 'Unexpected pathname "$filename"';
	1226
	1227	return if $excluded_files{canonicalize($filename)};
	1228
	1229	my $contents = do {
	1230	local $/;
	1231	my $candidate;
	1232	if (! open $candidate, '<:bytes', $_) {
	1233
	1234	# If a transitory file was found earlier, the open could fail
	1235	# legitimately and we just skip the file; also skip it if it is a
	1236	# broken symbolic link, as it is probably just a build problem;
	1237	# certainly not a file that we would want to check the pod of.
	1238	# Otherwise fail it here and no reason to process it further.
	1239	# (But the test count will be off too)
	1240	ok(0, "Can't open '$filename': $!")
	1241	if -r $filename && ! -l $filename;
	1242	return;
	1243	}
	1244	<$candidate>;
	1245	};
	1246
	1247	# If the file is a .pm or .pod, having any initial '=' on a line is
	1248	# grounds for testing it. Otherwise, require a head1 NAME line to
	1249	# consider it as a potential pod
	1250	if ($filename =~ /\.(?:pm\|pod)/) {
	1251	return unless $contents =~ /^=/m;
	1252	} else {
	1253	return unless $contents =~ /^=head1 +NAME/m;
	1254	}
	1255
	1256	# Here, we know that the file is a pod. Add it to the list of files
	1257	# to check and create a checker object for it.
	1258
	1259	push @files, $filename;
	1260	my $checker = My::Pod::Checker->new($filename);
	1261	$filename_to_checker{$filename} = $checker;
	1262
	1263	# In order to detect duplicate pods and only analyze them once, we
	1264	# compute checksums for the file, so don't have to do an exact
	1265	# compare. Note that if the pod is just part of the file, the
	1266	# checksums can differ for the same pod. That special case is handled
	1267	# later, since if the checksums of the whole file are the same, that
	1268	# case won't even come up. We don't need the checksums for files that
	1269	# we parse only if there is a link to its interior, but we do need its
	1270	# NAME, which is also retrieved in the code below.
	1271
	1272	if ($filename =~ / (?: ^(cpan\|lib\|ext\|dist)\/ )
	1273	\| $only_for_interior_links_re
	1274	/x) {
	1275	$digest->add($contents);
	1276	$digests{$filename} = $digest->digest;
	1277
	1278	# lib files aren't analyzed if they are duplicates of files copied
	1279	# there from some other directory. But to determine this, we need
	1280	# to know their NAMEs. We might as well find the NAME now while
	1281	# the file is open. Similarly, cpan files aren't analyzed unless
	1282	# we're analyzing all of them, or this particular file is linked
	1283	# to by a file we are analyzing, and thus we will want to verify
	1284	# that the target exists in it. We need to know at least the NAME
	1285	# to see if it's worth analyzing, or so we can determine if a lib
	1286	# file is a copy of a cpan one.
	1287	if ($filename =~ m{ (?: ^ (?: cpan \| lib ) / )
	1288	\| $only_for_interior_links_re
	1289	}x) {
	1290	if ($contents =~ /^=head1 +NAME.*/mg) {
	1291	# The NAME is the first non-spaces on the line up to a
	1292	# comma, dash or end of line. Otherwise, it's invalid and
	1293	# this pod doesn't have a legal name that we're smart
	1294	# enough to find currently. But the parser will later
	1295	# find it if it thinks there is a legal name, and set the
	1296	# name
	1297	if ($contents =~ /\G # continue from the line after =head1
	1298	\s* # ignore any empty lines
	1299	^ \s* ( \S+?) \s* (?: [,-] \| $ )/mx) {
	1300	my $name = $1;
	1301	$checker->name($name);
	1302	$id_to_checker{$name} = $checker
	1303	if $filename =~ m{^cpan/};
	1304	}
	1305	}
	1306	elsif ($filename =~ m{^cpan/}) {
	1307	$id_to_checker{$digests{$filename}} = $checker;
	1308	}
	1309	}
	1310	}
	1311
	1312	return;
	1313	} # End of is_pod_file()
	1314
	1315	# Start of real code that isn't processing the command line (except the
	1316	# db is read in above, as is processing of the --add_link option).
	1317	# Here, @files contains list of files on the command line. If have any of
	1318	# these, unconditionally test them, and show all the errors, even the known
	1319	# ones, and, since not testing other pods, don't do cross-pod link tests.
	1320	# (Could add extra code to do cross-pod tests for the ones in the list.)
	1321
	1322	if ($has_input_files) {
	1323	undef %known_problems;
	1324	$do_upstream_cpan = $do_deltas = 1; # In case one of the inputs is one
	1325	# of these types
	1326	}
	1327	else { # No input files -- go find all the possibilities.
	1328	if ($regen) {
	1329	$copy_fh = open_new($known_issues);
	1330	note("Regenerating $known_issues, please be patient...");
	1331	print $copy_fh $HEADER;
	1332	}
	1333
	1334	# Move to the directory above us, but have to adjust @INC to account for
	1335	# that.
	1336	s{^\.\./lib$}{lib} for @INC;
	1337	chdir File::Spec->updir;
	1338
	1339	# And look in this directory and all its subdirectories
	1340	find( {wanted => \&is_pod_file, no_chdir => 1}, '.');
	1341
	1342	# Add ourselves to the test
	1343	push @files, "t/porting/podcheck.t";
	1344	}
	1345
	1346	# Now we know how many tests there will be.
	1347	plan (tests => scalar @files) if ! $regen;
	1348
	1349
	1350	# Sort file names so we get consistent results, and to put cpan last,
	1351	# preceeded by the ones that we don't generally parse. This is because both
	1352	# these classes are generally parsed only if there is a link to the interior
	1353	# of them, and we have to parse all others first to guarantee that they don't
	1354	# have such a link. 'lib' files come just before these, as some of these are
	1355	# duplicates of others. We already have figured this out when gathering the
	1356	# data as a special case for all such files, but this, while unnecessary,
	1357	# puts the derived file last in the output. 'readme' files come before those,
	1358	# as those also could be duplicates of others, which are considered the
	1359	# primary ones. These currently aren't figured out when gathering data, so
	1360	# are done here.
	1361	@files = sort { if ($a =~ /^cpan/) {
	1362	return 1 if $b !~ /^cpan/;
	1363	return lc $a cmp lc $b;
	1364	}
	1365	elsif ($b =~ /^cpan/) {
	1366	return -1;
	1367	}
	1368	elsif ($a =~ /$only_for_interior_links_re/) {
	1369	return 1 if $b !~ /$only_for_interior_links_re/;
	1370	return lc $a cmp lc $b;
	1371	}
	1372	elsif ($b =~ /$only_for_interior_links_re/) {
	1373	return -1;
	1374	}
	1375	elsif ($a =~ /^lib/) {
	1376	return 1 if $b !~ /^lib/;
	1377	return lc $a cmp lc $b;
	1378	}
	1379	elsif ($b =~ /^lib/) {
	1380	return -1;
	1381	} elsif ($a =~ /\breadme\b/i) {
	1382	return 1 if $b !~ /\breadme\b/i;
	1383	return lc $a cmp lc $b;
	1384	}
	1385	elsif ($b =~ /\breadme\b/i) {
	1386	return -1;
	1387	}
	1388	else {
	1389	return lc $a cmp lc $b;
	1390	}
	1391	}
	1392	@files;
	1393
	1394	# Now go through all the files and parse them
	1395	FILE:
	1396	foreach my $filename (@files) {
	1397	my $parsed = 0;
	1398	note("parsing $filename") if DEBUG;
	1399
	1400	# We may have already figured out some things in the process of generating
	1401	# the file list. If so, we have a $checker object already. But if not,
	1402	# generate one now.
	1403	my $checker = $filename_to_checker{$filename};
	1404	if (! $checker) {
	1405	$checker = My::Pod::Checker->new($filename);
	1406	$filename_to_checker{$filename} = $checker;
	1407	}
	1408
	1409	# We have set the name in the checker object if there is a possibility
	1410	# that no further parsing is necessary, but otherwise do the parsing now.
	1411	if (! $checker->name) {
	1412	if (! $checker->parse_from_file($filename, undef)) {
	1413	$checker->set_skip("$filename is transitory");
	1414	next FILE;
	1415	}
	1416	$parsed = 1;
	1417
	1418	}
	1419
	1420	if ($checker->num_errors() < 0) { # Returns negative if not a pod
	1421	$checker->set_skip("$filename is not a pod");
	1422	}
	1423	else {
	1424
	1425	# Here, is a pod. See if it is one that has already been tested,
	1426	# or should be tested under another directory. Use either its NAME
	1427	# if it has one, or a checksum if not.
	1428	my $name = $checker->name;
	1429	my $id;
	1430
	1431	if ($name) {
	1432	$id = $name;
	1433	}
	1434	else {
	1435	my $digest = Digest->new($digest_type);
	1436	my $contents = extract_pod($filename);
	1437
	1438	# If the return is undef, it means that $filename was a transitory
	1439	# file; skip it.
	1440	next FILE unless defined $contents;
	1441	$digest->add($contents);
	1442	$id = $digest->digest;
	1443	}
	1444
	1445	# If there is a match for this pod with something that we've already
	1446	# processed, don't process it, and output why.
	1447	my $prior_checker;
	1448	if (defined ($prior_checker = $id_to_checker{$id})
	1449	&& $prior_checker != $checker) # Could have defined the checker
	1450	# earlier without pursuing it
	1451	{
	1452
	1453	# If the pods are identical, then it's just a copy, and isn't an
	1454	# error. First use the checksums we have already computed to see
	1455	# if the entire files are identical, which means that the pods are
	1456	# identical too.
	1457	my $prior_filename = $prior_checker->get_filename;
	1458	my $same = (! $name
	1459	\|\| ($digests{$prior_filename}
	1460	&& $digests{$filename}
	1461	&& $digests{$prior_filename} eq $digests{$filename}));
	1462
	1463	# If they differ, it could be that the files differ for some
	1464	# reason, but the pods they contain are identical. Extract the
	1465	# pods and do the comparisons on just those.
	1466	if (! $same && $name) {
	1467	my $contents = extract_pod($filename);
	1468
	1469	# If return is <undef>, it means that $filename no longer
	1470	# exists. This means it was a transitory file, and should not
	1471	# be tested.
	1472	next FILE unless defined $contents;
	1473
	1474	my $prior_contents = extract_pod($prior_filename);
	1475
	1476	# If return is <undef>, it means that $prior_filename no
	1477	# longer exists. This means it was a transitory file, and
	1478	# should not have been tested, but we already did process it.
	1479	# What we should do now is to back-out its records, and
	1480	# process $filename in its stead. But backing out is not so
	1481	# simple, and so I'm (khw) skipping that unless and until
	1482	# experience shows that it is needed. We do go process
	1483	# $filename, and there are potential false positive conflicts
	1484	# with the transitory $prior_contents, and rerunning the test
	1485	# should cause it to succeed.
	1486	goto process_this_pod unless defined $prior_contents;
	1487
	1488	$same = $prior_contents eq $contents;
	1489	}
	1490
	1491	if ($same) {
	1492	$checker->set_skip("The pod of $filename is a duplicate of "
	1493	. "the pod for $prior_filename");
	1494	} elsif ($prior_filename =~ /\breadme\b/i) {
	1495	$checker->set_skip("$prior_filename is a README apparently for $filename");
	1496	} elsif ($filename =~ /\breadme\b/i) {
	1497	$checker->set_skip("$filename is a README apparently for $prior_filename");
	1498	} elsif (! $do_upstream_cpan
	1499	&& $filename =~ /^cpan/
	1500	&& $prior_filename =~ /^cpan/)
	1501	{
	1502	$checker->set_skip("CPAN is upstream for $filename");
	1503	} elsif ( $filename =~ /^utils/ or $prior_filename =~ /^utils/ ) {
	1504	$checker->set_skip("$filename copy is in utils/");
	1505	} else { # Here have two pods with identical names that differ
	1506	$prior_checker->poderror(
	1507	{ -msg => $duplicate_name,
	1508	-line => "???",
	1509	parameter => "'$filename' also has NAME '$name'"
	1510	});
	1511	$checker->poderror(
	1512	{ -msg => $duplicate_name,
	1513	-line => "???",
	1514	parameter => "'$prior_filename' also has NAME '$name'"
	1515	});
	1516
	1517	# Changing the names helps later.
	1518	$prior_checker->name("$name version arbitrarily numbered 1");
	1519	$checker->name("$name version arbitrarily numbered 2");
	1520	}
	1521
	1522	# In any event, don't process this pod that has the same name as
	1523	# another.
	1524	next FILE;
	1525	}
	1526
	1527	process_this_pod:
	1528
	1529	# A unique pod.
	1530	$id_to_checker{$id} = $checker;
	1531
	1532	my $parsed_for_links = ", but parsed for its interior links";
	1533	if ((! $do_upstream_cpan && $filename =~ /^cpan/)
	1534	\|\| $filename =~ $only_for_interior_links_re)
	1535	{
	1536	if ($filename =~ /^cpan/) {
	1537	$checker->set_skip("CPAN is upstream for $filename");
	1538	}
	1539	elsif ($filename =~ /perl\d+delta/) {
	1540	if (! $do_deltas) {
	1541	$checker->set_skip("$filename is a stable perldelta");
	1542	}
	1543	}
	1544	elsif ($filename =~ /perltoc/) {
	1545	$checker->set_skip("$filename dependent on component pods");
	1546	}
	1547	else {
	1548	croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
	1549	}
	1550
	1551	if ($name && $has_referred_to_node{$name}) {
	1552	$checker->set_skip($checker->get_skip() . $parsed_for_links);
	1553	}
	1554	}
	1555
	1556	# Need a name in order to process it, because not meaningful
	1557	# otherwise, and also can't test links to this without a name.
	1558	if (!defined $name) {
	1559	$checker->poderror( { -msg => $no_name,
	1560	-line => '???'
	1561	});
	1562	next FILE;
	1563	}
	1564
	1565	# For skipped files, just get its NAME
	1566	my $skip;
	1567	if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
	1568	{
	1569	$checker->node($name) if $name;
	1570	}
	1571	elsif (! $parsed) {
	1572	if (! $checker->parse_from_file($filename, undef)) {
	1573	$checker->set_skip("$filename is transitory");
	1574	next FILE;
	1575	}
	1576	}
	1577
	1578	# Go through everything in the file that could be an anchor that
	1579	# could be a link target. Count how many there are of the same name.
	1580	foreach my $node ($checker->linkable_nodes) {
	1581	next FILE if ! $node; # Can be empty is like '=item *'
	1582	if (exists $nodes{$name}{$node}) {
	1583	$nodes{$name}{$node}++;
	1584	}
	1585	else {
	1586	$nodes{$name}{$node} = 1;
	1587	}
	1588
	1589	# Experiments have shown that cpan search can figure out the
	1590	# target of a link even if the exact wording is incorrect, as long
	1591	# as the first word is. This happens frequently in perlfunc.pod,
	1592	# where the link will be just to the function, but the target
	1593	# entry also includes parameters to the function.
	1594	my $first_word = $node;
	1595	if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
	1596	$nodes_first_word{$name}{$first_word} = $node;
	1597	}
	1598	}
	1599	$filename_to_pod{$filename} = $name;
	1600	}
	1601	}
	1602
	1603	# Here, all files have been parsed, and all links and link targets are stored.
	1604	# Now go through the files again and see which don't have matches.
	1605	if (! $has_input_files) {
	1606	foreach my $filename (@files) {
	1607	next if $filename_to_checker{$filename}->get_skip;
	1608	my $checker = $filename_to_checker{$filename};
	1609	foreach my $link ($checker->hyperlink) {
	1610	my $linked_to_page = $link->[1]->page;
	1611	next unless $linked_to_page; # intra-file checks are handled by std
	1612	# Pod::Checker
	1613
	1614	# Initialize the potential message.
	1615	my %problem = ( -msg => $broken_link,
	1616	-line => $link->[0],
	1617	parameter => "to \"$linked_to_page\"",
	1618	);
	1619
	1620	# See if we have found the linked-to_file in our parse
	1621	if (exists $nodes{$linked_to_page}) {
	1622	my $node = $link->[1]->node;
	1623
	1624	# If link is only to the page-level, already have it
	1625	next if ! $node;
	1626
	1627	# Transform pod language to what we are expecting
	1628	$node =~ s,E<sol>,/,g;
	1629	$node =~ s/E<verbar>/\|/g;
	1630
	1631	# If link is to a node that exists in the file, is ok
	1632	if ($nodes{$linked_to_page}{$node}) {
	1633
	1634	# But if the page has multiple targets with the same name,
	1635	# it's ambiguous which one this should be to.
	1636	if ($nodes{$linked_to_page}{$node} > 1) {
	1637	$problem{-msg} = $multiple_targets;
	1638	$problem{parameter} = "in $linked_to_page that $node could be pointing to";
	1639	$checker->poderror(\%problem);
	1640	}
	1641	} elsif (! $nodes_first_word{$linked_to_page}{$node}) {
	1642
	1643	# Here the link target was not found, either exactly or to
	1644	# the first word. Is an error.
	1645	$problem{parameter} =~ s,"$,/$node",;
	1646	$checker->poderror(\%problem);
	1647	}
	1648
	1649	} # Linked-to-file not in parse; maybe is in exception list
	1650	elsif (! exists $valid_modules{$link->[1]->page}) {
	1651
	1652	# Here, is a link to a target that we can't find. Check if
	1653	# there is an internal link on the page with the target name.
	1654	# If so, it could be that they just forgot the initial '/'
	1655	# But perldelta is handled specially: only do this if the
	1656	# broken link isn't one of the known bad ones (that are
	1657	# placemarkers and should be removed for the final)
	1658	my $NAME = $filename_to_pod{$filename};
	1659	if (! defined $NAME) {
	1660	$checker->poderror(\%problem);
	1661	}
	1662	else {
	1663	if ($nodes{$NAME}{$linked_to_page}) {
	1664	$problem{-msg} = $broken_internal_link;
	1665	}
	1666	$checker->poderror(\%problem);
	1667	}
	1668	}
	1669	}
	1670	}
	1671	}
	1672
	1673	# If regenerating the data file, start with the modules for which we don't
	1674	# check targets. If you change the sort order, you need to run --regen before
	1675	# committing so that future commits that do run regen don't show irrelevant
	1676	# changes.
	1677	if ($regen) {
	1678	foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
	1679	my_safer_print($copy_fh, $_, "\n");
	1680	}
	1681	}
	1682
	1683	# Now ready to output the messages.
	1684	foreach my $filename (@files) {
	1685	my $canonical = canonicalize($filename);
	1686	SKIP: {
	1687	my $skip = $filename_to_checker{$filename}->get_skip // "";
	1688
	1689	if ($regen) {
	1690	foreach my $message ( sort keys %{$problems{$filename}}) {
	1691	my $count;
	1692
	1693	# Preserve a negative setting.
	1694	if ($known_problems{$canonical}{$message}
	1695	&& $known_problems{$canonical}{$message} < 0)
	1696	{
	1697	$count = $known_problems{$canonical}{$message};
	1698	}
	1699	else {
	1700	$count = @{$problems{$filename}{$message}};
	1701	}
	1702	my_safer_print($copy_fh, $canonical . "\t$message\t$count\n");
	1703	}
	1704	next;
	1705	}
	1706
	1707	skip($skip, 1) if $skip;
	1708	my @diagnostics;
	1709	my $thankful_diagnostics = 0;
	1710	my $indent = ' ';
	1711
	1712	my $total_known = 0;
	1713	foreach my $message ( sort keys %{$problems{$filename}}) {
	1714	$known_problems{$canonical}{$message} = 0
	1715	if ! $known_problems{$canonical}{$message};
	1716	my $diagnostic = "";
	1717	my $problem_count = scalar @{$problems{$filename}{$message}};
	1718	$total_known += $problem_count;
	1719	next if $known_problems{$canonical}{$message} < 0;
	1720	if ($problem_count > $known_problems{$canonical}{$message}) {
	1721
	1722	# Here we are about to output all the messages for this type,
	1723	# subtract back this number we previously added in.
	1724	$total_known -= $problem_count;
	1725
	1726	$diagnostic .= $indent . qq{"$message"};
	1727	if ($problem_count > 2) {
	1728	$diagnostic .= " ($problem_count occurrences,"
	1729	. " expected $known_problems{$canonical}{$message})";
	1730	}
	1731	foreach my $problem (@{$problems{$filename}{$message}}) {
	1732	$diagnostic .= " " if $problem_count == 1;
	1733	$diagnostic .= "\n$indent$indent";
	1734	$diagnostic .= "$problem->{parameter}" if $problem->{parameter};
	1735	$diagnostic .= " near line $problem->{-line}";
	1736	$diagnostic .= " $problem->{comment}" if $problem->{comment};
	1737	}
	1738	$diagnostic .= "\n";
	1739	$files_with_unknown_issues{$filename} = 1;
	1740	} elsif ($problem_count < $known_problems{$canonical}{$message}) {
	1741	$diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
	1742	$thankful_diagnostics++;
	1743	}
	1744	push @diagnostics, $diagnostic if $diagnostic;
	1745	}
	1746
	1747	# The above loop has output messages where there are current potential
	1748	# issues. But it misses where there were some that have been entirely
	1749	# fixed. For those, we need to look through the old issues
	1750	foreach my $message ( sort keys %{$known_problems{$canonical}}) {
	1751	next if $problems{$filename}{$message};
	1752	next if ! $known_problems{$canonical}{$message};
	1753	next if $known_problems{$canonical}{$message} < 0; # Preserve negs
	1754	my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
	1755	push @diagnostics, $diagnostic if $diagnostic;
	1756	$thankful_diagnostics++ if $diagnostic;
	1757	}
	1758
	1759	my $output = "POD of $filename";
	1760	$output .= ", excluding $total_known not shown known potential problems"
	1761	if $total_known;
	1762	if (@diagnostics && @diagnostics == $thankful_diagnostics) {
	1763	# Output fixed issues as passing to-do tests, so they do not
	1764	# cause failures, but t/harness still flags them.
	1765	$output .= " # TODO"
	1766	}
	1767	ok(@diagnostics == $thankful_diagnostics, $output);
	1768	if (@diagnostics) {
	1769	note(join "", @diagnostics,
	1770	"See end of this test output for your options on silencing this");
	1771	}
	1772
	1773	delete $known_problems{$canonical};
	1774	}
	1775	}
	1776
	1777	if (! $regen
	1778	&& ! ok (keys %known_problems == 0, "The known problems data base includes no references to non-existent files"))
	1779	{
	1780	note("The following files were not found: "
	1781	. join ", ", keys %known_problems);
	1782	note("They will automatically be removed from the db the next time");
	1783	note(" cd t; ./perl -I../lib porting/podcheck.t --regen");
	1784	note("is run");
	1785	}
	1786
	1787	my $how_to = <<EOF;
	1788	run this test script by hand, using the following formula (on
	1789	Un*x-like machines):
	1790	cd t
	1791	./perl -I../lib porting/podcheck.t --regen
	1792	EOF
	1793
	1794	if (%files_with_unknown_issues) {
	1795	my $were_count_files = scalar keys %files_with_unknown_issues;
	1796	$were_count_files = ($were_count_files == 1)
	1797	? "was $were_count_files file"
	1798	: "were $were_count_files files";
	1799	my $message = <<EOF;
	1800
	1801	HOW TO GET THIS .t TO PASS
	1802
	1803	There $were_count_files that had new potential problems identified.
	1804	Some of them may be real, and some of them may be false positives because
	1805	this program isn't as smart as it likes to think it is. You can teach this
	1806	program to ignore the issues it has identified, and hence pass, by doing the
	1807	following:
	1808
	1809	1) If a problem is about a link to an unknown module or man page that
	1810	you know exists, re-run the command something like:
	1811	./perl -I../lib porting/podcheck.t --add_link MODULE man_page ...
	1812	(MODULEs should look like Foo::Bar, and man_pages should look like
	1813	bar(3c); don't do this for a module or man page that you aren't sure
	1814	about; instead treat as another type of issue and follow the
	1815	instructions below.)
	1816
	1817	2) For other issues, decide if each should be fixed now or not. Fix the
	1818	ones you decided to, and rerun this test to verify that the fixes
	1819	worked.
	1820
	1821	3) If there remain false positive or problems that you don't plan to fix right
	1822	now,
	1823	$how_to
	1824	That should cause all current potential problems to be accepted by
	1825	the program, so that the next time it runs, they won't be flagged.
	1826	EOF
	1827	if (%files_with_fixes) {
	1828	$message .= " This step will also take care of the files that have fixes in them\n";
	1829	}
	1830
	1831	$message .= <<EOF;
	1832	For a few files, such as perltoc, certain issues will always be
	1833	expected, and more of the same will be added over time. For those,
	1834	before you do the regen, you can edit
	1835	$known_issues
	1836	and find the entry for the module's file and specific error message,
	1837	and change the count of known potential problems to -1.
	1838	EOF
	1839
	1840	note($message);
	1841	} elsif (%files_with_fixes) {
	1842	note(<<EOF
	1843	To teach this test script that the potential problems have been fixed,
	1844	$how_to
	1845	EOF
	1846	);
	1847	}
	1848
	1849	if ($regen) {
	1850	chdir $original_dir \|\| die "Can't change directories to $original_dir";
	1851	close_and_rename($copy_fh);
	1852	}