	1	#!/usr/bin/perl -w
	2
	3	package main;
	4
	5	BEGIN {
	6	chdir 't';
	7	@INC = "../lib";
	8	# Do not require test.pl, this file has its own framework.
	9	}
	10
	11	use strict;
	12	use warnings;
	13	use feature 'unicode_strings';
	14
	15	use Carp;
	16	use Config;
	17	use Digest;
	18	use File::Find;
	19	use File::Spec;
	20	use Scalar::Util;
	21	use Text::Tabs;
	22
	23	BEGIN {
	24	if ( $Config{usecrosscompile} ) {
	25	print "1..0 # Not all files are available during cross-compilation\n";
	26	exit 0;
	27	}
	28	if ($^O eq 'dec_osf') {
	29	print "1..0 # $^O cannot handle this test\n";
	30	exit(0);
	31	}
	32	if ( $ENV{'PERL_BUILD_PACKAGING'} ) {
	33	print "1..0 # This distro may have modified some files in cpan/. Skipping validation. \n";
	34	exit 0;
	35	}
	36	require '../regen/regen_lib.pl';
	37	}
	38
	39	sub DEBUG { 0 };
	40
	41	=pod
	42
	43	=head1 NAME
	44
	45	podcheck.t - Look for possible problems in the Perl pods
	46
	47	=head1 SYNOPSIS
	48
	49	cd t
	50	./perl -I../lib porting/podcheck.t [--show_all] [--cpan] [--deltas]
	51	[--counts] [--pedantic] [FILE ...]
	52
	53	./perl -I../lib porting/podcheck.t --add_link MODULE ...
	54
	55	./perl -I../lib porting/podcheck.t --regen
	56
	57	=head1 DESCRIPTION
	58
	59	podcheck.t is an extension of Pod::Checker. It looks for pod errors and
	60	potential errors in the files given as arguments, or if none specified, in all
	61	pods in the distribution workspace, except certain known special ones
	62	(specified below). It does additional checking beyond that done by
	63	Pod::Checker, and keeps a database of known potential problems, and will
	64	fail a pod only if the number of such problems differs from that given in the
	65	database.
	66
	67	The additional checks it always makes are:
	68
	69	=over
	70
	71	=item Cross-pod link checking
	72
	73	Pod::Checker verifies that links to an internal target in a pod are not
	74	broken. podcheck.t extends that (when called without FILE arguments) to
	75	external links. It does this by gathering up all the possible targets in the
	76	workspace, and cross-checking them. It also checks that a non-broken link
	77	points to just one target. (The destination pod could have two targets with
	78	the same name.)
	79
	80	The way that the C<LE<lt>E<gt>> pod command works (for links outside the pod)
	81	is to actually create a link to C<search.cpan.org> with an embedded query for
	82	the desired pod or man page. That means that links outside the distribution
	83	are valid. podcheck.t doesn't verify the validity of such links, but instead
	84	keeps a database of those known to be valid. This means that if a link to a
	85	target not on the list is created, the target needs to be added to the
	86	database. This is accomplished via the L<--add_link|/--add_link MODULE ...>
	87	option to podcheck.t, described below.
	88
	89	=item An internal link that isn't so specified
	90
	91	If a link is broken, but there is an existing internal target of the same
	92	name, it is likely that the internal target was meant, and the C<"/"> is
	93	missing from the C<LE<lt>E<gt>> pod command.
	94
	95	=item Missing or duplicate NAME or missing NAME short description
	96
	97	A pod can't be linked to unless it has a unique name.
	98	And a NAME should have a dash and short description after it.
	99
	100	=item Occurrences of the Unicode replacement character
	101
	102	L<Pod::Simple> replaces bytes that aren't valid according to the document's
	103	encoding (declared or auto-detected) with C<\N{REPLACEMENT CHARACTER}>.
	104
	105	=back
	106
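The cross-pod link check in the list above can be pictured roughly as
follows. This is only a sketch under stated assumptions, not the code that
podcheck.t runs: the hash C<%target_count>, the helper C<classify_link>, and
the pod name C<perlfoo> are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data: how many times each "pod/target" pair was seen
# while gathering targets across the workspace.
my %target_count = (
    'perlfoo/Options' => 1,
    'perlfoo/Notes'   => 2,   # two targets with the same name
);

# Hypothetical helper: classify a link as 'ok', 'broken', or 'multiple'.
sub classify_link {
    my ($pod, $target) = @_;
    my $count = $target_count{"$pod/$target"} // 0;
    return 'broken'   if $count == 0;   # no such target anywhere
    return 'multiple' if $count > 1;    # ambiguous destination
    return 'ok';
}
```

A link is clean only when exactly one matching target was gathered; zero or
several both produce a message.
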
	107	If the C<PERL_POD_PEDANTIC> environment variable is set or the C<--pedantic>
	108	command line argument is provided, then a few more checks are made.
	109	The pedantic checks are:
	110
	111	=over
	112
	113	=item Verbatim paragraphs that wrap in an 80 (including 1 spare) column window
	114
	115	It's annoying to have lines wrap when displaying pod documentation in a
	116	terminal window. This checks that all verbatim lines fit in a standard 80
	117	column window, even when using a pager that reserves a column for its own use.
	118	(Thus the check is for a net of 79 columns.)
	119	For those lines that don't fit, it tells you how much needs to be cut in
	120	order to fit.
	121
	122	Often, the easiest thing to do to gain space for these is to lower the indent
	123	to just one space.
	124
	125	=item Items that perhaps should be links
	126
	127	There are mentions of apparent files in the pods that perhaps should be links
	128	instead, using C<LE<lt>...E<gt>>
	129
	130	=item Items that perhaps should be C<FE<lt>...E<gt>>
	131
	132	What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
	133	C<FE<lt>...E<gt>> mark-up instead.
	134
	135	=back
	136
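As a rough illustration of the verbatim line-length check described above,
the sketch below computes how many columns a line would have to lose. It is
not the code podcheck.t itself runs: the helper name C<excess_columns> is
hypothetical, and the fixed 7-column indent is an assumption standing in for
whatever indent the formatter actually applies.

```perl
use strict;
use warnings;
use Text::Tabs ();

my $MAX_LINE_LENGTH = 79;   # 80-column window minus one spare column
my $INDENT          = 7;    # assumed indent added by the formatter

# Hypothetical helper: returns how many columns must be cut from a
# verbatim line for it to fit, or 0 if it already fits.
sub excess_columns {
    my ($line) = @_;
    chomp $line;
    # Tabs are expanded first so the width reflects what is displayed.
    my $width  = length(Text::Tabs::expand($line)) + $INDENT;
    my $excess = $width - $MAX_LINE_LENGTH;
    return $excess > 0 ? $excess : 0;
}
```

Because the indent counts against the 79 columns, reducing the indent of a
verbatim block is often enough to bring the excess down to zero.
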
	137	A number of issues raised by podcheck.t and by the base Pod::Checker are not
	138	really problems, but merely potential problems, that is, false positives.
	139	After inspecting them and
	140	deciding that they aren't real problems, it is possible to shut up this program
	141	about them, unlike base Pod::Checker. For a valid link to an outside module
	142	or man page, call podcheck.t with the C<--add_link> option to add it to the
	143	database of known links; for other causes, call podcheck.t with the C<--regen>
	144	option to regenerate the entire database. This tells it that all existing
	145	issues are not to be mentioned again.
	146
	147	C<--regen> isn't fool-proof. The database merely keeps track of the number of these
	148	potential problems of each type for each pod. If a new problem of a given
	149	type is introduced into the pod, podcheck.t will spit out all of them. You
	150	then have to figure out which is the new one, and whether it should be changed.
	151	But doing it this way insulates the database from having to keep track of line
	152	numbers of problems, which may change, or the exact wording of each problem
	153	which might also change without affecting whether it is a problem or not.
	154
	155	Also, if the count of potential problems of a given type for a pod decreases,
	156	the database must be regenerated so that it knows the new number. The program
	157	gives instructions when this happens.
	158
	159	Some pods will have varying numbers of problems of a given type. This can
	160	be handled by manually editing the database file (see L</FILES>), and setting
	161	the number of those problems for that pod to a negative number. This will
	162	cause the corresponding error to always be suppressed no matter how many there
	163	actually are.
	164
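The count-based bookkeeping described above can be summarized in a short
sketch. It is a simplification under stated assumptions rather than the
program's actual logic: the helper name C<should_report> is hypothetical, and
a missing database entry is assumed to mean an expected count of zero.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a pod's problems of one type
# should be reported, given the count found in this run and the
# count recorded in the database (undef if there is no entry).
sub should_report {
    my ($found, $known) = @_;
    $known = 0 unless defined $known;   # assumption: no entry means none expected
    return 0 if $known < 0;             # negative entry: always suppressed
    return $found != $known ? 1 : 0;    # any difference, up or down, is flagged
}
```

Only counts are compared, so the database does not need to track line
numbers or the exact wording of each message.
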
	165	Another problem is that there is currently no check that modules listed as
	166	valid in the database
	167	actually are. Thus any errors introduced there will remain there.
	168
	169	=head2 Specially handled pods
	170
	171	=over
	172
	173	=item perltoc
	174
	175	This pod is generated by pasting bits from other pods. Errors in those bits
	176	will show up as errors here, as well as for those other pods. Therefore
	177	errors here are suppressed, and the pod is checked only to verify that the
	178	nodes within it that are linked to from outside actually exist.
	179
	180	=item perldelta
	181
	182	The current perldelta pod is initialized from a template that contains
	183	placeholder text. Some of this text is in the form of links that don't really
	184	exist. Any such links that are listed in C<@perldelta_ignore_links> will not
	185	generate messages. It is presumed that these links will be cleaned up when
	186	the perldelta is cleaned up for release since they should be marked with
	187	C<XXX>.
	188
	189	=item Porting/perldelta_template.pod
	190
	191	This is not a pod, but a template for C<perldelta>. Any errors introduced
	192	here will show up when C<perldelta> is created from it.
	193
	194	=item cpan-upstream pods
	195
	196	See the L</--cpan> option documentation
	197
	198	=item old perldeltas
	199
	200	See the L</--deltas> option documentation
	201
	202	=back
	203
	204	=head1 OPTIONS
	205
	206	=over
	207
	208	=item --add_link MODULE ...
	209
	210	Use this option to teach podcheck.t that the C<MODULE>s or man pages actually
	211	exist, and to silence any messages that links to them are broken.
	212
	213	podcheck.t checks that links within the Perl core distribution are valid, but
	214	it doesn't check links to man pages or external modules. When it finds
	215	a broken link, it checks its database of external modules and man pages,
	216	and only if not found there does it raise a message. This option just adds
	217	the list of modules and man page references that follow it on the command line
	218	to that database.
	219
	220	For example,
	221
	222	cd t
	223	./perl -I../lib porting/podcheck.t --add_link Unicode::Casing
	224
	225	causes the external module "Unicode::Casing" to be added to the database, so
	226	C<LE<lt>Unicode::CasingE<gt>> will be considered valid.
	227
	228	=item --regen
	229
	230	Regenerate the database used by podcheck.t to include all the existing
	231	potential problems. Future runs of the program will not then flag any of
	232	these. Setting this option also sets C<--pedantic>.
	233
	234	=item --cpan
	235
	236	Normally, all pods in the cpan directory are skipped, except to make sure that
	237	any blead-upstream links to such pods are valid.
	238	This option will cause cpan upstream pods to be fully checked.
	239
	240	=item --deltas
	241
	242	Normally, all old perldelta pods are skipped, except to make sure that
	243	any links to such pods are valid. This is because they are considered
	244	stable, and perhaps trying to fix them will cause changes that will
	245	misrepresent Perl's history. But, this option will cause them to be fully
	246	checked.
	247
	248	=item --show_all
	249
	250	Normally, if the number of potential problems of a given type found for a
	251	pod matches the expected value in the database, they will not be displayed.
	252	This option forces the database to be ignored during the run, so all potential
	253	problems are displayed and will fail their respective pod test. Specifying
	254	any particular FILES to operate on automatically selects this option.
	255
	256	=item --counts
	257
	258	Instead of testing, this just dumps the counts of the occurrences of the
	259	various types of potential problems in the database.
	260
	261	=item --pedantic
	262
	263	There are three potential problems that are not checked for by default.
	264	This option enables them. The environment variable C<PERL_POD_PEDANTIC>
	265	can be set to 1 to enable this option also.
	266	This option is set when C<--regen> is used.
	267
	268	=back
	269
	270	=head1 FILES
	271
	272	The database is stored in F<t/porting/known_pod_issues.dat>
	273
	274	=head1 SEE ALSO
	275
	276	L<Pod::Checker>
	277
	278	=cut
	279
	280	# VMS builds have a '.com' appended to utility and script names, and it adds a
	281	# trailing dot for any other file name that doesn't have a dot in it. The db
	282	# is stored without those things. This regex allows for these special file
	283	# names to be dealt with. It needs to be interpolated into a larger regex
	284	# that furnishes the closing boundary.
	285	my $vms_re = qr/ \. (?: com )? /x;
	286
	287	# Some filenames in the MANIFEST match $vms_re, and so must not be handled the
	288	# same way that the special vms ones are. This hash lists those.
	289	my %special_vms_files;
	290
	291	# This is to get this to work across multiple file systems, including those
	292	# that are not case sensitive. The db is stored in lower case, Un*x style,
	293	# and all file name comparisons are done that way.
	294	sub canonicalize($) {
	295	my $input = shift;
	296	my ($volume, $directories, $file)
	297	= File::Spec->splitpath(File::Spec->canonpath($input));
	298	# Assumes $volume is constant for everything in this directory structure
	299	$directories = "" if ! $directories;
	300	$file = "" if ! $file;
	301	$file = lc join '/', File::Spec->splitdir($directories), $file;
	302	$file =~ s! / /+ !/!gx; # Multiple slashes => single slash
	303
	304	# The db is stored without the special suffixes that are there in VMS, so
	305	# strip them off to get the comparable name. But some files on all
	306	# platforms have these suffixes, so this shouldn't happen for them, as any
	307	# of their db entries will have the suffixes in them. The hash has been
	308	# populated with these files.
	309	if ($^O eq 'VMS'
	310	&& $file =~ / ( $vms_re ) $ /x
	311	&& ! exists $special_vms_files{$file})
	312	{
	313	$file =~ s/ $1 $ //x;
	314	}
	315	return $file;
	316	}
	317
	318	#####################################################
	319	# HOW IT WORKS (in general)
	320	#
	321	# If not called with specific files to check, the directory structure is
	322	# examined for files that have pods in them. Files that might not have to be
	323	# fully parsed (e.g. in cpan) are parsed enough at this time to find their
	324	# pod's NAME, and to get a checksum.
	325	#
	326	# Those kinds of files are sorted last, but otherwise the pods are parsed with
	327	# the package coded here, My::Pod::Checker, which is an extension to
	328	# Pod::Checker that adds some tests and suppresses others that aren't
	329	# appropriate. The latter module has no provision for capturing diagnostics,
	330	# so a package, Tie_Array_to_FH, is used to force them to be placed into an
	331	# array instead of printed.
	332	#
	333	# Parsing the files builds up a list of links. The files are gone through
	334	# again, doing cross-link checking and outputting all saved-up problems with
	335	# each pod.
	336	#
	337	# Sorting the files last that potentially don't need to be fully parsed allows
	338	# us to not parse them unless there is a link to an internal anchor in them
	339	# from something that we have already parsed. Keeping checksums allows us to
	340	# not parse copies of other pods.
	341	#
	342	#####################################################
	343
	344	# 1 => Exclude low priority messages that aren't likely to be problems, and
	345	# have many false positives; higher numbers give more messages.
	346	my $Warnings_Level = 200;
	347
	348	# perldelta during construction may have place holder links. N.B. This
	349	# variable is referred to by name in release_managers_guide.pod
	350	our @perldelta_ignore_links = ( "XXX", "perl5YYYdelta", "perldiag/message" );
	351
	352	# To see if two pods with the same NAME are actually copies of the same pod,
	353	# which is not an error, it uses a checksum to save work.
	354	my $digest_type = "SHA-1";
	355
	356	my $original_dir = File::Spec->rel2abs(File::Spec->curdir);
	357	my $data_dir = File::Spec->catdir($original_dir, 'porting');
	358	my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
	359	my $MANIFEST = File::Spec->catfile(File::Spec->updir($original_dir), 'MANIFEST');
	360	my $copy_fh;
	361
	362	my $MAX_LINE_LENGTH = 79; # 79 columns
	363	my $INDENT = 7; # default nroff indent
	364
	365	# Our warning messages. Better not have [('"] in them, as those are used as
	366	# delimiters for variable parts of the messages by poderror.
	367	my $broken_link = "Apparent broken link";
	368	my $broken_internal_link = "Apparent internal link is missing its forward slash";
	369	my $multiple_targets = "There is more than one target";
	370	my $duplicate_name = "Pod NAME already used";
	371	my $no_name = "There is no NAME";
	372	my $missing_name_description = "The NAME should have a dash and short description after it";
	373	my $replacement_character = "Unicode replacement character found";
	374	# the pedantic warnings messages
	375	my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
	376	my $C_not_linked = "? Should you be using L<...> instead of";
	377	my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of";
	378
	379	# objects, tests, etc can't be pods, so don't look for them. Also skip
	380	# files output by the patch program. Could also ignore most of .gitignore
	381	# files, but not all, so don't.
	382
	383	my $obj_ext = $Config{'obj_ext'}; $obj_ext =~ tr/.//d; # dot will be added back
	384	my $lib_ext = $Config{'lib_ext'}; $lib_ext =~ tr/.//d;
	385	my $lib_so = $Config{'so'}; $lib_so =~ tr/.//d;
	386	my $dl_ext = $Config{'dlext'}; $dl_ext =~ tr/.//d;
	387
	388	# Not really pods, but can look like them.
	389	my %excluded_files = (
	390	canonicalize("lib/unicore/mktables") => 1,
	391	canonicalize("Porting/make-rmg-checklist") => 1,
	392	canonicalize("Porting/perldelta_template.pod") => 1,
	393	canonicalize("regen/feature.pl") => 1,
	394	canonicalize("regen/warnings.pl") => 1,
	395	canonicalize("autodoc.pl") => 1,
	396	canonicalize("configpm") => 1,
	397	canonicalize("miniperl") => 1,
	398	canonicalize("perl") => 1,
	399	canonicalize('cpan/Pod-Perldoc/corpus/no-head.pod') => 1,
	400	canonicalize('cpan/Pod-Perldoc/corpus/perlfunc.pod') => 1,
	401	canonicalize('cpan/Pod-Perldoc/corpus/utf8.pod') => 1,
	403	);
	404
	405	# This list should not include anything for which case sensitivity is
	406	# important, as it won't work on VMS, and won't show up until tested on VMS.
	407	# All or almost all such files should be listed in the MANIFEST, so that can
	408	# be examined for them, and each such file explicitly excluded, as is done for
	409	# .PL files in the loop just below this. For files not catchable this way,
	410	# is_pod_file() can be used to exclude these at a finer grained level.
	411	my $non_pods = qr/ (?: \.
	412	(?: [achot] | zip | gz | bz2 | jar | tar | tgz
	413	| orig | rej | patch # Patch program output
	414	| sw[op] | \#.* # Editor droppings
	415	| old # buildtoc output
	416	| xs # pod should be in the .pm file
	417	| al # autosplit files
	418	| bs # bootstrap files
	419	| (?i:sh) # shell scripts, hints, templates
	420	| lst # assorted listing files
	421	| bat # Windows,Netware,OS2 batch files
	422	| cmd # Windows,Netware,OS2 command files
	423	| lis # VMS compiler listings
	424	| map # VMS linker maps
	425	| opt # VMS linker options files
	426	| mms # MM(K|S) description files
	427	| ts # timestamp files generated during build
	428	| $obj_ext # object files
	429	| exe # $Config{'exe_ext'} might be empty string
	430	| $lib_ext # object libraries
	431	| $lib_so # shared libraries
	432	| $dl_ext # dynamic libraries
	433	| gif # GIF images (example files from CGI.pm)
	434	| eg # examples from libnet
	435	| core .*
	436	)
	437	$
	438	) | ~$ | \ \(Autosaved\)\.txt$ # Other editor droppings
	439	| ^cxx\$demangler_db\.$ # VMS name mangler database
	440	| ^typemap\.?$ # typemap files
	441	| ^(?i:Makefile\.PL)$
	442	| ^core (?: $ | \. .* )
	443	/x;
	444
	445	# Matches something that looks like a file name, but is enclosed in C<...>
	446	my $C_path_re = qr{ ^
	447	# exclude various things that have slashes
	448	# in them but aren't paths
	449	(?!
	450	(?: (?: s | qr | m | tr | y ) / ) # regexes
	451	| \d+/\d+ \b # probable fractions
	452	| (?: [LF] < )+
	453	| OS/2 \b
	454	| Perl/Tk \b
	455	| origin/blead \b
	456	| origin/maint \b
	457
	458	)
	459	/? # Optional initial slash
	460	\w+ # First component of path, doesn't begin with
	461	# a minus
	462	(?: / [-\w]+ )+ # Subsequent path components
	463	(?: \. \w+ )? # Optional trailing dot and suffix
	464	>* # Any enclosed L< F< have matching closing >
	465	$
	466	}x;
	467
	468	# '.PL' files should be excluded, as they aren't final pods, but often contain
	469	# material used in generating pods, and so can look like a pod. We can't use
	470	# the regexp above because case sensitivity is important for these, as some
	471	# '.pl' files should be examined for pods. Instead look through the MANIFEST
	472	# for .PL files and get their full path names, so we can exclude each such
	473	# file explicitly. This works because other porting tests prohibit having two
	474	# files with the same names except for case.
	475	open my $manifest_fh, '<:bytes', $MANIFEST or die "Can't open $MANIFEST";
	476	while (<$manifest_fh>) {
	477
	478	# While we have MANIFEST open, on VMS platforms, look for files that match
	479	# the magic VMS file names that have to be handled specially. Add these
	480	# to the list of them.
	481	if ($^O eq 'VMS' && / ^ ( [^\t]* $vms_re ) \t /x) {
	482	$special_vms_files{$1} = 1;
	483	}
	484	if (/ ^ ( [^\t]* \. PL ) \t /x) {
	485	$excluded_files{canonicalize($1)} = 1;
	486	}
	487	}
	488	close $manifest_fh or die "Can't close $MANIFEST";
	489
	490
	491	# Pod::Checker messages to suppress
	492	my @suppressed_messages = (
	493	# We catch independently the ones that are real problems.
	494	qr/multiple occurrences \(\d+\) of link target/,
	495
	496	"unescaped <>", # Not every '<' or '>' need be escaped
	497	qr/No items in =over/, # i.e., a blockquote, which we consider legal
	498	);
	499
	500	sub suppressed {
	501	# Returns bool as to if input message is one that is to be suppressed
	502
	503	my $message = shift;
	504
	505	return grep { $message =~ /^$_/i } @suppressed_messages;
	506	}
	507
	508	{ # Closure to contain a simple subset of test.pl. This is to get rid of the
	509	# unnecessary 'failed at' messages that would otherwise be output pointing
	510	# to a particular line in this file.
	511
	512	my $current_test = 0;
	513	my $planned;
	514
	515	sub plan {
	516	my %plan = @_;
	517	$planned = $plan{tests} + 1; # +1 for final test that files haven't
	518	# been removed
	519	print "1..$planned\n";
	520	return;
	521	}
	522
	523	sub ok {
	524	my $success = shift;
	525	my $message = shift;
	526
	527	chomp $message;
	528
	529	$current_test++;
	530	print "not " unless $success;
	531	print "ok $current_test - $message\n";
	532	return $success;
	533	}
	534
	535	sub skip {
	536	my $why = shift;
	537	my $n = @_ ? shift : 1;
	538	for (1..$n) {
	539	$current_test++;
	540	print "ok $current_test # skip $why\n";
	541	}
	542	no warnings 'exiting';
	543	last SKIP;
	544	}
	545
	546	sub _note {
	547	my ($andle, $message) = @_;
	548
	549	chomp $message;
	550
	551	print $andle $message =~ s/^/# /mgr;
	552	print $andle "\n";
	553	return;
	554	}
	555
	556	sub note { unshift @_, \*STDOUT; goto &_note }
	557
	558	sub diag { unshift @_, \*STDERR; goto &_note }
	559
	560	END {
	561	if ($planned && $planned != $current_test) {
	562	print STDERR
	563	"# Looks like you planned $planned tests but ran $current_test.\n";
	564	}
	565	}
	566	}
	567
	568	# List of known potential problems by pod and type.
	569	my %known_problems;
	570
	571	# Pods given by the keys contain an interior node that is referred to from
	572	# outside it.
	573	my %has_referred_to_node;
	574
	575	my $show_counts = 0;
	576	my $regen = 0;
	577	my $add_link = 0;
	578	my $show_all = 0;
	579	my $pedantic = 0;
	580
	581	my $do_upstream_cpan = 0; # Assume that we are to skip anything in /cpan
	582	my $do_deltas = 0; # And stable perldeltas
	583
	584	while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
	585	my $arg = shift @ARGV;
	586
	587	$arg =~ s/^--/-/; # Treat '--' the same as a single '-'
	588	if ($arg eq '-regen') {
	589	$regen = 1;
	590	$pedantic = 1;
	591	}
	592	elsif ($arg eq '-add_link') {
	593	$add_link = 1;
	594	}
	595	elsif ($arg eq '-cpan') {
	596	$do_upstream_cpan = 1;
	597	}
	598	elsif ($arg eq '-deltas') {
	599	$do_deltas = 1;
	600	}
	601	elsif ($arg eq '-show_all') {
	602	$show_all = 1;
	603	}
	604	elsif ($arg eq '-counts') {
	605	$show_counts = 1;
	606	}
	607	elsif ($arg eq '-pedantic') {
	608	$pedantic = 1;
	609	}
	610	else {
	611	die <<EOF;
	612	Unknown option '$arg'
	613
	614	Usage: $0 [ --regen | --cpan | --show_all | FILE ... | --add_link MODULE ... ]
	615	--add_link -> Add the MODULE and man page references to the database
	616	--regen -> Regenerate the data file for $0
	617	--cpan -> Include files in the cpan subdirectory.
	618	--deltas -> Include stable perldeltas
	619	--show_all -> Show all known potential problems
	620	--counts -> Don't test, but give summary counts of the currently
	621	existing database
	622	--pedantic -> Check for overly long lines in verbatim blocks
	623	EOF
	624	}
	625	}
	626
	627	$pedantic = 1 if exists $ENV{PERL_POD_PEDANTIC} and $ENV{PERL_POD_PEDANTIC};
	628	my @files = @ARGV;
	629
	630	my $cpan_or_deltas = $do_upstream_cpan || $do_deltas;
	631	if (($regen + $show_all + $show_counts + $add_link + $cpan_or_deltas ) > 1) {
	632	croak "--regen, --show_all, --counts, and --add_link are mutually exclusive\n and none can be run with --cpan nor --deltas";
	633	}
	634
	635	my $has_input_files = @files;
	636
	637
	638	if ($add_link) {
	639	if (! $has_input_files) {
	640	croak "--add_link requires at least one module or man page reference";
	641	}
	642	}
	643	elsif ($has_input_files) {
	644	if ($regen || $show_counts || $do_upstream_cpan || $do_deltas) {
	645	croak "--regen, --counts, --deltas, and --cpan can't be used since using specific files";
	646	}
	647	foreach my $file (@files) {
	648	croak "Can't read file '$file'" if ! -r $file;
	649	}
	650	}
	651
	652	our %problems; # potential problems found in this run
	653
	654	package My::Pod::Checker { # Extend Pod::Checker
	655	use parent 'Pod::Checker';
	656
	657	# Uses inside out hash to protect from typos
	658	# For new fields, remember to add to destructor DESTROY()
	659	my %CFL_text; # The text comprising the current C<>, F<>, or L<>
	660	my %C_text; # If defined, are in a C<> section, and includes
	661	# the accumulated text from that
	662	my %current_indent; # Current line's indent
	663	my %filename; # The pod is stored in this file
	664	my %in_CFL; # count of stacked C<>, F<>, L<> directives
	665	my %indents; # Stack of indents from =over's in effect for
	666	# current line
	667	my %in_for; # true if in a =for or =begin
	668	my %in_NAME; # true if within NAME section
	669	my %in_begin; # true if within =begin section
	670	my %in_X; # true if in a X<>
	671	my %linkable_item; # Bool: if the latest =item is linkable. It isn't
	672	# for bullet and number lists
	673	my %linkable_nodes; # Pod::Checker adds all =items to its node list,
	674	# but not all =items are linkable-to
	675	my %running_CFL_text; # The current text that is being accumulated until
	676	# an end_FOO is found, and this includes any C<>,
	677	# F<>, or L<> directives.
	678	my %running_simple_text; # The current text that is being accumulated
	679	# until an end_FOO is found, and all directives
	680	# have been expanded into plain text
	681	my %command_count; # Number of commands seen
	682	my %seen_pod_cmd; # true if have =pod earlier
	683	my %skip; # is SKIP set for this pod
	684	my %start_line; # the first input line number in the thing
	685	# currently being worked on
	686
	687	sub DESTROY {
	688	my $addr = Scalar::Util::refaddr $_[0];
	689	delete $CFL_text{$addr};
	690	delete $C_text{$addr};
	691	delete $command_count{$addr};
	692	delete $current_indent{$addr};
	693	delete $filename{$addr};
	694	delete $in_begin{$addr};
	695	delete $in_CFL{$addr};
	696	delete $indents{$addr};
	697	delete $in_for{$addr};
	698	delete $in_NAME{$addr};
	699	delete $in_X{$addr};
	700	delete $linkable_item{$addr};
	701	delete $linkable_nodes{$addr};
	702	delete $running_CFL_text{$addr};
	703	delete $running_simple_text{$addr};
	704	delete $seen_pod_cmd{$addr};
	705	delete $skip{$addr};
	706	delete $start_line{$addr};
	707	return;
	708	}
	709
	710	sub new {
	711	my $class = shift;
	712	my $filename = shift;
	713
	714	my $self = $class->SUPER::new(-quiet => 1,
	715	-warnings => $Warnings_Level);
	716	my $addr = Scalar::Util::refaddr $self;
	717	$command_count{$addr} = 0;
	718	$current_indent{$addr} = 0;
	719	$filename{$addr} = $filename;
	720	$in_begin{$addr} = 0;
	721	$in_X{$addr} = 0;
	722	$in_CFL{$addr} = 0;
	723	$in_NAME{$addr} = 0;
	724	$linkable_item{$addr} = 0;
	725	$seen_pod_cmd{$addr} = 0;
	726	return $self;
	727	}
	728
	729	# re's for messages that Pod::Checker outputs
	730	my $location = qr/ \b (?:in|at|on|near) \s+ /xi;
	731	my $optional_location = qr/ (?: $location )? /xi;
	732	my $line_reference = qr/ [('"]? $optional_location \b line \s+
	733	(?: \d+ | EOF | \Q???\E | - )
	734	[)'"]? /xi;
	735
	736	sub poderror { # Called to register a potential problem
	737
	738	# This adds an extra field to the parent hash, 'parameter'. It is
	739	# used to extract the variable parts of a message leaving just the
	740	# constant skeleton. This in turn allows the message to be
	741	# categorized better, so that it shows up as a single type in our
	742	# database, with the specifics of each occurrence not being stored with
	743	# it.
	744
	745	my $self = shift;
	746	my $opts = shift;
	747
	748	my $addr = Scalar::Util::refaddr $self;
	749	return if $skip{$addr};
	750
	751	# Input can be a string or hash. If a string, parse it to separate
	752	# out the line number and convert to a hash for easier further
	753	# processing
	754	my $message;
	755	if (ref $opts ne 'HASH') {
	756	$message = join "", $opts, @_;
	757	my $line_number;
	758	if ($message =~ s/\s*($line_reference)//) {
	759	($line_number = $1) =~ s/\s*$optional_location//;
	760	}
	761	else {
	762	$line_number = '???';
	763	}
	764	$opts = { -msg => $message, -line => $line_number };
	765	} else {
	766	$message = $opts->{'-msg'};
	767
	768	}
	769
	770	$message =~ s/^\d+\s+//;
	771	return if main::suppressed($message);
	772
	773	$self->SUPER::poderror($opts, @_);
	774
	775	$opts->{parameter} = "" unless $opts->{parameter};
	776
	777	# The variable parts of the message tend to be enclosed in '...',
	778	# "....", or (...). Extract them and put them in an extra field,
	779	# 'parameter'. This is trickier because the matching delimiter to a
	780	# '(' is its mirror, and not itself. Text::Balanced could be used
	781	# instead.
	782	while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
	783	my $delimiter = $1;
	784	my $start = $-[0];
	785	$delimiter = ')' if $delimiter eq '(';
	786
	787	# If there is no ending delimiter, don't consider it to be a
	788	# variable part. Most likely it is a contraction like "Don't"
	789	last unless $message =~ m/\G .+? \Q$delimiter/xg;
	790
	791	my $length = $+[0] - $start;
	792
	793	# Get the part up through the closing delimiter
	794	my $special = substr($message, $start, $length);
	795	$special =~ s/^\s+//; # No leading whitespace
	796
	797	# And add that variable part to the parameter, while removing it
	798	# from the message. This isn't a foolproof way of finding the
	799	# variable part. For example '(s)' can occur in e.g.,
	800	# 'paragraph(s)'
	801	if ($special ne '(s)') {
	802	substr($message, $start, $length) = "";
	803	pos $message = $start;
	804	$opts->{-msg} = $message;
	805	$opts->{parameter} .= " " if $opts->{parameter};
	806	$opts->{parameter} .= $special;
	807	}
	808	}
	809
	810	# Extract any additional line number given. This is often the
	811	# beginning location of something whereas the main line number gives
	812	# the ending one.
	813	if ($message =~ /( $line_reference )/xi) {
	814	my $line_ref = $1;
	815	while ($message =~ s/\s*\Q$line_ref//) {
	816	$opts->{-msg} = $message;
	817	$opts->{parameter} .= " " if $opts->{parameter};
	818	$opts->{parameter} .= $line_ref;
	819	}
	820	}
	821
	822	Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
	823	push @{$problems{$filename{$addr}}{$message}}, $opts;
	824	#push @{$problems{$self->get_filename}{$message}}, $opts;
	825	}
	826
	827	# In the next subroutines, we keep track of the text of the current
	828	# innermost thing, like F<fooC<bar>baz>. The things we care about raising
	829	# messages about in this program all come from a single sequence of
	830	# characters uninterrupted by other pod commands. Therefore we don't have
	831	# to worry about recursion, and we can just set the string we care about
	832	# to empty on entrance to each command.
	833
	834	sub handle_text {
	835	# This is called by the parent class to deal with any straight text.
	836	# We mostly just append this to the running current value which will
	837	# be dealt with upon the end of the current construct, like a
	838	# paragraph. But certain things don't contribute to checking the pod
	839	# and are ignored. We also have set flags to indicate this text is
	840	# going towards constructing certain constructs, and handle those
	841	# specially.
	842
	843	my $self = shift;
	844	my $addr = Scalar::Util::refaddr $self;
	845
	846	my $return = $self->SUPER::handle_text(@_);
	847
	848	if ($in_X{$addr} || $in_for{$addr}) { # ignore
	849	return $return;
	850	}
	851
	852	my $text = join "\n", @_;
	853	$running_simple_text{$addr} .= $text;
	854
	855	# Keep separate tabs on C<>, F<>, and L<> directives, and one
	856	# especially for C<> ones.
	857	if ($in_CFL{$addr}) {
	858	$CFL_text{$addr} .= $text;
	859	$C_text{$addr} .= $text if defined $C_text{$addr};
	860	}
	861	else {
	862	# This variable is updated instead in the corresponding C, F, or L
	863	# handler.
	864	$running_CFL_text{$addr} .= $text;
	865	}
	866
	867	# do this line-by-line so we can get the right line number
	868	my @lines = split /^/, $running_simple_text{$addr};
	869	for my $i (0..$#lines) {
	870	if ($lines[$i] =~ m/\N{REPLACEMENT CHARACTER}/) {
	871	$self->poderror({ -line => $start_line{$addr} + $i,
	872	-msg => $replacement_character,
	873	parameter => "possibly invalid ". $self->encoding . " input at character " . pos $lines[$i],
	874	});
	875	}
	876	}
	877	return $return;
	878	}
	879
	880	# The start_FOO routines check that somehow a C<> construct hasn't escaped
	881	# without being checked, and initialize things, and call the parent
	882	# class's equivalent routine.
	883
	884	# The end_FOO routines close things off, and check the text that has been
	885	# accumulated for FOO, then call the parent's corresponding routine.
	886
	887	sub start_Para {
	888	my $self = shift;
	889	check_see_but_not_link($self);
	890
	891	my $addr = Scalar::Util::refaddr $self;
	892	$start_line{$addr} = $_[0]->{start_line};
	893	$running_CFL_text{$addr} = "";
	894	$running_simple_text{$addr} = "";
	895	return $self->SUPER::start_Para(@_);
	896	}
	897
	898	sub start_item_text {
	899	my $self = shift;
	900	check_see_but_not_link($self);
	901
	902	my $addr = Scalar::Util::refaddr $self;
	903	$start_line{$addr} = $_[0]->{start_line};
	904	$running_CFL_text{$addr} = "";
	905	$running_simple_text{$addr} = "";
	906
	907	# This is the only =item that is linkable
	908	$linkable_item{$addr} = 1;
	909
	910	return $self->SUPER::start_item_text(@_);
	911	}
	912
	913	sub start_item_number {
	914	my $self = shift;
	915	check_see_but_not_link($self);
	916
	917	my $addr = Scalar::Util::refaddr $self;
	918	$start_line{$addr} = $_[0]->{start_line};
	919	$running_CFL_text{$addr} = "";
	920	$running_simple_text{$addr} = "";
	921
	922	return $self->SUPER::start_item_number(@_);
	923	}
	924
	925	sub start_item_bullet {
	926	my $self = shift;
	927	check_see_but_not_link($self);
	928
	929	my $addr = Scalar::Util::refaddr $self;
	930	$start_line{$addr} = $_[0]->{start_line};
	931	$running_CFL_text{$addr} = "";
	932	$running_simple_text{$addr} = "";
	933
	934	return $self->SUPER::start_item_bullet(@_);
	935	}
	936
	937	sub end_item { # No difference in =item types endings
	938	my $self = shift;
	939	check_see_but_not_link($self);
	940	return $self->SUPER::end_item(@_);
	941	}
	942
	943	sub start_over {
	944	my $self = shift;
	945	check_see_but_not_link($self);
	946
	947	my $addr = Scalar::Util::refaddr $self;
	948	$start_line{$addr} = $_[0]->{start_line};
	949	$running_CFL_text{$addr} = "";
	950	$running_simple_text{$addr} = "";
	951
	952	# Save this indent on a stack, and keep track of total indent
	953	my $indent = $_[0]{'indent'};
	954	push @{$indents{$addr}}, $indent;
	955	$current_indent{$addr} += $indent;
	956
	957	return $self->SUPER::start_over(@_);
	958	}
	959
	960	sub end_over_bullet { shift->end_over(@_) }
	961	sub end_over_number { shift->end_over(@_) }
	962	sub end_over_text { shift->end_over(@_) }
	963	sub end_over_block { shift->end_over(@_) }
	964	sub end_over_empty { shift->end_over(@_) }
	965	sub end_over {
	966	my $self = shift;
	967	check_see_but_not_link($self);
	968
	969	my $addr = Scalar::Util::refaddr $self;
	970
	971	# Pop current indent
	972	if (@{$indents{$addr}}) {
	973	$current_indent{$addr} -= pop @{$indents{$addr}};
	974	}
	975	else {
	976	# =back without corresponding =over, but should have
	977	# warned already
	978	$current_indent{$addr} = 0;
	979	}
	980	}
	981
	982	sub check_see_but_not_link {
	983
	984	# Looks through accumulated text for current element that includes the
	985	# C<>, F<>, and L<> directives for ones that look like they are
	986	# C<link> instead of L<link>.
	987
	988	my $self = shift;
	989	my $addr = Scalar::Util::refaddr $self;
	990
	991	return unless defined $running_CFL_text{$addr};
	992
	993	while ($running_CFL_text{$addr} =~ m{
	994	( (?: \w+ \s+ )* ) # The phrase before, if any
	995	\b [Ss]ee \s+
	996	( ( [^L] )
	997	<
	998	( [^<]*? ) # The not < excludes nested C<L<...
	999	>
	1000	)
	1001	( \s+ (?: under | in ) \s+ L< )?
	1002	}xg)
	1003	{
	1004	my $prefix = $1 // "";
	1005	my $construct = $2; # The whole thing, like C<...>
	1006	my $type = $3;
	1007	my $interior = $4;
	1008	my $trailing = $5; # After the whole thing ending in "L<"
	1009
	1010	# If the full phrase is something like, "you might see C<", or
	1011	# similar, it really isn't a reference to a link. The ones I saw
	1012	# all had the word "you" in them; and the "you" wasn't the
	1013	# beginning of a sentence.
	1014	if ($prefix !~ / \b you \b /x) {
	1015
	1016	# Now, find what the module or man page name within the
	1017	# construct would be if it actually has L<> syntax. If it
	1018	# doesn't have that syntax, will set the module to the entire
	1019	# interior.
	1020	if (! defined $trailing # not referring to something in another
	1021	# section
	1022	&& $interior !~ /$non_pods/
	1023
	1024	# There can't be spaces (I think) in module names or man
	1025	# pages
	1026	&& $interior !~ / \s /x
	1027
	1028	# F<> that end in eg \.pl are almost certainly ok, as are
	1029	# those that look like a path with multiple "/" chars
	1030	&& ($type ne "F"
	1031	|| (! -e $interior
	1032	&& $interior !~ /\.\w+$/
	1033	&& $interior !~ /\/.+\//)
	1034	)
	1035	) {
	1036	# TODO: move the checking of $pedantic higher up
	1037	$self->poderror({ -line => $start_line{$addr},
	1038	-msg => $C_not_linked,
	1039	parameter => $construct
	1040	});
	1041	}
	1042	}
	1043	}
	1044
	1045	undef $running_CFL_text{$addr};
	1046	}
	1047
	1048	sub end_Para {
	1049	my $self = shift;
	1050	check_see_but_not_link($self);
	1051
	1052	my $addr = Scalar::Util::refaddr $self;
	1053	if ($in_NAME{$addr}) {
	1054	if ($running_simple_text{$addr} =~ /^\s*(\S+?)\s*$/) {
	1055	$self->poderror({ -line => $start_line{$addr},
	1056	-msg => $missing_name_description,
	1057	parameter => $1});
	1058	}
	1059	$in_NAME{$addr} = 0;
	1060	}
	1061	$self->SUPER::end_Para(@_);
	1062	}
	1063
	1064	sub start_head1 {
	1065	my $self = shift;
	1066	check_see_but_not_link($self);
	1067
	1068	my $addr = Scalar::Util::refaddr $self;
	1069	$start_line{$addr} = $_[0]->{start_line};
	1070	$running_CFL_text{$addr} = "";
	1071	$running_simple_text{$addr} = "";
	1072
	1073	return $self->SUPER::start_head1(@_);
	1074	}
	1075
	1076	sub end_head1 { # This is called at the end of the =head line.
	1077	my $self = shift;
	1078	check_see_but_not_link($self);
	1079
	1080	my $addr = Scalar::Util::refaddr $self;
	1081
	1082	$in_NAME{$addr} = 1 if $running_simple_text{$addr} eq 'NAME';
	1083	return $self->SUPER::end_head(@_);
	1084	}
	1085
	1086	sub start_Verbatim {
	1087	my $self = shift;
	1088	check_see_but_not_link($self);
	1089
	1090	my $addr = Scalar::Util::refaddr $self;
	1091	$running_simple_text{$addr} = "";
	1092	$start_line{$addr} = $_[0]->{start_line};
	1093	return $self->SUPER::start_Verbatim(@_);
	1094	}
	1095
	1096	sub end_Verbatim {
	1097	my $self = shift;
	1098	my $addr = Scalar::Util::refaddr $self;
	1099
	1100	# Pick up the name if it looks like one, since the parent class
	1101	# doesn't handle verbatim NAMEs
	1102	if ($in_NAME{$addr}
	1103	&& $running_simple_text{$addr} =~ /^\s*(\S+?)\s*[,-]/)
	1104	{
	1105	$self->name($1);
	1106	}
	1107
	1108	my $indent = $self->get_current_indent;
	1109
	1110	# Look at each line to verify it is short enough
	1111	my @lines = split /^/, $running_simple_text{$addr};
	1112	for my $i (0 .. @lines - 1) {
	1113	$lines[$i] =~ s/\s+$//;
	1114	my $exceeds = length(Text::Tabs::expand($lines[$i]))
	1115	+ $indent - $MAX_LINE_LENGTH;
	1116	next unless $exceeds > 0;
	1117
	1118	$self->poderror({ -line => $start_line{$addr} + $i,
	1119	-msg => $line_length,
	1120	parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)",
	1121	});
	1122	}
	1123
	1124	undef $running_simple_text{$addr};
	1125
	1126	# Parent class didn't bother to define this
	1127	#return $self->SUPER::SUPER::end_Verbatim(@_);
	1128	}
	1129
	1130	sub start_C {
	1131	my $self = shift;
	1132	my $addr = Scalar::Util::refaddr $self;
	1133
	1134	$C_text{$addr} = "";
	1135
	1136	# If not in a stacked set of C<>, F<> and L<>, initialize the text for
	1137	# them.
	1138	$CFL_text{$addr} = "" if ! $in_CFL{$addr};
	1139	$in_CFL{$addr}++;
	1140
	1141	return $self->SUPER::start_C(@_);
	1142	}
	1143
	1144	sub start_F {
	1145	my $self = shift;
	1146	my $addr = Scalar::Util::refaddr $self;
	1147
	1148	$CFL_text{$addr} = "" if ! $in_CFL{$addr};
	1149	$in_CFL{$addr}++;
	1150	return $self->SUPER::start_F(@_);
	1151	}
	1152
	1153	sub start_L {
	1154	my $self = shift;
	1155	my $addr = Scalar::Util::refaddr $self;
	1156
	1157	$CFL_text{$addr} = "" if ! $in_CFL{$addr};
	1158	$in_CFL{$addr}++;
	1159	return $self->SUPER::start_L(@_);
	1160	}
	1161
	1162	sub end_C {
	1163	my $self = shift;
	1164	my $addr = Scalar::Util::refaddr $self;
	1165
	1166	# Warn if looks like a file or link enclosed instead by this C<>
	1167	if ($C_text{$addr} =~ qr/^ $C_path_re $/x) {
	1168	# Here it does look like it could be a file path or a link.
	1169	# But some varieties of regex patterns could also fit with what we
	1170	# have so far. Weed those out as best we can. '/foo/' is almost
	1171	# certainly meant to be a pattern, as is '/foo/g'.
	1172	my $is_pattern;
	1173	if ($C_text{$addr} !~ qr| ^ / [^/]* / ( [msixpodualngcr]* ) $ |x) {
	1174	$is_pattern = 0;
	1175	}
	1176	else {
	1177
	1178	# Here, it looks like a pattern potentially followed by some
	1179	# modifiers. To make doubly sure, don't count as patterns
	1180	# those constructs which have more occurrences (generally 1)
	1181	# of a modifier than is legal.
	1182	my %counts;
	1183	map { $counts{$_}++ } split "", $1;
	1184	foreach my $modifier (keys %counts) {
	1185	if ($counts{$modifier} > (($modifier eq 'a')
	1186	? 2
	1187	: 1))
	1188	{
	1189	$is_pattern = 0;
	1190	last;
	1191	}
	1192	}
	1193	$is_pattern = 1 unless defined $is_pattern;
	1194	}
	1195
	1196	unless ($is_pattern) {
	1197	$self->poderror({ -line => $start_line{$addr},
	1198	-msg => $C_with_slash,
	1199	parameter => "C<$C_text{$addr}>"
	1200	});
	1201	}
	1202	}
	1203	undef $C_text{$addr};
	1204
	1205	# Add the current text to the running total. This was not done in
	1206	# handle_text(), because it just sees the plain text of the innermost
	1207	# stacked directive. We want to keep all the directive names
	1208	# enclosing the text. Otherwise the fact that C<L<foobar>> is to a
	1209	# link would be lost, as the L<> would be gone.
	1210	$CFL_text{$addr} = "C<$CFL_text{$addr}>";
	1211
	1212	# Add this text to the whole running total only if popping this
	1213	# directive off the stack leaves it empty. As long as something is on
	1214	# the stack, it gets added to $CFL_text (just above). It is only
	1215	# entirely constructed when the stack is empty.
	1216	$in_CFL{$addr}--;
	1217	$running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr};
	1218
	1219	return $self->SUPER::end_C(@_);
	1220	}
	1221
	1222	sub end_F {
	1223	my $self = shift;
	1224	my $addr = Scalar::Util::refaddr $self;
	1225
	1226	$CFL_text{$addr} = "F<$CFL_text{$addr}>";
	1227	$in_CFL{$addr}--;
	1228	$running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr};
	1229	return $self->SUPER::end_F(@_);
	1230	}
	1231
	1232	sub end_L {
	1233	my $self = shift;
	1234	my $addr = Scalar::Util::refaddr $self;
	1235
	1236	$CFL_text{$addr} = "L<$CFL_text{$addr}>";
	1237	$in_CFL{$addr}--;
	1238	$running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr};
	1239	return $self->SUPER::end_L(@_);
	1240	}
	1241
	1242	sub start_X {
	1243	my $self = shift;
	1244	my $addr = Scalar::Util::refaddr $self;
	1245
	1246	$in_X{$addr} = 1;
	1247	return $self->SUPER::start_X(@_);
	1248	}
	1249
	1250	sub end_X {
	1251	my $self = shift;
	1252	my $addr = Scalar::Util::refaddr $self;
	1253
	1254	$in_X{$addr} = 0;
	1255	return $self->SUPER::end_X(@_);
	1256	}
	1257
	1258	sub start_for {
	1259	my $self = shift;
	1260	my $addr = Scalar::Util::refaddr $self;
	1261
	1262	$in_for{$addr} = 1;
	1263	return $self->SUPER::start_for(@_);
	1264	}
	1265
	1266	sub end_for {
	1267	my $self = shift;
	1268	my $addr = Scalar::Util::refaddr $self;
	1269
	1270	$in_for{$addr} = 0;
	1271	return $self->SUPER::end_for(@_);
	1272	}
	1273
	1274	sub hyperlink {
	1275	my ($self, $link) = @_;
	1276
	1277	if ($link && $link->type eq 'pod') {
	1278	my $page = $link->page;
	1279	my $node = $link->node;
	1280
	1281	# If the hyperlink is to an interior node of another page, save it
	1282	# so that we can see if we need to parse normally skipped files.
	1283	$has_referred_to_node{$page} = 1 if $node;
	1284
	1285	# Ignore certain placeholder links in perldelta. Check if the
	1286	# link is page-level, and also check if to a node within the page
	1287	if ( $self->name && $self->name eq "perldelta"
	1288	&& (( grep { $page eq $_ } @perldelta_ignore_links)
	1289	|| ( $node
	1290	&& (grep { "$page/$node" eq $_ } @perldelta_ignore_links)
	1291	))) {
	1292	return;
	1293	}
	1294	}
	1295
	1296	return $self->SUPER::hyperlink($link);
	1297	}
	1298
	1299	sub node {
	1300	my $self = shift;
	1301	my $text = $_[0];
	1302	if($text) {
	1303	$text =~ s/\s+$//s; # strip trailing whitespace
	1304	$text =~ s/\s+/ /gs; # collapse whitespace
	1305	my $addr = Scalar::Util::refaddr $self;
	1306	push(@{$linkable_nodes{$addr}}, $text) if
	1307	! $current_indent{$addr}
	1308	|| $linkable_item{$addr};
	1309	}
	1310	return $self->SUPER::node($_[0]);
	1311	}
	1312
	1313	sub get_current_indent {
	1314	return $INDENT + $current_indent{Scalar::Util::refaddr $_[0]};
	1315	}
	1316
	1317	sub get_filename {
	1318	return $filename{Scalar::Util::refaddr $_[0]};
	1319	}
	1320
	1321	sub linkable_nodes {
	1322	my $linkables = $linkable_nodes{Scalar::Util::refaddr $_[0]};
	1323	return undef unless $linkables;
	1324	return @$linkables;
	1325	}
	1326
	1327	sub get_skip {
	1328	return $skip{Scalar::Util::refaddr $_[0]} // 0;
	1329	}
	1330
	1331	sub set_skip {
	1332	my $self = shift;
	1333	$skip{Scalar::Util::refaddr $self} = shift;
	1334
	1335	# If skipping, no need to keep the problems for it
	1336	delete $problems{$self->get_filename};
	1337	return;
	1338	}
	1339
	1340	sub parse_from_file {
	1341	# This overrides the super class method so that if an open fails on a
	1342	# transitory file, it doesn't croak. It returns 1 if it did find the
	1343	# file, 0 if it didn't
	1344
	1345	my $self = shift;
	1346	my $filename = shift;
	1347	# ignores 2nd param, which is output file. Always uses undef
	1348
	1349	if (open my $in_fh, '<:bytes', $filename) {
	1350	$self->SUPER::parse_from_file($in_fh, undef);
	1351	close $in_fh;
	1352	return 1;
	1353	}
	1354
	1355	# If couldn't open file, perhaps it was transitory, and hence not an error
	1356	return 0 unless -e $filename;
	1357
	1358	die "Can't open '$filename': $!\n";
	1359	}
	1360	}
	1361
	1362	my %filename_to_checker; # Map a filename to its pod checker object
	1363	my %id_to_checker; # Map a checksum to its pod checker object
	1364	my %nodes; # key is filename, values are nodes in that file.
	1365	my %nodes_first_word; # same, but value is first word of each node
	1366	my %valid_modules; # List of modules known to exist outside us.
	1367	my %digests; # checksums of files, whose names are the keys
	1368	my %filename_to_pod; # Map a filename to its pod NAME
	1369	my %files_with_unknown_issues;
	1370	my %files_with_fixes;
	1371
	1372	my $data_fh;
	1373	open $data_fh, '<:bytes', $known_issues or die "Can't open $known_issues";
	1374
	1375	my %counts; # For --counts param, count of each issue type
	1376	my %suppressed_files; # Files with at least one issue type to suppress
	1377	my $HEADER = <<END;
	1378	# This file is the data file for $0.
	1379	# There are three types of lines.
	1380	# Comment lines are white-space only or begin with a '#', like this one. Any
	1381	# changes you make to the comment lines will be lost when the file is
	1382	# regen'd.
	1383	# Lines without tab characters are simply NAMES of pods that the program knows
	1384	# will have links to them and the program does not check if those links are
	1385	# valid.
	1386	# All other lines should have three fields, each separated by a tab. The
	1387	# first field is the name of a pod; the second field is an error message
	1388	# generated by this program; and the third field is a count of how many
	1389	# known instances of that message there are in the pod. -1 means that the
	1390	# program can expect any number of this type of message.
	1391	END
	1392
	1393	my @existing_issues;
	1394
	1395
	1396	while (<$data_fh>) { # Read the database
	1397	chomp;
	1398	next if /^\s*(?:#|$)/; # Skip comment and empty lines
	1399	if (/\t/) {
	1400	next if $show_all;
	1401	if ($add_link) { # The issues are saved and later output unchanged
	1402	push @existing_issues, $_;
	1403	next;
	1404	}
	1405
	1406	# Keep track of counts of each issue type for each file
	1407	my ($filename, $message, $count) = split /\t/;
	1408	$known_problems{$filename}{$message} = $count;
	1409
	1410	if ($show_counts) {
	1411	if ($count < 0) { # -1 means to suppress this issue type
	1412	$suppressed_files{$filename} = $filename;
	1413	}
	1414	else {
	1415	$counts{$message} += $count;
	1416	}
	1417	}
	1418	}
	1419	else { # Lines without a tab are modules known to be valid
	1420	$valid_modules{$_} = 1
	1421	}
	1422	}
	1423	close $data_fh;
	1424
	1425	if ($add_link) {
	1426	$copy_fh = open_new($known_issues);
	1427
	1428	# Check for basic sanity, and add each command line argument
	1429	foreach my $module (@files) {
	1430	die "\"$module\" does not look like a module or man page"
	1431	# Must look like (A or A::B or A::B::C ..., or foo(3C)
	1432	if $module !~ /^ (?: \w+ (?: :: \w+ )* | \w+ \( \d \w* \) ) $/x;
	1433	$valid_modules{$module} = 1
	1434	}
	1435	my_safer_print($copy_fh, $HEADER);
	1436	foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
	1437	my_safer_print($copy_fh, $_, "\n");
	1438	}
	1439
	1440	# The rest of the db file is output unchanged.
	1441	my_safer_print($copy_fh, join "\n", @existing_issues, "");
	1442
	1443	close_and_rename($copy_fh);
	1444	exit;
	1445	}
	1446
	1447	if ($show_counts) {
	1448	my $total = 0;
	1449	foreach my $message (sort keys %counts) {
	1450	$total += $counts{$message};
	1451	note(Text::Tabs::expand("$counts{$message}\t$message"));
	1452	}
	1453	note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
	1454	if (%suppressed_files) {
	1455	note("\nFiles that have all messages of at least one type suppressed:");
	1456	note(join ",", sort keys %suppressed_files);
	1457	}
	1458	exit 0;
	1459	}
	1460
	1461	# re to match files that are to be parsed only if there is an internal link
	1462	# to them. It does not include cpan, as whether those are parsed depends
	1463	# on a switch. Currently, only perltoc and the stable perldelta.pod's
	1464	# are included. The latter all have characters between 'perl' and
	1465	# 'delta'. (Actually the currently developed one matches as well, but
	1466	# is a duplicate of perldelta.pod, so can be skipped, so fine for it to
	1467	# match this.)
	1468	my $only_for_interior_links_re = qr/ ^ pod\/perltoc.pod $
	1469	/x;
	1470	unless ($do_deltas) {
	1471	$only_for_interior_links_re = qr/$only_for_interior_links_re |
	1472	\b perl \d+ delta \. pod \b
	1473	/x;
	1474	}
	1475
	1476	{ # Closure
	1477	my $first_time = 1;
	1478
	1479	sub output_thanks ($$$$) { # Called when an issue has been fixed
	1480	my $filename = shift;
	1481	my $original_count = shift;
	1482	my $current_count = shift;
	1483	my $message = shift;
	1484
	1485	$files_with_fixes{$filename} = 1;
	1486	my $return;
	1487	my $fixed_count = $original_count - $current_count;
	1488	my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
	1489	my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
	1490	my $diff;
	1491	if ($message) {
	1492	$diff = <<EOF;
	1493	There were $original_count occurrences (now $current_count) in this pod of type
	1494	"$message",
	1495	EOF
	1496	} else {
	1497	$diff = <<EOF;
	1498	There are no longer any problems found in this pod!
	1499	EOF
	1500	}
	1501
	1502	if ($first_time) {
	1503	$first_time = 0;
	1504	$return = <<EOF;
	1505	Thanks for fixing $a_problem!
	1506	$diff
	1507	Now you must teach $0 that this was fixed.
	1508	EOF
	1509	}
	1510	else {
	1511	$return = <<EOF;
	1512	Thanks for fixing $another_problem.
	1513	$diff
	1514	EOF
	1515	}
	1516
	1517	return $return;
	1518	}
	1519	}
	1520
	1521	sub my_safer_print { # print, with error checking for outputting to db
	1522	my ($fh, @lines) = @_;
	1523
	1524	if (! print $fh @lines) {
	1525	my $save_error = $!;
	1526	close($fh);
	1527	die "Write failure: $save_error";
	1528	}
	1529	}
	1530
	1531	sub extract_pod { # Extracts just the pod from a file; returns undef if file
	1532	# doesn't exist
	1533	my $filename = shift;
	1534
	1535	if (open my $in_fh, '<:bytes', $filename) {
	1536	use Pod::Simple::JustPod;
	1537	my $parser = Pod::Simple::JustPod->new();
	1538	$parser->no_errata_section(1);
	1539	$parser->source_filename($filename);
	1540	my $output;
	1541	$parser->output_string( \$output );
	1542	$parser->parse_lines( <$in_fh>, undef );
	1543	close $in_fh;
	1544
	1545	return $output;
	1546	}
	1547
	1548	# The file should already have been opened once to get here, so if that
	1549	# fails, something is wrong. It's possible that a transitory file
	1550	# containing a pod would get here, so if the file no longer exists just
	1551	# return undef.
	1552	return unless -e $filename;
	1553	die "Can't open '$filename': $!\n";
	1554	}
	1555
	1556	my $digest = Digest->new($digest_type);
	1557
	1558	# This is used as a callback from File::Find::find(), which always constructs
	1559	# pathnames using Unix separators
	1560	sub is_pod_file {
	1561	# If $_ is a pod file, add it to the lists and do other prep work.
	1562
	1563	if (-d) {
	1564	# Don't look at files in directories that are for tests, nor those
	1565	# beginning with a dot
	1566	if (m!/t\z! || m!/\.!) {
	1567	$File::Find::prune = 1;
	1568	}
	1569	return;
	1570	}
	1571
	1572	return unless -r && -s; # Can't check it if can't read it; no need to
	1573	# check if 0 length
	1574	return unless -f || -l; # Weird file types won't be pods
	1575
	1576	my ($leaf) = m!([^/]+)\z!;
	1577	if (m!/\.! # No hidden Unix files
	1578	|| $leaf =~ $non_pods) {
	1579	note("Not considering $_") if DEBUG;
	1580	return;
	1581	}
	1582
	1583	my $filename = $File::Find::name;
	1584
	1585	# $filename is relative, like './path'. Strip that initial part away.
	1586	$filename =~ s!^\./!! or die "Unexpected pathname \"$filename\"";
	1587
	1588	return if $excluded_files{canonicalize($filename)};
	1589
	1590	my $contents = do {
	1591	local $/;
	1592	my $candidate;
	1593	if (! open $candidate, '<:bytes', $_) {
	1594
	1595	# If a transitory file was found earlier, the open could fail
	1596	# legitimately and we just skip the file; also skip it if it is a
	1597	# broken symbolic link, as it is probably just a build problem;
	1598	# certainly not a file that we would want to check the pod of.
	1599	# Otherwise fail it here and no reason to process it further.
	1600	# (But the test count will be off too)
	1601	ok(0, "Can't open '$filename': $!")
	1602	if -r $filename && ! -l $filename;
	1603	return;
	1604	}
	1605	<$candidate>;
	1606	};
	1607
	1608	# If the file is a .pm or .pod, having any initial '=' on a line is
	1609	# grounds for testing it. Otherwise, require a head1 NAME line to
	1610	# consider it as a potential pod
	1611	if ($filename =~ /\.(?:pm|pod)/) {
	1612	return unless $contents =~ /^=/m;
	1613	} else {
	1614	return unless $contents =~ /^=head1 +NAME/m;
	1615	}
	1616
	1617	# Here, we know that the file is a pod. Add it to the list of files
	1618	# to check and create a checker object for it.
	1619
	1620	push @files, $filename;
	1621	my $checker = My::Pod::Checker->new($filename);
	1622	$filename_to_checker{$filename} = $checker;
	1623
	1624	# In order to detect duplicate pods and only analyze them once, we
	1625	# compute checksums for the file, so don't have to do an exact
	1626	# compare. Note that if the pod is just part of the file, the
	1627	# checksums can differ for the same pod. That special case is handled
	1628	# later, since if the checksums of the whole file are the same, that
	1629	# case won't even come up. We don't need the checksums for files that
	1630	# we parse only if there is a link to its interior, but we do need its
	1631	# NAME, which is also retrieved in the code below.
	1632
	1633	if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ )
	1634	| $only_for_interior_links_re
	1635	/x)
	1636	{
	1637	$digest->add($contents);
	1638	$digests{$filename} = $digest->digest;
	1639
	1640	# lib files aren't analyzed if they are duplicates of files copied
	1641	# there from some other directory. But to determine this, we need
	1642	# to know their NAMEs. We might as well find the NAME now while
	1643	# the file is open. Similarly, cpan files aren't analyzed unless
	1644	# we're analyzing all of them, or this particular file is linked
	1645	# to by a file we are analyzing, and thus we will want to verify
	1646	# that the target exists in it. We need to know at least the NAME
	1647	# to see if it's worth analyzing, or so we can determine if a lib
	1648	# file is a copy of a cpan one.
	1649	if ($filename =~ m{ (?: ^ (?: cpan | lib ) / )
	1650	| $only_for_interior_links_re
	1651	}x) {
	1652	if ($contents =~ /^=head1 +NAME.*/mg) {
	1653	# The NAME is the first non-spaces on the line up to a
	1654	# comma, dash or end of line. Otherwise, it's invalid and
	1655	# this pod doesn't have a legal name that we're smart
	1656	# enough to find currently. But the parser will later
	1657	# find it if it thinks there is a legal name, and set the
	1658	# name
	1659	if ($contents =~ /\G # continue from the line after =head1
	1660	\s* # ignore any empty lines
	1661
	1662	# ignore =for paragraphs followed by empty
	1663	# lines
	1664	(?: ^ =for .*? \n (?: [^\s]*? \n )* \s* )*
	1665
	1666	^ \s* ( \S+?) \s* (?: [,-] | $ )/mx) {
	1667	my $name = $1;
	1668	$checker->name($name);
	1669	$id_to_checker{$name} = $checker
	1670	if $filename =~ m{^cpan/};
	1671	}
	1672	}
	1673	elsif ($filename =~ m{^cpan/}) {
	1674	$id_to_checker{$digests{$filename}} = $checker;
	1675	}
	1676	}
	1677	}
	1678
	1679	return;
	1680	} # End of is_pod_file()
	1681
	1682	# Start of real code that isn't processing the command line (except the
	1683	# db is read in above, as is processing of the --add_link option).
	1684	# Here, @files contains list of files on the command line. If have any of
	1685	# these, unconditionally test them, and show all the errors, even the known
	1686	# ones, and, since not testing other pods, don't do cross-pod link tests.
	1687	# (Could add extra code to do cross-pod tests for the ones in the list.)
	1688
	1689	if ($has_input_files) {
	1690	undef %known_problems;
	1691	$do_upstream_cpan = $do_deltas = 1; # In case one of the inputs is one
	1692	# of these types
	1693	}
	1694	else { # No input files -- go find all the possibilities.
	1695	if ($regen) {
	1696	$copy_fh = open_new($known_issues);
	1697	note("Regenerating $known_issues, please be patient...");
	1698	print $copy_fh $HEADER;
	1699	}
	1700
	1701	# Move to the directory above us, but have to adjust @INC to account for
	1702	# that.
	1703	s{^\.\./lib$}{lib} for @INC;
	1704	chdir File::Spec->updir;
	1705
	1706	# And look in this directory and all its subdirectories
	1707	find( {wanted => \&is_pod_file, no_chdir => 1}, '.');
	1708
	1709	# Add ourselves to the test
	1710	push @files, "t/porting/podcheck.t";
	1711	}
	1712
	1713	# Now we know how many tests there will be.
	1714	plan (tests => scalar @files) if ! $regen;
	1715
	1716
	1717	# Sort file names so we get consistent results, and to put cpan last,
	1718	# preceded by the ones that we don't generally parse. This is because both
	1719	# these classes are generally parsed only if there is a link to the interior
	1720	# of them, and we have to parse all others first to guarantee that they don't
	1721	# have such a link. 'lib' files come just before these, as some of these are
	1722	# duplicates of others. We already have figured this out when gathering the
	1723	# data as a special case for all such files, but this, while unnecessary,
	1724	# puts the derived file last in the output. 'readme' files come before those,
	1725	# as those also could be duplicates of others, which are considered the
	1726	# primary ones. These currently aren't figured out when gathering data, so
	1727	# are done here.
	1728	@files = sort { if ($a =~ /^cpan/) {
	1729	return 1 if $b !~ /^cpan/;
	1730	return lc $a cmp lc $b;
	1731	}
	1732	elsif ($b =~ /^cpan/) {
	1733	return -1;
	1734	}
	1735	elsif ($a =~ /$only_for_interior_links_re/) {
	1736	return 1 if $b !~ /$only_for_interior_links_re/;
	1737	return lc $a cmp lc $b;
	1738	}
	1739	elsif ($b =~ /$only_for_interior_links_re/) {
	1740	return -1;
	1741	}
	1742	elsif ($a =~ /^lib/) {
	1743	return 1 if $b !~ /^lib/;
	1744	return lc $a cmp lc $b;
	1745	}
	1746	elsif ($b =~ /^lib/) {
	1747	return -1;
	1748	} elsif ($a =~ /\breadme\b/i) {
	1749	return 1 if $b !~ /\breadme\b/i;
	1750	return lc $a cmp lc $b;
	1751	}
	1752	elsif ($b =~ /\breadme\b/i) {
	1753	return -1;
	1754	}
	1755	else {
	1756	return lc $a cmp lc $b;
	1757	}
	1758	}
	1759	@files;
	1760
	1761	# Now go through all the files and parse them
	1762	FILE:
	1763	foreach my $filename (@files) {
	1764	my $parsed = 0;
	1765	note("parsing $filename") if DEBUG;
	1766
	1767	# We may have already figured out some things in the process of generating
	1768	# the file list. If so, we have a $checker object already. But if not,
	1769	# generate one now.
	1770	my $checker = $filename_to_checker{$filename};
	1771	if (! $checker) {
	1772	$checker = My::Pod::Checker->new($filename);
	1773	$filename_to_checker{$filename} = $checker;
	1774	}
	1775
	1776	# We have set the name in the checker object if there is a possibility
	1777	# that no further parsing is necessary, but otherwise do the parsing now.
	1778	if (! $checker->name) {
	1779	if (! $checker->parse_from_file($filename, undef)) {
	1780	$checker->set_skip("$filename is transitory");
	1781	next FILE;
	1782	}
	1783	$parsed = 1;
	1784	}
	1785
	1786	if ($checker->num_errors() < 0) { # Returns negative if not a pod
	1787	$checker->set_skip("$filename is not a pod");
	1788	}
	1789	else {
	1790
	1791	# Here, is a pod. See if it is one that has already been tested,
	1792	# or should be tested under another directory. Use either its NAME
	1793	# if it has one, or a checksum if not.
	1794	my $name = $checker->name;
	1795	my $id;
	1796
	1797	if ($name) {
	1798	$id = $name;
	1799	}
	1800	else {
	1801	my $digest = Digest->new($digest_type);
	1802	my $contents = extract_pod($filename);
	1803
	1804	# If the return is undef, it means that $filename was a transitory
	1805	# file; skip it.
	1806	next FILE unless defined $contents;
	1807	$digest->add($contents);
	1808	$id = $digest->digest;
	1809	}
	1810
	1811	# If there is a match for this pod with something that we've already
	1812	# processed, don't process it, and output why.
	1813	my $prior_checker;
	1814	if (defined ($prior_checker = $id_to_checker{$id})
	1815	&& $prior_checker != $checker) # Could have defined the checker
	1816	# earlier without pursuing it
	1817	{
	1818
	1819	# If the pods are identical, then it's just a copy, and isn't an
	1820	# error. First use the checksums we have already computed to see
	1821	# if the entire files are identical, which means that the pods are
	1822	# identical too.
	1823	my $prior_filename = $prior_checker->get_filename;
	1824	my $same = (! $name
	1825	|| ($digests{$prior_filename}
	1826	&& $digests{$filename}
	1827	&& $digests{$prior_filename} eq $digests{$filename}));
	1828
	1829	# If they differ, it could be that the files differ for some
	1830	# reason, but the pods they contain are identical. Extract the
	1831	# pods and do the comparisons on just those.
	1832	if (! $same && $name) {
	1833	my $contents = extract_pod($filename);
	1834
	1835	# If return is <undef>, it means that $filename no longer
	1836	# exists. This means it was a transitory file, and should not
	1837	# be tested.
	1838	next FILE unless defined $contents;
	1839
	1840	my $prior_contents = extract_pod($prior_filename);
	1841
	1842	# If return is <undef>, it means that $prior_filename no
	1843	# longer exists. This means it was a transitory file, and
	1844	# should not have been tested, but we already did process it.
	1845	# What we should do now is to back-out its records, and
	1846	# process $filename in its stead. But backing out is not so
	1847	# simple, and so I'm (khw) skipping that unless and until
	1848	# experience shows that it is needed. We do go process
	1849	# $filename, and there are potential false positive conflicts
	1850	# with the transitory $prior_contents, and rerunning the test
	1851	# should cause it to succeed.
	1852	goto process_this_pod unless defined $prior_contents;
	1853
	1854	$same = $prior_contents eq $contents;
	1855	}
	1856
	1857	use File::Basename 'basename';
	1858	if ($same) {
	1859	$checker->set_skip("The pod of $filename is a duplicate of "
	1860	. "the pod for $prior_filename");
	1861	} elsif ($prior_filename =~ /\breadme\b/i) {
	1862	$checker->set_skip("$prior_filename is a README apparently for $filename");
	1863	} elsif ($filename =~ /\breadme\b/i) {
	1864	$checker->set_skip("$filename is a README apparently for $prior_filename");
	1865	} elsif (! $do_upstream_cpan
	1866	&& $filename =~ /^cpan/
	1867	&& $prior_filename =~ /^cpan/)
	1868	{
	1869	$checker->set_skip("CPAN is upstream for $filename");
	1870	} elsif ( $filename =~ /^utils/ or $prior_filename =~ /^utils/ ) {
	1871	$checker->set_skip("$filename copy is in utils/");
	1872	} elsif ($prior_filename =~ /^(?:cpan|ext|dist)/
	1873	&& $filename !~ /^(?:cpan|ext|dist)/
	1874	&& basename($prior_filename) eq basename($filename))
	1875	{
	1876	$checker->set_skip("$filename: Need to run make?");
	1877	} else { # Here have two pods with identical names that differ
	1878	$prior_checker->poderror(
	1879	{ -msg => $duplicate_name,
	1880	-line => "???",
	1881	parameter => "'$filename' also has NAME '$name'"
	1882	});
	1883	$checker->poderror(
	1884	{ -msg => $duplicate_name,
	1885	-line => "???",
	1886	parameter => "'$prior_filename' also has NAME '$name'"
	1887	});
	1888
	1889	# Changing the names helps later.
	1890	$prior_checker->name("$name version arbitrarily numbered 1");
	1891	$checker->name("$name version arbitrarily numbered 2");
	1892	}
	1893
	1894	# In any event, don't process this pod that has the same name as
	1895	# another.
	1896	next FILE;
	1897	}
	1898
	1899	process_this_pod:
	1900
	1901	# A unique pod.
	1902	$id_to_checker{$id} = $checker;
	1903
	1904	my $parsed_for_links = ", but parsed for its interior links";
	1905	if ((! $do_upstream_cpan && $filename =~ /^cpan/)
	1906	|| $filename =~ $only_for_interior_links_re)
	1907	{
	1908	if ($filename =~ /^cpan/) {
	1909	$checker->set_skip("CPAN is upstream for $filename");
	1910	}
	1911	elsif ($filename =~ /perl\d+delta/) {
	1912	if (! $do_deltas) {
	1913	$checker->set_skip("$filename is a stable perldelta");
	1914	}
	1915	}
	1916	elsif ($filename =~ /perltoc/) {
	1917	$checker->set_skip("$filename dependent on component pods");
	1918	}
	1919	else {
	1920	croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
	1921	}
	1922
	1923	if ($name && $has_referred_to_node{$name}) {
	1924	$checker->set_skip($checker->get_skip() . $parsed_for_links);
	1925	}
	1926	}
	1927
	1928	# Need a name in order to process it, because not meaningful
	1929	# otherwise, and also can't test links to this without a name.
	1930	if (!defined $name) {
	1931	$checker->poderror( { -msg => $no_name,
	1932	-line => '???'
	1933	});
	1934	next FILE;
	1935	}
	1936
	1937	# For skipped files, just get its NAME
	1938	my $skip;
	1939	if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
	1940	{
	1941	$checker->node($name) if $name;
	1942	}
	1943	elsif (! $parsed) {
	1944	if (! $checker->parse_from_file($filename, undef)) {
	1945	$checker->set_skip("$filename is transitory");
	1946	next FILE;
	1947	}
	1948	}
	1949
	1950	# Go through everything in the file that could be an anchor that
	1951	# could be a link target. Count how many there are of the same name.
	1952	foreach my $node ($checker->linkable_nodes) {
	1953	next FILE if ! $node; # Can be empty, as in '=item *'
	1954	$nodes{$name}{$node}++;
	1955
	1956	# Experiments have shown that cpan search can figure out the
	1957	# target of a link even if the exact wording is incorrect, as long
	1958	# as the first word is. This happens frequently in perlfunc.pod,
	1959	# where the link will be just to the function, but the target
	1960	# entry also includes parameters to the function.
	1961	my $first_word = $node;
	1962	if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
	1963	$nodes_first_word{$name}{$first_word} = $node;
	1964	}
	1965	}
	1966	$filename_to_pod{$filename} = $name;
	1967	}
	1968	}
	1969
	1970	# Here, all files have been parsed, and all links and link targets are stored.
	1971	# Now go through the files again and see which don't have matches.
	1972	if (! $has_input_files) {
	1973	foreach my $filename (@files) {
	1974	next if $filename_to_checker{$filename}->get_skip;
	1975
	1976	my $checker = $filename_to_checker{$filename};
	1977	foreach my $link ($checker->hyperlinks()) {
	1978	my $linked_to_page = $link->page;
	1979	next unless $linked_to_page; # intra-file checks are handled by std
	1980	# Pod::Checker
	1981	# Currently, we assume all external links are valid
	1982	next if $link->type eq 'url';
	1983
	1984	# Initialize the potential message.
	1985	my %problem = ( -msg => $broken_link,
	1986	-line => $link->line,
	1987	parameter => "to \"$linked_to_page\"",
	1988	);
	1989
	1990	# See if we have found the linked-to file in our parse
	1991	if (exists $nodes{$linked_to_page}) {
	1992	my $node = $link->node;
	1993
	1994	# If link is only to the page-level, already have it
	1995	next if ! $node;
	1996
	1997	# If link is to a node that exists in the file, is ok
	1998	if ($nodes{$linked_to_page}{$node}) {
	1999
	2000	# But if the page has multiple targets with the same name,
	2001	# it's ambiguous which one this should be to.
	2002	if ($nodes{$linked_to_page}{$node} > 1) {
	2003	$problem{-msg} = $multiple_targets;
	2004	$problem{parameter} = "in $linked_to_page that $node could be pointing to";
	2005	$checker->poderror(\%problem);
	2006	}
	2007	} elsif (! $nodes_first_word{$linked_to_page}{$node}) {
	2008
	2009	# Here the link target was not found, either exactly or to
	2010	# the first word. Is an error.
	2011	$problem{parameter} =~ s,"$,/$node",;
	2012	$checker->poderror(\%problem);
	2013	}
	2014
	2015	} # Linked-to-file not in parse; maybe is in exception list
	2016	elsif (! exists $valid_modules{$link->page}) {
	2017
	2018	# Here, is a link to a target that we can't find. Check if
	2019	# there is an internal link on the page with the target name.
	2020	# If so, it could be that they just forgot the initial '/'
	2021	# But perldelta is handled specially: only do this if the
	2022	# broken link isn't one of the known bad ones (that are
	2023	# placemarkers and should be removed for the final)
	2024	my $NAME = $filename_to_pod{$filename};
	2025	if (! defined $NAME) {
	2026	$checker->poderror(\%problem);
	2027	}
	2028	else {
	2029	if ($nodes{$NAME}{$linked_to_page}) {
	2030	$problem{-msg} = $broken_internal_link;
	2031	}
	2032	$checker->poderror(\%problem);
	2033	}
	2034	}
	2035	}
	2036	}
	2037	}
	2038
	2039	# If regenerating the data file, start with the modules for which we don't
	2040	# check targets. If you change the sort order, you need to run --regen before
	2041	# committing so that future commits that do run regen don't show irrelevant
	2042	# changes.
	2043	if ($regen) {
	2044	foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
	2045	my_safer_print($copy_fh, $_, "\n");
	2046	}
	2047	}
	2048
	2049	# Now ready to output the messages.
	2050	foreach my $filename (@files) {
	2051	my $canonical = canonicalize($filename);
	2052	SKIP: {
	2053	my $skip = $filename_to_checker{$filename}->get_skip // "";
	2054
	2055	if ($regen) {
	2056	foreach my $message ( sort keys %{$problems{$filename}}) {
	2057	my $count;
	2058
	2059	# Preserve a negative setting.
	2060	if ($known_problems{$canonical}{$message}
	2061	&& $known_problems{$canonical}{$message} < 0)
	2062	{
	2063	$count = $known_problems{$canonical}{$message};
	2064	}
	2065	else {
	2066	$count = @{$problems{$filename}{$message}};
	2067	}
	2068	my_safer_print($copy_fh, $canonical . "\t$message\t$count\n");
	2069	}
	2070	next;
	2071	}
	2072
	2073	skip($skip, 1) if $skip;
	2074	my @diagnostics;
	2075	my $thankful_diagnostics = 0;
	2076	my $indent = ' ';
	2077
	2078	my $total_known = 0;
	2079	foreach my $message ( sort keys %{$problems{$filename}}) {
	2080	$known_problems{$canonical}{$message} = 0
	2081	if ! $known_problems{$canonical}{$message};
	2082	my $diagnostic = "";
	2083	my $problem_count = scalar @{$problems{$filename}{$message}};
	2084	$total_known += $problem_count;
	2085	next if $known_problems{$canonical}{$message} < 0;
	2086
	2087	# If we have new problems not previously known, we output all of
	2088	# such problems, as we can't know which are really new and which
	2089	# not
	2090	if ($problem_count > $known_problems{$canonical}{$message}) {
	2091
	2092	# Here we are about to output all the messages for this type, so
	2093	# subtract back the number we previously added in.
	2094	$total_known -= $problem_count;
	2095
	2096	$diagnostic .= $indent . qq{"$message"};
	2097	if ($problem_count > 2) {
	2098	$diagnostic .= " ($problem_count occurrences,"
	2099	. " expected $known_problems{$canonical}{$message})";
	2100	}
	2101	foreach my $problem (@{$problems{$filename}{$message}}) {
	2102	$diagnostic .= " " if $problem_count == 1;
	2103	$diagnostic .= "\n$indent$indent";
	2104	$diagnostic .= "$problem->{parameter}" if $problem->{parameter};
	2105	$diagnostic .= " near line $problem->{-line} of "
	2106	. $filename;
	2107	$diagnostic .= " $problem->{comment}" if $problem->{comment};
	2108	}
	2109	$diagnostic .= "\n";
	2110	$files_with_unknown_issues{$filename} = 1;
	2111	} elsif ($problem_count < $known_problems{$canonical}{$message}) {
	2112	$diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
	2113	$thankful_diagnostics++;
	2114	}
	2115	push @diagnostics, $diagnostic if $diagnostic;
	2116	}
	2117
	2118	# The above loop has output messages where there are current potential
	2119	# issues. But it misses where there were some that have been entirely
	2120	# fixed. For those, we need to look through the old issues
	2121	foreach my $message ( sort keys %{$known_problems{$canonical}}) {
	2122	next if $problems{$filename}{$message};
	2123	next if ! $known_problems{$canonical}{$message};
	2124	next if $known_problems{$canonical}{$message} < 0; # Preserve negs
	2125
	2126	next if !$pedantic and $message =~
	2127	/^(?:\Q$line_length\E|\Q$C_not_linked\E|\Q$C_with_slash\E)/;
	2128
	2129	my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
	2130	push @diagnostics, $diagnostic if $diagnostic;
	2131	$thankful_diagnostics++ if $diagnostic;
	2132	}
	2133
	2134	my $output = "POD of $filename";
	2135	$output .= ", excluding $total_known not shown known potential problems"
	2136	if $total_known;
	2137	if (@diagnostics && @diagnostics == $thankful_diagnostics) {
	2138	# Output fixed issues as passing to-do tests, so they do not
	2139	# cause failures, but t/harness still flags them.
	2140	$output .= " # TODO";
	2141	}
	2142	ok(@diagnostics == $thankful_diagnostics, $output);
	2143	if (@diagnostics) {
	2144	diag(join "", @diagnostics,
	2145	"See end of this test output for your options on silencing this");
	2146	}
	2147
	2148	delete $known_problems{$canonical};
	2149	}
	2150	}
	2151
	2152	if (! $regen
	2153	&& ! ok (keys %known_problems == 0, "The known problems database ($data_dir/known_pod_issues.dat) includes no references to non-existent files"))
	2154	{
	2155	note("The following files were not found: "
	2156	. join ", ", sort keys %known_problems);
	2157	note("They will automatically be removed from the db the next time");
	2158	note(" cd t; ./perl -I../lib porting/podcheck.t --regen");
	2159	note("is run");
	2160	}
	2161
	2162	my $how_to = <<EOF;
	2163	run this test script by hand, using the following formula (on
	2164	Un*x-like machines):
	2165	cd t
	2166	./perl -I../lib porting/podcheck.t --regen
	2167	EOF
	2168
	2169	if (%files_with_unknown_issues) {
	2170	my $were_count_files = scalar keys %files_with_unknown_issues;
	2171	$were_count_files = ($were_count_files == 1)
	2172	? "was $were_count_files file"
	2173	: "were $were_count_files files";
	2174	my $message = <<EOF;
	2175
	2176	HOW TO GET ${\__FILE__} TO PASS
	2177
	2178	There $were_count_files that had new potential problems identified.
	2179	Some of them may be real, and some of them may be false positives because
	2180	this program isn't as smart as it likes to think it is. You can teach this
	2181	program to ignore the issues it has identified, and hence pass, by doing the
	2182	following:
	2183
	2184	1) If a problem is about a link to an unknown module or man page that
	2185	you know exists, re-run the command something like:
	2186	./perl -I../lib porting/podcheck.t --add_link MODULE man_page ...
	2187	(MODULEs should look like Foo::Bar, and man_pages should look like
	2188	bar(3c); don't do this for a module or man page that you aren't sure
	2189	about; instead treat as another type of issue and follow the
	2190	instructions below.)
	2191
	2192	2) For other issues, decide if each should be fixed now or not. Fix the
	2193	ones you decided to, and rerun this test to verify that the fixes
	2194	worked.
	2195
	2196	3) If there remain false positives or problems that you don't plan to fix right
	2197	now,
	2198	$how_to
	2199	That should cause all current potential problems to be accepted by
	2200	the program, so that the next time it runs, they won't be flagged.
	2201	EOF
	2202	if (%files_with_fixes) {
	2203	$message .= " This step will also take care of the files that have fixes in them\n";
	2204	}
	2205
	2206	$message .= <<EOF;
	2207	For a few files, such as perltoc, certain issues will always be
	2208	expected, and more of the same will be added over time. For those,
	2209	before you do the regen, you can edit
	2210	$known_issues
	2211	and find the entry for the module's file and specific error message,
	2212	and change the count of known potential problems to -1.
	2213	EOF
	2214
	2215	diag($message);
	2216	} elsif (%files_with_fixes) {
	2217	diag(<<EOF
	2218	To teach this test script that the potential problems have been fixed,
	2219	$how_to
	2220	EOF
	2221	);
	2222	}
	2223
	2224	if ($regen) {
	2225	chdir $original_dir or die "Can't change directories to $original_dir";
	2226	close_and_rename($copy_fh);
	2227	}
	2228
	2229	1;