This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
podcheck.t: Add --add_link option
[perl5.git] / t / porting / podcheck.t
CommitLineData
a67b1afa 1#!/usr/bin/perl -w
f7b649f0 2
a67b1afa 3use strict;
16384ac1
KW
4use warnings;
5use feature 'unicode_strings';
6
7use Carp;
8use Digest;
9use File::Find;
10use File::Spec;
11use Scalar::Util;
12use Text::Tabs;
13
14BEGIN {
15 require '../regen/regen_lib.pl';
16}
17
18sub DEBUG { 0 };
19
20=pod
21
22=head1 NAME
23
24podcheck.t - Look for possible problems in the Perl pods
25
26=head1 SYNOPSIS
27
28 cd t
29 ./perl -I../lib porting/podcheck.t [--show_all] [--cpan] [--counts]
30 [ FILE ...]
477100f8
KW
31 ./perl -I../lib porting/podcheck.t --add_link MODULE ...
32
16384ac1
KW
33 ./perl -I../lib porting/podcheck.t --regen
34
35=head1 DESCRIPTION
36
37podcheck.t is an extension of Pod::Checker. It looks for pod errors and
38potential errors in the files given as arguments, or if none specified, in all
39pods in the distribution workspace, except those in the cpan directory (unless
40C<--cpan> is specified). It does additional checking beyond that done by
41Pod::Checker, and keeps a database of known potential problems, and will
42fail a pod only if the number of such problems differs from that given in the
43database. It also suppresses the C<(section) deprecated> message from
44Pod::Checker, since specifying the man page section number is quite proper to do.
45
46The additional checks it makes are:
47
48=over
49
50=item Cross-pod link checking
51
52Pod::Checker verifies that links to an internal target in a pod are not
53broken. podcheck.t extends that (when called without FILE arguments) to
54external links. It does this by gathering up all the possible targets in the
477100f8
KW
55workspace, and cross-checking them. It also checks that a non-broken link
56points to just one target. (The destination pod could have two targets with
57the same name.)
58
59The way that the C<LE<lt>E<gt>> pod command works (for links outside the pod)
60is to actually create a link to C<search.cpan.org> with an embedded query for
61the desired pod or man page. That means that links outside the distribution
62are valid. podcheck.t doesn't verify the validity of such links, but instead
63keeps a data base of those known to be valid. This means that if a link to a
64target not on the list is created, the target needs to be added to the data
65base. This is accomplished via the L<--add_link|/--add_link MODULE ...>
66option to podcheck.t, described below.
16384ac1
KW
67
68=item An internal link that isn't so specified
69
70If a link is broken, but there is an existing internal target of the same
71name, it is likely that the internal target was meant, and the C<"/"> is
72missing from the C<LE<lt>E<gt>> pod command.
73
74=item Verbatim paragraphs that wrap in an 80 column window
75
76It's annoying to have lines wrap when displaying pod documentation in a
77terminal window. This checks that all such lines fit, and for those that
78don't, it tells you how much needs to be cut in order to fit. However,
79if you're fixing these, keep in mind that some terminal/pager combinations
80require really a maximum of 79 or 78 columns to display properly.
81
82Often, the easiest thing to do to gain space for these is to lower the indent
83to just one space.
84
85=item Missing or duplicate NAME or missing NAME short description
86
87A pod can't be linked to unless it has a unique name.
88And a NAME should have a dash and short description after it.
89
90=item =encoding statement issues
91
92This indicates if an C<=encoding> statement should be present, or moved to the
93front of the pod.
94
95=item Items that perhaps should be links
96
97There are mentions of apparent files in the pods that perhaps should be links
98instead, using C<LE<lt>...E<gt>>
99
100=item Items that perhaps should be C<FE<lt>...E<gt>>
101
102What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
103C<FE<lt>...E<gt>> mark-up instead.
104
105=back
106
107A number of issues raised by podcheck.t and by the base Pod::Checker are not
108really problems, but merely potential problems. After inspecting them and
109deciding that they aren't real problems, it is possible to shut up this program
110about them, unlike base Pod::Checker. To do this, call podcheck.t with the
111C<--regen> option to regenerate the database. This tells it that all existing
112issues are to not be mentioned again.
113
114This isn't fool-proof. The database merely keeps track of the number of these
115potential problems of each type for each pod. If a new problem of a given
116type is introduced into the pod, podcheck.t will spit out all of them. You
117then have to figure out which is the new one, and should it be changed or not.
118But doing it this way insulates the database from having to keep track of line
119numbers of problems, which may change, or the exact wording of each problem
120which might also change without affecting whether it is a problem or not.
121
122Also, if the count of potential problems of a given type for a pod decreases,
123the database must be regenerated so that it knows the new number. The program
124gives instructions when this happens.
125
126There is currently no check that modules listed as valid in the data base
127actually are. Thus any errors introduced there will remain there.
128
129=head1 OPTIONS
130
131=over
132
477100f8
KW
133=item --add_link MODULE ...
134
135Use this option to teach podcheck.t that the C<MODULE>s or man pages actually
136exist, and to silence any messages that links to them are broken.
137
138podcheck.t checks that links within the Perl core distribution are valid, but
139it doesn't check links to man pages or external modules. When it finds
140a broken link, it checks its data base of external modules and man pages,
141and only if not found there does it raise a message. This option just adds
142the list of modules and man page references that follow it on the command line
143to that data base.
144
145For example,
146
147 cd t
148 ./perl -I../lib --add_link Unicode::Casing
149
150causes the external module "Unicode::Casing" to be added to the data base, so
151C<LE<lt>Unicode::Casing<gt>> will be considered valid.
152
16384ac1
KW
153=item --regen
154
155Regenerate the data base used by podcheck.t to include all the existing
156potential problems. Future runs of the program will not then flag any of
157these.
158
159=item --cpan
160
161Normally, all pods in the cpan directory are skipped, except to make sure that
162any blead-upstream links to such pods are valid.
163This option will cause cpan upstream pods to be checked.
164
165=item --show_all
166
167Normally, if the number of potential problems of a given type found for a
168pod matches the expected value in the database, they will not be displayed.
169This option forces the database to be ignored during the run, so all potential
170problems are displayed and will fail their respective pod test. Specifying
171any particular FILES to operate on automatically selects this option.
172
173=item --counts
174
175Instead of testing, this just dumps the counts of the occurrences of the
176various types of potential problems in the data base.
177
178=back
179
180=head1 FILES
181
182The database is stored in F<t/porting/known_pod_issues.dat>
183
184=head1 SEE ALSO
185
186L<Pod::Checker>
187
188=cut
189
190#####################################################
191# HOW IT WORKS (in general)
192#
193# If not called with specific files to check, the directory structure is
194# examined for files that have pods in them. Files that might not have to be
195# fully parsed (e.g. in cpan) are parsed enough at this time to find their
196# pod's NAME, and to get a checksum.
197#
198# Those kinds of files are sorted last, but otherwise the pods are parsed with
199# the package coded here, My::Pod::Checker, which is an extension to
200# Pod::Checker that adds some tests and suppresses others that aren't
201# appropriate. The latter module has no provision for capturing diagnostics,
202# so a package, Tie_Array_to_FH, is used to force them to be placed into an
203# array instead of printed.
204#
205# Parsing the files builds up a list of links. The files are gone through
206# again, doing cross-link checking and outputting all saved-up problems with
207# each pod.
208#
209# Sorting the files last that potentially don't need to be fully parsed allows
210# us to not parse them unless there is a link to an internal anchor in them
211# from something that we have already parsed. Keeping checksums allows us to
212# not parse copies of other pods.
213#
214#####################################################
215
216# 1 => Exclude low priority messages that aren't likely to be problems, and
217# has many false positives; higher numbers give more messages.
218my $Warnings_Level = 200;
219
1c01047d
KW
220# perldelta during construction may have place holder links.
221our @perldelta_ignore_links = ( "XXX", "perl5YYYdelta" );
222
16384ac1
KW
223# To see if two pods with the same NAME are actually copies of the same pod,
224# which is not an error, it uses a checksum to save work.
225my $digest_type = "SHA-1";
226
227my $original_dir = File::Spec->rel2abs(File::Spec->curdir);
228my $data_dir = File::Spec->catdir($original_dir, 'porting');
229my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
230my $copy_fh;
231
232my $MAX_LINE_LENGTH = 80; # 80 columns
233my $INDENT = 8; # default nroff indent
234
235# Our warning messages. Better not have [('"] in them, as those are used as
236# delimiters for variable parts of the messages by poderror.
237my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
238my $broken_link = "Apparent broken link";
239my $broken_internal_link = "Apparent internal link is missing its forward slash";
240my $see_not_linked = "? Should you be using L<...> instead of";
241my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of";
242my $multiple_targets = "There is more than one target";
243my $duplicate_name = "Pod NAME already used";
244my $need_encoding = "Should have =encoding statement because have non-ASCII";
245my $encoding_first = "=encoding must be first command (if present)";
246my $no_name = "There is no NAME";
247my $missing_name_description = "The NAME should have a dash and short description after it";
248
7f8d58fb 249# objects, tests, etc can't be pods, so don't look for them. Also skip
a71e7b2c
KW
250# files output by the patch program. Could also ignore most of .gitignore
251# files, but not all, so don't.
252my $non_pods = qr/ (?: \.
253 (?: [achot] | zip | gz | bz2 | jar | tar | tgz | PL | so
254 | orig | rej | patch # Patch program output
255 | sw[op] | \#.* # Editor droppings
256 )
257 $
258 ) | ~$ # Another editor dropping
259 /x;
16384ac1
KW
260
261
262# Pod::Checker messages to suppress
263my @suppressed_messages = (
264 "(section) in", # Checker is wrong to flag this
265 "multiple occurrence of link target", # We catch independently the ones
266 # that are real problems.
267 "unescaped <>",
268);
269
270sub suppressed {
271 # Returns bool as to if input message is one that is to be suppressed
272
273 my $message = shift;
274 return grep { $message =~ /^\Q$_/i } @suppressed_messages;
275}
276
277{ # Closure to contain a simple subset of test.pl. This is to get rid of the
278 # unnecessary 'failed at' messages that would otherwise be output pointing
279 # to a particular line in this file.
a67b1afa 280
16384ac1
KW
281 my $current_test = 0;
282 my $planned;
283
284 sub plan {
285 my %plan = @_;
286 $planned = $plan{tests};
287 print "1..$planned\n";
288 return;
289 }
290
291 sub ok {
292 my $success = shift;
293 my $message = shift;
294
295 chomp $message;
296
297 $current_test++;
298 print "not " unless $success;
299 print "ok $current_test - $message\n";
300 return;
301 }
302
303 sub skip {
304 my $why = shift;
305 my $n = @_ ? shift : 1;
306 for (1..$n) {
307 $current_test++;
308 print "ok $current_test # skip $why\n";
309 }
310 no warnings 'exiting';
311 last SKIP;
312 }
313
314 sub note {
315 my $message = shift;
316
317 chomp $message;
318
319 print $message =~ s/^/# /mgr;
320 print "\n";
321 return;
322 }
323
324 END {
325 if ($planned && $planned != $current_test) {
326 print STDERR
327 "# Looks like you planned $planned tests but ran $current_test.\n";
328 }
329 }
330}
331
b3fdb838 332# This is to get this to work across multiple file systems, including those
bf8144e1
KW
333# that are not case sensitive. The db is stored in lower case, Un*x style,
334# and all file name comparisons are done that way.
b3fdb838 335sub canonicalize($) {
bf8144e1
KW
336 my $input = shift;
337 my ($volume, $directories, $file)
338 = File::Spec->splitpath(File::Spec->canonpath($input));
339 # Assumes $volume is constant for everything in this directory structure
340 $directories = "" if ! $directories;
341 $file = "" if ! $file;
342 my $output = lc join '/', File::Spec->splitdir($directories), $file;
343 $output =~ s! / /+ !/!gx; # Multiple slashes => single slash
344 return $output;
b3fdb838
KW
345}
346
16384ac1
KW
347
348# List of known potential problems by pod and type.
349my %known_problems;
350
351# Pods given by the keys contain an interior node that is referred to from
352# outside it.
353my %has_referred_to_node;
354
355my $show_counts = 0;
356my $regen = 0;
477100f8 357my $add_link = 0;
16384ac1
KW
358my $show_all = 0;
359
360# Assume that are to skip anything in /cpan
361my $do_upstream_cpan = 0;
362
363while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
364 my $arg = shift @ARGV;
365
366 $arg =~ s/^--/-/; # Treat '--' the same as a single '-'
367 if ($arg eq '-regen') {
368 $regen = 1;
369 }
477100f8
KW
370 elsif ($arg eq '-add_link') {
371 $add_link = 1;
372 }
16384ac1
KW
373 elsif ($arg eq '-cpan') {
374 $do_upstream_cpan = 1;
375 }
376 elsif ($arg eq '-show_all') {
377 $show_all = 1;
378 }
379 elsif ($arg eq '-counts') {
380 $show_counts = 1;
381 }
382 else {
383 die <<EOF;
384Unknown option '$arg'
385
477100f8
KW
386Usage: $0 [ --regen | --cpan | --show_all | FILE ... | --add_link MODULE ... ]\n"
387 --add_link -> Add the MODULE and man page references to the data base
16384ac1 388 --regen -> Regenerate the data file for $0
477100f8 389 --cpan -> Include files in the cpan subdirectory.
16384ac1
KW
390 --show_all -> Show all known potential problems
391 --counts -> Don't test, but give summary counts of the currently
392 existing database
393EOF
394 }
395}
396
397my @files = @ARGV;
398
477100f8
KW
399if (($regen + $show_all + $show_counts + $do_upstream_cpan + $add_link) > 1) {
400 croak "--regen, --show_all, --cpan, --counts, and --add_link are mutually exclusive";
16384ac1
KW
401}
402
403my $has_input_files = @files;
404
405if ($has_input_files && ($regen || $show_counts || $do_upstream_cpan)) {
406 croak "--regen, --counts and --cpan can't be used since using specific files";
407}
408
477100f8
KW
409if ($add_link && ! $has_input_files) {
410 croak "--add_link requires at least one module or man page reference";
411}
412
16384ac1
KW
413our %problems; # potential problems found in this run
414
415package My::Pod::Checker { # Extend Pod::Checker
a67b1afa
MM
416 use parent 'Pod::Checker';
417
16384ac1
KW
418 # Uses inside out hash to protect from typos
419 # For new fields, remember to add to destructor DESTROY()
420 my %indents; # Stack of indents from =over's in effect for
421 # current line
422 my %current_indent; # Current line's indent
423 my %filename; # The pod is store in this file
424 my %skip; # is SKIP set for this pod
425 my %in_NAME; # true if within NAME section
426 my %in_begin; # true if within =begin section
427 my %linkable_item; # Bool: if the latest =item is linkable. It isn't
428 # for bullet and number lists
429 my %linkable_nodes; # Pod::Checker adds all =items to its node list,
430 # but not all =items are linkable to
431 my %seen_encoding_cmd; # true if have =encoding earlier
432 my %command_count; # Number of commands seen
433 my %seen_pod_cmd; # true if have =pod earlier
434 my %warned_encoding; # true if already have warned about =encoding
435 # problems
436
437 sub DESTROY {
438 my $addr = Scalar::Util::refaddr $_[0];
439 delete $command_count{$addr};
440 delete $current_indent{$addr};
441 delete $filename{$addr};
442 delete $in_begin{$addr};
443 delete $indents{$addr};
444 delete $in_NAME{$addr};
445 delete $linkable_item{$addr};
446 delete $linkable_nodes{$addr};
447 delete $seen_encoding_cmd{$addr};
448 delete $seen_pod_cmd{$addr};
449 delete $skip{$addr};
450 delete $warned_encoding{$addr};
451 return;
452 }
453
454 sub new {
455 my $class = shift;
456 my $filename = shift;
457
458 my $self = $class->SUPER::new(-quiet => 1,
459 -warnings => $Warnings_Level);
460 my $addr = Scalar::Util::refaddr $self;
461 $command_count{$addr} = 0;
462 $current_indent{$addr} = 0;
463 $filename{$addr} = $filename;
464 $in_begin{$addr} = 0;
465 $in_NAME{$addr} = 0;
466 $linkable_item{$addr} = 0;
467 $seen_encoding_cmd{$addr} = 0;
468 $seen_pod_cmd{$addr} = 0;
469 $warned_encoding{$addr} = 0;
470 return $self;
471 }
472
473 # re's for messages that Pod::Checker outputs
474 my $location = qr/ \b (?:in|at|on|near) \s+ /xi;
475 my $optional_location = qr/ (?: $location )? /xi;
3cb68c65 476 my $line_reference = qr/ [('"]? $optional_location \b line \s+
16384ac1
KW
477 (?: \d+ | EOF | \Q???\E | - )
478 [)'"]? /xi;
479
480 sub poderror { # Called to register a potential problem
481
482 # This adds an extra field to the parent hash, 'parameter'. It is
483 # used to extract the variable parts of a message leaving just the
484 # constant skeleton. This in turn allows the message to be
485 # categorized better, so that it shows up as a single type in our
486 # database, with the specifics of each occurrence not being stored with
487 # it.
488
489 my $self = shift;
490 my $opts = shift;
491
492 my $addr = Scalar::Util::refaddr $self;
493 return if $skip{$addr};
494
495 # Input can be a string or hash. If a string, parse it to separate
496 # out the line number and convert to a hash for easier further
497 # processing
498 my $message;
499 if (ref $opts ne 'HASH') {
500 $message = join "", $opts, @_;
501 my $line_number;
502 if ($message =~ s/\s*($line_reference)//) {
503 ($line_number = $1) =~ s/\s*$optional_location//;
504 }
505 else {
506 $line_number = '???';
507 }
508 $opts = { -msg => $message, -line => $line_number };
509 } else {
510 $message = $opts->{'-msg'};
511
512 }
513
514 $message =~ s/^\d+\s+//;
515 return if main::suppressed($message);
516
517 $self->SUPER::poderror($opts, @_);
518
519 $opts->{parameter} = "" unless $opts->{parameter};
520
521 # The variable parts of the message tend to be enclosed in '...',
522 # "....", or (...). Extract them and put them in an extra field,
523 # 'parameter'. This is trickier because the matching delimiter to a
524 # '(' is its mirror, and not itself. Text::Balanced could be used
525 # instead.
526 while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
527 my $delimiter = $1;
528 my $start = $-[0];
529 $delimiter = ')' if $delimiter eq '(';
530
531 # If there is no ending delimiter, don't consider it to be a
532 # variable part. Most likely it is a contraction like "Don't"
533 last unless $message =~ m/\G .+? \Q$delimiter/xg;
534
535 my $length = $+[0] - $start;
536
537 # Get the part up through the closing delimiter
538 my $special = substr($message, $start, $length);
539 $special =~ s/^\s+//; # No leading whitespace
540
541 # And add that variable part to the parameter, while removing it
542 # from the message. This isn't a foolproof way of finding the
543 # variable part. For example '(s)' can occur in e.g.,
544 # 'paragraph(s)'
545 if ($special ne '(s)') {
546 substr($message, $start, $length) = "";
547 pos $message = $start;
548 $opts->{-msg} = $message;
549 $opts->{parameter} .= " " if $opts->{parameter};
550 $opts->{parameter} .= $special;
551 }
552 }
553
554 # Extract any additional line number given. This is often the
555 # beginning location of something whereas the main line number gives
556 # the ending one.
557 if ($message =~ /( $line_reference )/xi) {
558 my $line_ref = $1;
559 while ($message =~ s/\s*\Q$line_ref//) {
560 $opts->{-msg} = $message;
561 $opts->{parameter} .= " " if $opts->{parameter};
562 $opts->{parameter} .= $line_ref;
563 }
564 }
565
b3fdb838 566 Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
16384ac1
KW
567 push @{$problems{$filename{$addr}}{$message}}, $opts;
568 #push @{$problems{$self->get_filename}{$message}}, $opts;
569 }
570
571 sub check_encoding { # Does it need an =encoding statement?
572 my ($self, $paragraph, $line_num, $pod_para) = @_;
573
574 # Do nothing if there is an =encoding in the file, or if the line
575 # doesn't require an =encoding, or have already warned.
576 my $addr = Scalar::Util::refaddr $self;
577 return if $seen_encoding_cmd{$addr}
578 || $warned_encoding{$addr}
579 || $paragraph !~ /\P{ASCII}/;
580
581 $warned_encoding{$addr} = 1;
582 my ($file, $line) = $pod_para->file_line;
583 $self->poderror({ -line => $line, -file => $file,
584 -msg => $need_encoding
585 });
586 return;
587 }
588
589 sub verbatim {
590 my ($self, $paragraph, $line_num, $pod_para) = @_;
591 $self->check_encoding($paragraph, $line_num, $pod_para);
592
593 $self->SUPER::verbatim($paragraph, $line_num, $pod_para);
594
2592f3b8
DG
595 my $addr = Scalar::Util::refaddr $self;
596
16384ac1
KW
597 # Pick up the name, since the parent class doesn't in verbatim
598 # NAMEs; so treat as non-verbatim. The parent class only allows one
599 # paragraph in a NAME section, so if there is an extra blank line, it
600 # will trigger a message, but such a blank line is harmless, so skip
601 # in that case.
2592f3b8 602 if ($in_NAME{$addr} && $paragraph =~ /\S/) {
16384ac1
KW
603 $self->textblock($paragraph, $line_num, $pod_para);
604 }
605
606 my @lines = split /^/, $paragraph;
607 for my $i (0 .. @lines - 1) {
2592f3b8
DG
608 if ( my $encoding = $seen_encoding_cmd{$addr} ) {
609 require Encode;
610 $lines[$i] = Encode::decode($encoding, $lines[$i]);
611 }
16384ac1
KW
612 $lines[$i] =~ s/\s+$//;
613 my $indent = $self->get_current_indent;
614 my $exceeds = length(Text::Tabs::expand($lines[$i]))
615 + $indent - $MAX_LINE_LENGTH;
616 next unless $exceeds > 0;
617 my ($file, $line) = $pod_para->file_line;
618 $self->poderror({ -line => $line + $i, -file => $file,
619 -msg => $line_length,
620 parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)",
621 });
622 }
623 }
624
625 sub textblock {
626 my ($self, $paragraph, $line_num, $pod_para) = @_;
627 $self->check_encoding($paragraph, $line_num, $pod_para);
628
629 $self->SUPER::textblock($paragraph, $line_num, $pod_para);
630
631 my ($file, $line) = $pod_para->file_line;
632 my $addr = Scalar::Util::refaddr $self;
633 if ($in_NAME{$addr}) {
634 if (! $self->name) {
635 my $text = $self->interpolate($paragraph, $line_num);
636 if ($text =~ /^\s*(\S+?)\s*$/) {
637 $self->name($1);
638 $self->poderror({ -line => $line, -file => $file,
639 -msg => $missing_name_description,
640 parameter => $1});
641 }
642 }
643 }
644 $paragraph = join " ", split /^/, $paragraph;
645
646 # Matches something that looks like a file name, but is enclosed in
647 # C<...>
648 my $C_path_re = qr{ \b ( C<
649 # exclude regexes and 'OS/2'
650 (?! (?: (?: s | qr | m) / ) | OS/2 > )
651 \w+ (?: / \w+ )+ > (?: \. \w+ )? )
652 }x;
653
654 # If looks like a reference to other documentation by containing the
655 # word 'See' and then a likely pod directive, warn.
656
657 while ($paragraph =~ m{ \b See \s+ ( ( [^L] ) <
658 ( [^<]*? ) # The not-< excludes nested C<L<...
659 > ) }ixg) {
660 my $construct = $1;
661 my $type = $2;
662 my $interior = $3;
663 if ($interior !~ /$non_pods/
664 && $construct !~ /$C_path_re/g) {
665 $self->poderror({ -line => $line, -file => $file,
666 -msg => $see_not_linked,
667 parameter => $construct
668 });
669 }
670 }
671 while ($paragraph =~ m/$C_path_re/g) {
672 my $construct = $1;
673 $self->poderror({ -line => $line, -file => $file,
674 -msg => $C_with_slash,
675 parameter => $construct
676 });
677 }
678 return;
679 }
680
681 sub command {
682 my ($self, $cmd, $paragraph, $line_num, $pod_para) = @_;
683 my $addr = Scalar::Util::refaddr $self;
684 if ($cmd eq "pod") {
685 $seen_pod_cmd{$addr}++;
686 }
687 elsif ($cmd eq "encoding") {
688 my ($file, $line) = $pod_para->file_line;
2592f3b8 689 $seen_encoding_cmd{$addr} = $paragraph; # for later decoding
16384ac1
KW
690 if ($command_count{$addr} != 1 && $seen_pod_cmd{$addr}) {
691 $self->poderror({ -line => $line, -file => $file,
692 -msg => $encoding_first
693 });
694 }
695 }
696 $self->check_encoding($paragraph, $line_num, $pod_para);
697
698 # Pod::Check treats all =items as linkable, but the bullet and
699 # numbered lists really aren't. So keep our own list. This has to be
700 # processed before SUPER is called so that the list is started before
701 # the rest of it gets parsed.
702 if ($cmd eq 'item') { # Not linkable if item begins with * or a digit
703 $linkable_item{$addr} = ($paragraph !~ / ^ \s*
704 (?: [*]
705 | \d+ \.? (?: \$ | \s+ )
706 )/x)
707 ? 1
708 : 0;
a67b1afa 709
16384ac1
KW
710 }
711 $self->SUPER::command($cmd, $paragraph, $line_num, $pod_para);
712
713 $command_count{$addr}++;
714
715 $in_NAME{$addr} = 0; # Will change to 1 below if necessary
716 $in_begin{$addr} = 0; # ibid
717 if ($cmd eq 'over') {
718 my $text = $self->interpolate($paragraph, $line_num);
719 my $indent = 4; # default
720 $indent = $1 if $text && $text =~ /^\s*(\d+)\s*$/;
721 push @{$indents{$addr}}, $indent;
722 $current_indent{$addr} += $indent;
723 }
724 elsif ($cmd eq 'back') {
725 if (@{$indents{$addr}}) {
726 $current_indent{$addr} -= pop @{$indents{$addr}};
727 }
728 else {
729 # =back without corresponding =over, but should have
730 # warned already
731 $current_indent{$addr} = 0;
732 }
733 }
734 elsif ($cmd =~ /^head/) {
735 if (! $in_begin{$addr}) {
736
737 # If a particular formatter, then this command doesn't really
738 # apply
739 $current_indent{$addr} = 0;
740 undef @{$indents{$addr}};
741 }
742
743 my $text = $self->interpolate($paragraph, $line_num);
744 $in_NAME{$addr} = 1 if $cmd eq 'head1'
745 && $text && $text =~ /^NAME\b/;
746 }
747 elsif ($cmd eq 'begin') {
748 $in_begin{$addr} = 1;
749 }
750
751 return;
752 }
753
754 sub hyperlink {
755 my $self = shift;
756
757 # If the hyperlink is to an interior node of another page, save it
758 # so that we can see if we need to parse normally skipped files.
759 $has_referred_to_node{$_[0][1]{'-page'}} = 1
760 if $_[0] && $_[0][1]{'-page'} && $_[0][1]{'-node'};
761 return $self->SUPER::hyperlink($_[0]);
762 }
763
764 sub node {
765 my $self = shift;
766 my $text = $_[0];
767 if($text) {
768 $text =~ s/\s+$//s; # strip trailing whitespace
769 $text =~ s/\s+/ /gs; # collapse whitespace
770 my $addr = Scalar::Util::refaddr $self;
771 push(@{$linkable_nodes{$addr}}, $text) if
772 ! $current_indent{$addr}
773 || $linkable_item{$addr};
774 }
775 return $self->SUPER::node($_[0]);
776 }
777
778 sub get_current_indent {
779 return $INDENT + $current_indent{Scalar::Util::refaddr $_[0]};
780 }
781
782 sub get_filename {
783 return $filename{Scalar::Util::refaddr $_[0]};
784 }
785
786 sub linkable_nodes {
787 my $linkables = $linkable_nodes{Scalar::Util::refaddr $_[0]};
788 return undef unless $linkables;
789 return @$linkables;
790 }
791
792 sub get_skip {
793 return $skip{Scalar::Util::refaddr $_[0]} // 0;
794 }
795
796 sub set_skip {
797 my $self = shift;
798 $skip{Scalar::Util::refaddr $self} = shift;
799
800 # If skipping, no need to keep the problems for it
801 delete $problems{$self->get_filename};
802 return;
803 }
804}
805
806package Tie_Array_to_FH { # So printing actually goes to an array
807
808 my %array;
809
810 sub TIEHANDLE {
811 my $class = shift;
812 my $array_ref = shift;
813
814 my $self = bless \do{ my $anonymous_scalar }, $class;
815 $array{Scalar::Util::refaddr $self} = $array_ref;
816
817 return $self;
818 }
819
820 sub PRINT {
a67b1afa 821 my $self = shift;
16384ac1
KW
822 push @{$array{Scalar::Util::refaddr $self}}, @_;
823 return 1;
824 }
a67b1afa
MM
825}
826
69f6a9a1 827
16384ac1
KW
828my %filename_to_checker; # Map a filename to it's pod checker object
829my %id_to_checker; # Map a checksum to it's pod checker object
830my %nodes; # key is filename, values are nodes in that file.
831my %nodes_first_word; # same, but value is first word of each node
832my %valid_modules; # List of modules known to exist outside us.
833my %digests; # checksums of files, whose names are the keys
834my %filename_to_pod; # Map a filename to its pod NAME
835my %files_with_unknown_issues;
836my %files_with_fixes;
69f6a9a1 837
16384ac1 838my $data_fh;
be39e7f8 839open $data_fh, '<:bytes', $known_issues or die "Can't open $known_issues";
69f6a9a1 840
16384ac1
KW
841my %counts; # For --counts param, count of each issue type
842my %suppressed_files; # Files with at least one issue type to suppress
843
477100f8
KW
844
845if ($add_link) {
846 $copy_fh = open_new($known_issues);
847 my @existing_db = <$data_fh>;
848 my_safer_print($copy_fh, @existing_db);
849
850 foreach my $module (@files) {
851 die "\"$module\" does not look like a module or man page"
852 # Must look like (A or A::B or A::B::C ..., or foo(3C)
853 if $module !~ /^ (?: \w+ (?: :: \w+ )* | \w+ \( \d \w* \) ) $/x;
854 $module .= "\n";
855 next if grep { $module eq $_ } @existing_db;
856 my_safer_print($copy_fh, $module);
857 }
858 close_and_rename($copy_fh);
859 exit;
860}
861
16384ac1 862while (<$data_fh>) { # Read the data base
69f6a9a1 863 chomp;
16384ac1
KW
864 next if /^\s*(?:#|$)/; # Skip comment and empty lines
865 if (/\t/) {
866 next if $show_all;
867
868 # Keep track of counts of each issue type for each file
869 my ($filename, $message, $count) = split /\t/;
870 $known_problems{$filename}{$message} = $count;
871
872 if ($show_counts) {
873 if ($count < 0) { # -1 means to suppress this issue type
874 $suppressed_files{$filename} = $filename;
875 }
876 else {
877 $counts{$message} += $count;
878 }
879 }
880 }
881 else { # Lines without a tab are modules known to be valid
882 $valid_modules{$_} = 1
883 }
884}
885close $data_fh;
886
887if ($show_counts) {
888 my $total = 0;
889 foreach my $message (sort keys %counts) {
890 $total += $counts{$message};
891 note(Text::Tabs::expand("$counts{$message}\t$message"));
892 }
893 note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
894 if (%suppressed_files) {
895 note("\nFiles that have all messages of at least one type suppressed:");
896 note(join ",", keys %suppressed_files);
897 }
898 exit 0;
899}
900
901
902my %excluded_files = (
903 "lib/unicore/mktables" => 1,
904 "Porting/perldelta_template.pod" => 1,
905 "autodoc.pl" => 1,
906 "configpm" => 1,
907 "miniperl" => 1,
908 "perl" => 1,
e678c294
KW
909
910 # It would be nice if we didn't have to skip this,
911 # but the errors in it are too variable.
912 "pod/perltoc.pod" => 1,
16384ac1
KW
913 );
914
b3fdb838
KW
915# Convert to more generic form.
916foreach my $file (keys %excluded_files) {
03ca349a
KW
917 delete $excluded_files{$file};
918 $excluded_files{canonicalize($file)} = 1;
b3fdb838
KW
919}
920
16384ac1
KW
921# re to match files that are to be parsed only if there is an internal link
922# to them. It does not include cpan, as whether those are parsed depends
923# on a switch. Currently, only the stable perldelta.pod's are included.
924# These all have characters between 'perl' and 'delta'. (Actually the
925# currently developed one matches as well, but is a duplicate of
926# perldelta.pod, so can be skipped, so fine for it to match this.
927my $only_for_interior_links_re = qr/ \b perl \d+ delta \. pod \b/x;
928
929{ # Closure
930 my $first_time = 1;
931
932 sub output_thanks ($$$$) { # Called when an issue has been fixed
933 my $filename = shift;
934 my $original_count = shift;
935 my $current_count = shift;
936 my $message = shift;
937
938 $files_with_fixes{$filename} = 1;
939 my $return;
940 my $fixed_count = $original_count - $current_count;
941 my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
942 my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
943 my $diff;
944 if ($message) {
945 $diff = <<EOF;
946There were $original_count occurrences (now $current_count) in this pod of type
947"$message",
948EOF
949 } else {
950 $diff = <<EOF;
951There are no longer any problems found in this pod!
952EOF
953 }
954
955 if ($first_time) {
956 $first_time = 0;
957 $return = <<EOF;
958Thanks for fixing $a_problem!
959$diff
960Now you must teach $0 that this was fixed.
961EOF
962 }
963 else {
964 $return = <<EOF
965Thanks for fixing $another_problem.
966$diff
967EOF
968 }
969
970 return $return;
971 }
972}
973
974sub my_safer_print { # print, with error checking for outputting to db
975 my ($fh, @lines) = @_;
976
977 if (! print $fh @lines) {
978 my $save_error = $!;
979 close($fh);
980 die "Write failure: $save_error";
981 }
982}
983
984sub extract_pod { # Extracts just the pod from a file
985 my $filename = shift;
986
987 my @pod;
988
989 # Arrange for the output of Pod::Parser to be collected in an array we can
990 # look at instead of being printed
991 tie *ALREADY_FH, 'Tie_Array_to_FH', \@pod;
be39e7f8 992 open my $in_fh, '<:bytes', $filename
763df156
KW
993
994 # The file should already have been opened once to get here, so if
77b8b9ad
KW
995 # fails, just die. It's possible that a transitory file containing a
996 # pod would get here, but not bothering to add code for that very
997 # unlikely event.
16384ac1
KW
998 or die "Can't open '$filename': $!\n";
999
1000 my $parser = Pod::Parser->new();
1001 $parser->parse_from_filehandle($in_fh, *ALREADY_FH);
1002 close $in_fh;
1003
1004 return join "", @pod
1005}
1006
1007my $digest = Digest->new($digest_type);
1008
1009sub is_pod_file {
c1dcaaab
KW
1010 # If $_ is a pod file, add it to the lists and do other prep work.
1011
16384ac1
KW
1012 if (-d $_) {
1013 # Don't look at files in directories that are for tests, nor those
1014 # beginning with a dot
1015 if ($_ eq 't' || $_ =~ /^\../) {
1016 $File::Find::prune = 1;
1017 }
1018 return;
1019 }
1020
1021 return if $_ =~ /^\./; # No hidden Unix files
1022 return if $_ =~ $non_pods;
1023
1024 my $filename = $File::Find::name;
1025
1026 # Assumes that the path separator is exactly one character.
1027 $filename =~ s/^\..//;
1028
b3fdb838 1029 return if $excluded_files{canonicalize($filename)};
16384ac1 1030
144a708b
NC
1031 my $contents = do {
1032 local $/;
763df156
KW
1033 my $candidate;
1034 if (! open $candidate, '<:bytes', $_) {
1035
77b8b9ad
KW
1036 # If a transitory file was found earlier, the open could fail
1037 # legitimately and we just skip the file; also skip it if it is a
1038 # broken symbolic link, as it is probably just a build problem;
1039 # certainly not a file that we would want to check the pod of.
1040 # Otherwise fail it here and no reason to process it further.
1041 # (But the test count will be off too)
1042 ok(0, "Can't open '$filename': $!")
1043 if -e $filename && ! -l $filename;
763df156
KW
1044 return;
1045 }
144a708b
NC
1046 <$candidate>;
1047 };
16384ac1
KW
1048
1049 # If the file is a .pm or .pod, having any initial '=' on a line is
1050 # grounds for testing it. Otherwise, require a head1 NAME line to view it
1051 # as a potential pod
144a708b
NC
1052 if ($filename =~ /\.(?:pm|pod)/) {
1053 return unless $contents =~ /^=/m;
1054 } else {
1055 return unless $contents =~ /^=head1 +NAME/m;
16384ac1 1056 }
144a708b
NC
1057
1058 # Here, we know that the file is a pod. Add it to the list of files
1059 # to check and create a checker object for it.
1060
1061 push @files, $filename;
1062 my $checker = My::Pod::Checker->new($filename);
1063 $filename_to_checker{$filename} = $checker;
1064
1065 # In order to detect duplicate pods and only analyze them once, we
1066 # compute checksums for the file, so don't have to do an exact
1067 # compare. Note that if the pod is just part of the file, the
1068 # checksums can differ for the same pod. That special case is handled
1069 # later, since if the checksums of the whole file are the same, that
1070 # case won't even come up. We don't need the checksums for files that
1071 # we parse only if there is a link to its interior, but we do need its
1072 # NAME, which is also retrieved in the code below.
1073
1074 if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ )
1075 | $only_for_interior_links_re
1076 /x) {
1077 $digest->add($contents);
1078 $digests{$filename} = $digest->digest;
1079
1080 # lib files aren't analyzed if they are duplicates of files copied
1081 # there from some other directory. But to determine this, we need
1082 # to know their NAMEs. We might as well find the NAME now while
1083 # the file is open. Similarly, cpan files aren't analyzed unless
1084 # we're analyzing all of them, or this particular file is linked
1085 # to by a file we are analyzing, and thus we will want to verify
1086 # that the target exists in it. We need to know at least the NAME
1087 # to see if it's worth analyzing, or so we can determine if a lib
1088 # file is a copy of a cpan one.
1089 if ($filename =~ m{ (?: ^ (?: cpan | lib ) / )
16384ac1 1090 | $only_for_interior_links_re
144a708b
NC
1091 }x) {
1092 if ($contents =~ /^=head1 +NAME.*/mg) {
1093 # The NAME is the first non-spaces on the line up to a
1094 # comma, dash or end of line. Otherwise, it's invalid and
1095 # this pod doesn't have a legal name that we're smart
1096 # enough to find currently. But the parser will later
1097 # find it if it thinks there is a legal name, and set the
1098 # name
1099 if ($contents =~ /\G # continue from the line after =head1
1100 \s* # ignore any empty lines
1101 ^ \s* ( \S+?) \s* (?: [,-] | $ )/mx) {
1102 my $name = $1;
1103 $checker->name($name);
1104 $id_to_checker{$name} = $checker
1105 if $filename =~ m{^cpan/};
16384ac1
KW
1106 }
1107 }
144a708b
NC
1108 elsif ($filename =~ m{^cpan/}) {
1109 $id_to_checker{$digests{$filename}} = $checker;
1110 }
16384ac1
KW
1111 }
1112 }
c1dcaaab
KW
1113
1114 return;
16384ac1
KW
1115} # End of is_pod_file()
1116
477100f8
KW
1117# Start of real code that isn't processing the command line (except the
1118# db is read in above, as is processing of the --add_link option).
16384ac1
KW
1119# Here, @files contains list of files on the command line. If have any of
1120# these, unconditionally test them, and show all the errors, even the known
1121# ones, and, since not testing other pods, don't do cross-pod link tests.
1122# (Could add extra code to do cross-pod tests for the ones in the list.)
477100f8 1123
16384ac1
KW
1124if ($has_input_files) {
1125 undef %known_problems;
1126 $do_upstream_cpan = 1; # In case one of the inputs is from cpan
1127
1128}
1129else { # No input files -- go find all the possibilities.
1130 if ($regen) {
1131 $copy_fh = open_new($known_issues);
1132 note("Regenerating $known_issues, please be patient...");
1133 print $copy_fh <<END;
1134# This file is the data file for $0.
1135# There are three types of lines.
1136# Comment lines are white-space only or begin with a '#', like this one. Any
1137# changes you make to the comment lines will be lost when the file is
1138# regen'd.
1139# Lines without tab characters are simply NAMES of pods that the program knows
1140# will have links to them and the program does not check if those links are
1141# valid.
1142# All other lines should have three fields, each separated by a tab. The
1143# first field is the name of a pod; the second field is an error message
1144# generated by this program; and the third field is a count of how many
1145# known instances of that message there are in the pod. -1 means that the
1146# program can expect any number of this type of message.
1147END
1148 }
1149
1150 # Move to the directory above us, but have to adjust @INC to account for
1151 # that.
1152 s{^\.\./lib$}{lib} for @INC;
1153 chdir File::Spec->updir;
1154
1155 # And look in this directory and all its subdirectories
1156 find( \&is_pod_file, '.');
1157
1158 # Add ourselves to the test
b3fdb838 1159 push @files, "t/porting/podcheck.t";
16384ac1
KW
1160}
1161
1162# Now we know how many tests there will be.
1163plan (tests => scalar @files) if ! $regen;
1164
1165
1166 # Sort file names so we get consistent results, and to put cpan last,
1167 # preceeded by the ones that we don't generally parse. This is because both
1168 # these classes are generally parsed only if there is a link to the interior
1169 # of them, and we have to parse all others first to guarantee that they don't
1170 # have such a link. 'lib' files come just before these, as some of these are
1171 # duplicates of others. We already have figured this out when gathering the
1172 # data as a special case for all such files, but this, while unnecessary,
1173 # puts the derived file last in the output. 'readme' files come before those,
1174 # as those also could be duplicates of others, which are considered the
1175 # primary ones. These currently aren't figured out when gathering data, so
1176 # are done here.
1177 @files = sort { if ($a =~ /^cpan/) {
1178 return 1 if $b !~ /^cpan/;
1179 return $a cmp $b;
1180 }
1181 elsif ($b =~ /^cpan/) {
1182 return -1;
1183 }
1184 elsif ($a =~ /$only_for_interior_links_re/) {
1185 return 1 if $b !~ /$only_for_interior_links_re/;
1186 return $a cmp $b;
1187 }
1188 elsif ($b =~ /$only_for_interior_links_re/) {
1189 return -1;
1190 }
1191 elsif ($a =~ /^lib/) {
1192 return 1 if $b !~ /^lib/;
1193 return $a cmp $b;
1194 }
1195 elsif ($b =~ /^lib/) {
1196 return -1;
1197 } elsif ($a =~ /\breadme\b/i) {
1198 return 1 if $b !~ /\breadme\b/i;
1199 return $a cmp $b;
1200 }
1201 elsif ($b =~ /\breadme\b/i) {
1202 return -1;
1203 }
1204 else {
1205 return lc $a cmp lc $b;
1206 }
1207 }
1208 @files;
1209
1210# Now go through all the files and parse them
1211foreach my $filename (@files) {
1212 my $parsed = 0;
1213 note("parsing $filename") if DEBUG;
1214
1215 # We may have already figured out some things in the process of generating
1216 # the file list. If so, have a $checker object already. But if not,
1217 # generate one now.
1218 my $checker = $filename_to_checker{$filename};
1219 if (! $checker) {
1220 $checker = My::Pod::Checker->new($filename);
1221 $filename_to_checker{$filename} = $checker;
1222 }
1223
1224 # We have set the name in the checker object if there is a possibility
1225 # that no further parsing is necessary, but otherwise do the parsing now.
1226 if (! $checker->name) {
1227 $parsed = 1;
1228 $checker->parse_from_file($filename, undef);
1229 }
1230
1231 if ($checker->num_errors() < 0) { # Returns negative if not a pod
1232 $checker->set_skip("$filename is not a pod");
1233 }
1234 else {
1235
1236 # Here, is a pod. See if it is one that has already been tested,
1237 # or should be tested under another directory. Use either its NAME
1238 # if it has one, or a checksum if not.
1239 my $name = $checker->name;
1240 my $id;
1241
1242 if ($name) {
1243 $id = $name;
1244 }
1245 else {
1246 my $digest = Digest->new($digest_type);
1247 $digest->add(extract_pod($filename));
1248 $id = $digest->digest;
1249 }
1250
1251 # If there is a match for this pod with something that we've already
1252 # processed, don't process it, and output why.
1253 my $prior_checker;
1254 if (defined ($prior_checker = $id_to_checker{$id})
1255 && $prior_checker != $checker) # Could have defined the checker
1256 # earlier without pursuing it
1257 {
1258
1259 # If the pods are identical, then it's just a copy, and isn't an
1260 # error. First use the checksums we have already computed to see
1261 # if the entire files are identical, which means that the pods are
1262 # identical too.
1263 my $prior_filename = $prior_checker->get_filename;
1264 my $same = (! $name
1265 || ($digests{$prior_filename}
1266 && $digests{$filename}
1267 && $digests{$prior_filename} eq $digests{$filename}));
1268
1269 # If they differ, it could be that the files differ for some
1270 # reason, but the pods they contain are identical. Extract the
1271 # pods and do the comparisons on just those.
1272 if (! $same && $name) {
1273 $same = extract_pod($prior_filename) eq extract_pod($filename);
1274 }
1275
1276 if ($same) {
1277 $checker->set_skip("The pod of $filename is a duplicate of "
1278 . "the pod for $prior_filename");
1279 } elsif ($prior_filename =~ /\breadme\b/i) {
1280 $checker->set_skip("$prior_filename is a README apparently for $filename");
1281 } elsif ($filename =~ /\breadme\b/i) {
1282 $checker->set_skip("$filename is a README apparently for $prior_filename");
1283 } else { # Here have two pods with identical names that differ
1284 $prior_checker->poderror(
1285 { -msg => $duplicate_name,
1286 -line => "???",
1287 parameter => "'$filename' also has NAME '$name'"
1288 });
1289 $checker->poderror(
1290 { -msg => $duplicate_name,
1291 -line => "???",
1292 parameter => "'$prior_filename' also has NAME '$name'"
1293 });
1294
1295 # Changing the names helps later.
1296 $prior_checker->name("$name version arbitrarily numbered 1");
1297 $checker->name("$name version arbitrarily numbered 2");
1298 }
1299
1300 # In any event, don't process this pod that has the same name as
1301 # another.
1302 next;
1303 }
1304
1305 # A unique pod.
1306 $id_to_checker{$id} = $checker;
1307
1308 my $parsed_for_links = ", but parsed for its interior links";
1309 if ((! $do_upstream_cpan && $filename =~ /^cpan/)
1310 || $filename =~ $only_for_interior_links_re)
1311 {
1312 if ($filename =~ /^cpan/) {
1313 $checker->set_skip("CPAN is upstream for $filename");
1314 }
1315 elsif ($filename =~ /perl\d+delta/) {
1316 $checker->set_skip("$filename is a stable perldelta");
1317 }
1318 else {
1319 croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
1320 }
1321
1322 if ($name && $has_referred_to_node{$name}) {
1323 $checker->set_skip($checker->get_skip() . $parsed_for_links);
1324 }
1325 }
1326
1327 # Need a name in order to process it, because not meaningful
1328 # otherwise, and also can't test links to this without a name.
1329 if (!defined $name) {
1330 $checker->poderror( { -msg => $no_name,
1331 -line => '???'
1332 });
1333 next;
1334 }
1335
1336 # For skipped files, just get its NAME
1337 my $skip;
1338 if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
1339 {
1340 $checker->node($name) if $name;
1341 }
1342 else {
1343 $checker->parse_from_file($filename, undef) if ! $parsed;
1344 }
1345
1346 # Go through everything in the file that could be an anchor that
1347 # could be a link target. Count how many there are of the same name.
1348 foreach my $node ($checker->linkable_nodes) {
1349 next if ! $node; # Can be empty is like '=item *'
1350 if (exists $nodes{$name}{$node}) {
1351 $nodes{$name}{$node}++;
1352 }
1353 else {
1354 $nodes{$name}{$node} = 1;
1355 }
1356
1357 # Experiments have shown that cpan search can figure out the
1358 # target of a link even if the exact wording is incorrect, as long
1359 # as the first word is. This happens frequently in perlfunc.pod,
1360 # where the link will be just to the function, but the target
1361 # entry also includes parameters to the function.
1362 my $first_word = $node;
1363 if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
1364 $nodes_first_word{$name}{$first_word} = $node;
1365 }
1366 }
1367 $filename_to_pod{$filename} = $name;
1368 }
1369}
1370
1371# Here, all files have been parsed, and all links and link targets are stored.
1372# Now go through the files again and see which don't have matches.
1373if (! $has_input_files) {
1374 foreach my $filename (@files) {
1375 next if $filename_to_checker{$filename}->get_skip;
1376 my $checker = $filename_to_checker{$filename};
1377 foreach my $link ($checker->hyperlink) {
1378 my $linked_to_page = $link->[1]->page;
1379 next unless $linked_to_page; # intra-file checks are handled by std
1380 # Pod::Checker
1381
1382 # Initialize the potential message.
1383 my %problem = ( -msg => $broken_link,
1384 -line => $link->[0],
1385 parameter => "to \"$linked_to_page\"",
1386 );
1387
1388 # See if we have found the linked-to_file in our parse
1389 if (exists $nodes{$linked_to_page}) {
1390 my $node = $link->[1]->node;
1391
1392 # If link is only to the page-level, already have it
1393 next if ! $node;
1394
1395 # Transform pod language to what we are expecting
1396 $node =~ s,E<sol>,/,g;
1397 $node =~ s/E<verbar>/|/g;
1398
1399 # If link is to a node that exists in the file, is ok
1400 if ($nodes{$linked_to_page}{$node}) {
1401
1402 # But if the page has multiple targets with the same name,
1403 # it's ambiguous which one this should be to.
1404 if ($nodes{$linked_to_page}{$node} > 1) {
1405 $problem{-msg} = $multiple_targets;
1406 $problem{parameter} = "in $linked_to_page that $node could be pointing to";
1407 $checker->poderror(\%problem);
1408 }
1409 } elsif (! $nodes_first_word{$linked_to_page}{$node}) {
1410
1411 # Here the link target was not found, either exactly or to
1412 # the first word. Is an error.
1413 $problem{parameter} =~ s,"$,/$node",;
1414 $checker->poderror(\%problem);
1415 }
1416
1417 } # Linked-to-file not in parse; maybe is in exception list
1418 elsif (! exists $valid_modules{$link->[1]->page}) {
1419
1420 # Here, is a link to a target that we can't find. Check if
1421 # there is an internal link on the page with the target name.
1422 # If so, it could be that they just forgot the initial '/'
1c01047d
KW
1423 # But perldelta is handled specially: only do this if the
1424 # broken link isn't one of the known bad ones (that are
1425 # placemarkers and should be removed for the final)
1426 my $NAME = $filename_to_pod{$filename};
1427 if (! defined $NAME) {
1428 $checker->poderror(\%problem);
1429 }
1430 elsif ($NAME ne "perldelta"
1431 || ! grep { $linked_to_page eq $_ } @perldelta_ignore_links)
16384ac1 1432 {
1c01047d
KW
1433 if ($nodes{$NAME}{$linked_to_page}) {
1434 $problem{-msg} = $broken_internal_link;
1435 }
1436 $checker->poderror(\%problem);
16384ac1 1437 }
16384ac1
KW
1438 }
1439 }
1440 }
1441}
1442
1443# If regenerating the data file, start with the modules for which we don't
1444# check targets
1445if ($regen) {
1446 foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
1447 my_safer_print($copy_fh, $_, "\n");
1448 }
1449}
1450
1451# Now ready to output the messages.
1452foreach my $filename (@files) {
1453 my $test_name = "POD of $filename";
b3fdb838 1454 my $canonical = canonicalize($filename);
16384ac1
KW
1455 SKIP: {
1456 my $skip = $filename_to_checker{$filename}->get_skip // "";
1457
1458 if ($regen) {
1459 foreach my $message ( sort keys %{$problems{$filename}}) {
1460 my $count;
1461
1462 # Preserve a negative setting.
b3fdb838
KW
1463 if ($known_problems{$canonical}{$message}
1464 && $known_problems{$canonical}{$message} < 0)
16384ac1 1465 {
b3fdb838 1466 $count = $known_problems{$canonical}{$message};
16384ac1
KW
1467 }
1468 else {
1469 $count = @{$problems{$filename}{$message}};
1470 }
b3fdb838 1471 my_safer_print($copy_fh, canonicalize($filename) . "\t$message\t$count\n");
16384ac1
KW
1472 }
1473 next;
1474 }
1475
1476 skip($skip, 1) if $skip;
1477 my @diagnostics;
1478 my $indent = ' ';
1479
16384ac1
KW
1480 my $total_known = 0;
1481 foreach my $message ( sort keys %{$problems{$filename}}) {
b3fdb838
KW
1482 $known_problems{$canonical}{$message} = 0
1483 if ! $known_problems{$canonical}{$message};
16384ac1
KW
1484 my $diagnostic = "";
1485 my $problem_count = scalar @{$problems{$filename}{$message}};
1486 $total_known += $problem_count;
b3fdb838
KW
1487 next if $known_problems{$canonical}{$message} < 0;
1488 if ($problem_count > $known_problems{$canonical}{$message}) {
16384ac1
KW
1489
1490 # Here we are about to output all the messages for this type,
1491 # subtract back this number we previously added in.
1492 $total_known -= $problem_count;
1493
1494 $diagnostic .= $indent . $message;
1495 if ($problem_count > 2) {
1496 $diagnostic .= " ($problem_count occurrences)";
1497 }
1498 foreach my $problem (@{$problems{$filename}{$message}}) {
1499 $diagnostic .= " " if $problem_count == 1;
1500 $diagnostic .= "\n$indent$indent";
1501 $diagnostic .= "$problem->{parameter}" if $problem->{parameter};
1502 $diagnostic .= " near line $problem->{-line}";
1503 $diagnostic .= " $problem->{comment}" if $problem->{comment};
1504 }
1505 $diagnostic .= "\n";
1506 $files_with_unknown_issues{$filename} = 1;
b3fdb838
KW
1507 } elsif ($problem_count < $known_problems{$canonical}{$message}) {
1508 $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
16384ac1
KW
1509 }
1510 push @diagnostics, $diagnostic if $diagnostic;
1511 }
1512
09ea063a
KW
1513 # The above loop has output messages where there are current potential
1514 # issues. But it misses where there were some that have been entirely
1515 # fixed. For those, we need to look through the old issues
b3fdb838 1516 foreach my $message ( sort keys %{$known_problems{$canonical}}) {
09ea063a 1517 next if $problems{$filename}{$message};
b3fdb838
KW
1518 next if ! $known_problems{$canonical}{$message};
1519 next if $known_problems{$canonical}{$message} < 0; # Preserve negs
1520 my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
09ea063a
KW
1521 push @diagnostics, $diagnostic if $diagnostic;
1522 }
1523
16384ac1
KW
1524 my $output = "POD of $filename";
1525 $output .= ", excluding $total_known not shown known potential problems"
1526 if $total_known;
1527 ok(@diagnostics == 0, $output);
1528 if (@diagnostics) {
1529 note(join "", @diagnostics,
477100f8 1530 "See end of this test output for your options on silencing this");
16384ac1
KW
1531 }
1532 }
1533}
1534
1535my $how_to = <<EOF;
1536 run this test script by hand, using the following formula (on
1537 Un*x-like machines):
1538 cd t
1539 ./perl -I../lib porting/podcheck.t --regen
1540EOF
1541
1542if (%files_with_unknown_issues) {
1543 my $were_count_files = scalar keys %files_with_unknown_issues;
1544 $were_count_files = ($were_count_files == 1)
1545 ? "was $were_count_files file"
1546 : "were $were_count_files files";
1547 my $message = <<EOF;
1548
1549HOW TO GET THIS .t TO PASS
1550
477100f8
KW
1551There $were_count_files that had new potential problems identified.
1552Some of them may be real, and some of them may be because this program
1553isn't as smart as it likes to think it is. You can teach this program
1554to ignore the issues it has identified, and hence pass, by doing the
1555following:
16384ac1 1556
477100f8
KW
15571) If a problem is about a link to an unknown module or man page that
1558 you know exists, re-run the command something like:
1559 ./perl -I../lib porting/podcheck.t --add_link MODULE man_page ...
1560 (MODULEs should look like Foo::Bar, and man_pages should look like
1561 bar(3c); don't do this for a module or man page that you aren't sure
1562 about; instead treat as another type of issue and follow the
1563 instructions below.)
16384ac1
KW
1564
15652) For other issues, decide if each should be fixed now or not. Fix the
1566 ones you decided to, and rerun this test to verify that the fixes
1567 worked.
1568
15693) If there remain potential problems that you don't plan to fix right
1570 now (or aren't really problems),
1571$how_to
477100f8
KW
1572 That should cause all current potential problems to be accepted by
1573 the program, so that the next time it runs, they won't be flagged.
16384ac1
KW
1574EOF
1575 if (%files_with_fixes) {
1576 $message .= " This step will also take care of the files that have fixes in them\n";
1577 }
1578
1579 $message .= <<EOF;
1580 For a few files, such as perltoc, certain issues will always be
1581 expected, and more of the same will be added over time. For those,
1582 before you do the regen, you can edit
1583 $known_issues
1584 and find the entry for the module's file and specific error message,
1585 and change the count of known potential problems to -1.
1586EOF
1587
1588 note($message);
1589} elsif (%files_with_fixes) {
1590 note(<<EOF
1591To teach this test script that the potential problems have been fixed,
1592$how_to
1593EOF
1594 );
1595}
1596
1597if ($regen) {
1598 chdir $original_dir || die "Can't change directories to $original_dir";
1599 close_and_rename($copy_fh);
1600}