t/porting/podcheck.t

   1 #!/usr/bin/perl -w
   2
   3 use strict;
   4 use warnings;
   5 use feature 'unicode_strings';
   6
   7 use Carp;
   8 use Digest;
   9 use File::Find;
  10 use File::Spec;
  11 use Scalar::Util;
  12 use Text::Tabs;
  13
  14 BEGIN {
  15     require '../regen/regen_lib.pl';
  16 }
  17
  18 sub DEBUG { 0 };
  19
  20 =pod
  21
  22 =head1 NAME
  23
  24 podcheck.t - Look for possible problems in the Perl pods
  25
  26 =head1 SYNOPSIS
  27
  28  cd t
  29  ./perl -I../lib porting/podcheck.t [--show_all] [--cpan] [--counts]
  30                                                             [ FILE ...]
  31  ./perl -I../lib porting/podcheck.t --regen
  32
  33 =head1 DESCRIPTION
  34
  35 podcheck.t is an extension of Pod::Checker.  It looks for pod errors and
  36 potential errors in the files given as arguments, or if none specified, in all
  37 pods in the distribution workspace, except those in the cpan directory (unless
  38 C<--cpan> is specified).  It does additional checking beyond that done by
  39 Pod::Checker, and keeps a database of known potential problems, and will
  40 fail a pod only if the number of such problems differs from that given in the
  41 database.  It also suppresses the C<(section) deprecated> message from
  42 Pod::Checker, since specifying the man page section number is quite proper to do.
  43
  44 The additional checks it makes are:
  45
  46 =over
  47
  48 =item Cross-pod link checking
  49
  50 Pod::Checker verifies that links to an internal target in a pod are not
  51 broken.  podcheck.t extends that (when called without FILE arguments) to
  52 external links.  It does this by gathering up all the possible targets in the
  53 workspace, and cross-checking them.  The database has a list of known targets
  54 outside the workspace, so podcheck.t will not raise a warning for
  55 using those.  It also checks that a non-broken link points to just one target.
  56 (The destination pod could have two targets with the same name.)
  57
  58 =item An internal link that isn't so specified
  59
  60 If a link is broken, but there is an existing internal target of the same
  61 name, it is likely that the internal target was meant, and the C<"/"> is
  62 missing from the C<LE<lt>E<gt>> pod command.
  63
  64 =item Verbatim paragraphs that wrap in an 80 column window
  65
  66 It's annoying to have lines wrap when displaying pod documentation in a
  67 terminal window.  This checks that all such lines fit, and for those that
  68 don't, it tells you how much needs to be cut in order to fit.  However,
  69 if you're fixing these, keep in mind that some terminal/pager combinations
  70 require really a maximum of 79 or 78 columns to display properly.
  71
  72 Often, the easiest thing to do to gain space for these is to lower the indent
  73 to just one space.
  74
  75 =item Missing or duplicate NAME or missing NAME short description
  76
  77 A pod can't be linked to unless it has a unique name.
  78 And a NAME should have a dash and short description after it.
  79
  80 =item =encoding statement issues
  81
  82 This indicates if an C<=encoding> statement should be present, or moved to the
  83 front of the pod.
  84
  85 =item Items that perhaps should be links
  86
  87 There are mentions of apparent files in the pods that perhaps should be links
  88 instead, using C<LE<lt>...E<gt>>
  89
  90 =item Items that perhaps should be C<FE<lt>...E<gt>>
  91
  92 What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
  93 C<FE<lt>...E<gt>> mark-up instead.
  94
  95 =back
  96
  97 A number of issues raised by podcheck.t and by the base Pod::Checker are not
  98 really problems, but merely potential problems.  After inspecting them and
  99 deciding that they aren't real problems, it is possible to shut up this program
 100 about them, unlike base Pod::Checker.  To do this, call podcheck.t with the
 101 C<--regen> option to regenerate the database.  This tells it that all existing
 102 issues are to not be mentioned again.
 103
 104 This isn't fool-proof.  The database merely keeps track of the number of these
 105 potential problems of each type for each pod.  If a new problem of a given
 106 type is introduced into the pod, podcheck.t will spit out all of them.  You
 107 then have to figure out which is the new one, and should it be changed or not.
 108 But doing it this way insulates the database from having to keep track of line
 109 numbers of problems, which may change, or the exact wording of each problem
 110 which might also change without affecting whether it is a problem or not.
 111
 112 Also, if the count of potential problems of a given type for a pod decreases,
 113 the database must be regenerated so that it knows the new number.  The program
 114 gives instructions when this happens.
 115
 116 There is currently no check that modules listed as valid in the data base
 117 actually are.  Thus any errors introduced there will remain there.
 118
 119 =head1 OPTIONS
 120
 121 =over
 122
 123 =item --regen
 124
 125 Regenerate the data base used by podcheck.t to include all the existing
 126 potential problems.  Future runs of the program will not then flag any of
 127 these.
 128
 129 =item --cpan
 130
 131 Normally, all pods in the cpan directory are skipped, except to make sure that
 132 any blead-upstream links to such pods are valid.
 133 This option will cause cpan upstream pods to be checked.
 134
 135 =item --show_all
 136
 137 Normally, if the number of potential problems of a given type found for a
 138 pod matches the expected value in the database, they will not be displayed.
 139 This option forces the database to be ignored during the run, so all potential
 140 problems are displayed and will fail their respective pod test.  Specifying
 141 any particular FILES to operate on automatically selects this option.
 142
 143 =item --counts
 144
 145 Instead of testing, this just dumps the counts of the occurrences of the
 146 various types of potential problems in the data base.
 147
 148 =back
 149
 150 =head1 FILES
 151
 152 The database is stored in F<t/porting/known_pod_issues.dat>
 153
 154 =head1 SEE ALSO
 155
 156 L<Pod::Checker>
 157
 158 =cut
 159
 160 #####################################################
 161 # HOW IT WORKS (in general)
 162 #
 163 # If not called with specific files to check, the directory structure is
 164 # examined for files that have pods in them.  Files that might not have to be
 165 # fully parsed (e.g. in cpan) are parsed enough at this time to find their
 166 # pod's NAME, and to get a checksum.
 167 #
 168 # Those kinds of files are sorted last, but otherwise the pods are parsed with
 169 # the package coded here, My::Pod::Checker, which is an extension to
 170 # Pod::Checker that adds some tests and suppresses others that aren't
 171 # appropriate.  The latter module has no provision for capturing diagnostics,
 172 # so a package, Tie_Array_to_FH, is used to force them to be placed into an
 173 # array instead of printed.
 174 #
 175 # Parsing the files builds up a list of links.  The files are gone through
 176 # again, doing cross-link checking and outputting all saved-up problems with
 177 # each pod.
 178 #
 179 # Sorting the files last that potentially don't need to be fully parsed allows
 180 # us to not parse them unless there is a link to an internal anchor in them
 181 # from something that we have already parsed.  Keeping checksums allows us to
 182 # not parse copies of other pods.
 183 #
 184 #####################################################
 185
 186 # 1 => Exclude low priority messages that aren't likely to be problems, and
 187 # has many false positives; higher numbers give more messages.
 188 my $Warnings_Level = 200;
 189
 190 # To see if two pods with the same NAME are actually copies of the same pod,
 191 # which is not an error, it uses a checksum to save work.
 192 my $digest_type = "SHA-1";
 193
 194 my $original_dir = File::Spec->rel2abs(File::Spec->curdir);
 195 my $data_dir = File::Spec->catdir($original_dir, 'porting');
 196 my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
 197 my $copy_fh;
 198
 199 my $MAX_LINE_LENGTH = 80;   # 80 columns
 200 my $INDENT = 8;             # default nroff indent
 201
 202 # Our warning messages.  Better not have [('"] in them, as those are used as
 203 # delimiters for variable parts of the messages by poderror.
 204 my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
 205 my $broken_link = "Apparent broken link";
 206 my $broken_internal_link = "Apparent internal link is missing its forward slash";
 207 my $see_not_linked = "? Should you be using L<...> instead of";
 208 my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of";
 209 my $multiple_targets = "There is more than one target";
 210 my $duplicate_name = "Pod NAME already used";
 211 my $need_encoding = "Should have =encoding statement because have non-ASCII";
 212 my $encoding_first = "=encoding must be first command (if present)";
 213 my $no_name = "There is no NAME";
 214 my $missing_name_description = "The NAME should have a dash and short description after it";
 215
 216 # objects, tests, etc can't be pods, so don't look for them.
 217 my $non_pods = qr/\.(?:[achot]|zip|gz|bz2|jar|tar|tgz|PL|so)$/;
 218
 219
 220 # Pod::Checker messages to suppress
 221 my @suppressed_messages = (
 222     "(section) in",                         # Checker is wrong to flag this
 223     "multiple occurrence of link target",   # We catch independently the ones
 224                                             # that are real problems.
 225     "unescaped <>",
 226 );
 227
 228 sub suppressed {
 229     # Returns bool as to if input message is one that is to be suppressed
 230
 231     my $message = shift;
 232     return grep { $message =~ /^\Q$_/i } @suppressed_messages;
 233 }
 234
 235 {   # Closure to contain a simple subset of test.pl.  This is to get rid of the
 236     # unnecessary 'failed at' messages that would otherwise be output pointing
 237     # to a particular line in this file.
 238
 239     my $current_test = 0;
 240     my $planned;
 241
 242     sub plan {
 243         my %plan = @_;
 244         $planned = $plan{tests};
 245         print "1..$planned\n";
 246         return;
 247     }
 248
 249     sub ok {
 250         my $success = shift;
 251         my $message = shift;
 252
 253         chomp $message;
 254
 255         $current_test++;
 256         print "not " unless $success;
 257         print "ok $current_test - $message\n";
 258         return;
 259     }
 260
 261     sub skip {
 262         my $why = shift;
 263         my $n    = @_ ? shift : 1;
 264         for (1..$n) {
 265             $current_test++;
 266             print "ok $current_test # skip $why\n";
 267         }
 268         no warnings 'exiting';
 269         last SKIP;
 270     }
 271
 272     sub note {
 273         my $message = shift;
 274
 275         chomp $message;
 276
 277         print $message =~ s/^/# /mgr;
 278         print "\n";
 279         return;
 280     }
 281
 282     END {
 283         if ($planned && $planned != $current_test) {
 284             print STDERR
 285             "# Looks like you planned $planned tests but ran $current_test.\n";
 286         }
 287     }
 288 }
 289
 290 # This is to get this to work across multiple file systems, including those
 291 # that are not case sensitive.  The db is stored in lower case, and all file
 292 # name comparisons are done that way.
 293 sub canonicalize($) {
 294     return lc File::Spec->canonpath(shift);
 295 }
 296
 297
 298 # List of known potential problems by pod and type.
 299 my %known_problems;
 300
 301 # Pods given by the keys contain an interior node that is referred to from
 302 # outside it.
 303 my %has_referred_to_node;
 304
 305 my $show_counts = 0;
 306 my $regen = 0;
 307 my $show_all = 0;
 308
 309 # Assume that are to skip anything in /cpan
 310 my $do_upstream_cpan = 0;
 311
 312 while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
 313     my $arg = shift @ARGV;
 314
 315     $arg =~ s/^--/-/; # Treat '--' the same as a single '-'
 316     if ($arg eq '-regen') {
 317         $regen = 1;
 318     }
 319     elsif ($arg eq '-cpan') {
 320         $do_upstream_cpan = 1;
 321     }
 322     elsif ($arg eq '-show_all') {
 323         $show_all = 1;
 324     }
 325     elsif ($arg eq '-counts') {
 326         $show_counts = 1;
 327     }
 328     else {
 329         die <<EOF;
 330 Unknown option '$arg'
 331
 332 Usage: $0 [ --regen | --cpan | --show_all ] [ FILE ... ]\n"
 333     --cpan     -> Include files in the cpan subdirectory.
 334     --regen    -> Regenerate the data file for $0
 335     --show_all -> Show all known potential problems
 336     --counts   -> Don't test, but give summary counts of the currently
 337                   existing database
 338 EOF
 339     }
 340 }
 341
 342 my @files = @ARGV;
 343
 344 if (($regen + $show_all + $show_counts + $do_upstream_cpan) > 1) {
 345     croak "--regen, --show_all, --cpan, and --counts are mutually exclusive";
 346 }
 347
 348 my $has_input_files = @files;
 349
 350 if ($has_input_files && ($regen || $show_counts || $do_upstream_cpan)) {
 351     croak "--regen, --counts and --cpan can't be used since using specific files";
 352 }
 353
 354 our %problems;  # potential problems found in this run
 355
 356 package My::Pod::Checker {      # Extend Pod::Checker
 357     use parent 'Pod::Checker';
 358
 359     # Uses inside out hash to protect from typos
 360     # For new fields, remember to add to destructor DESTROY()
 361     my %indents;            # Stack of indents from =over's in effect for
 362                             # current line
 363     my %current_indent;     # Current line's indent
 364     my %filename;           # The pod is store in this file
 365     my %skip;               # is SKIP set for this pod
 366     my %in_NAME;            # true if within NAME section
 367     my %in_begin;           # true if within =begin section
 368     my %linkable_item;      # Bool: if the latest =item is linkable.  It isn't
 369                             # for bullet and number lists
 370     my %linkable_nodes;     # Pod::Checker adds all =items to its node list,
 371                             # but not all =items are linkable to
 372     my %seen_encoding_cmd;  # true if have =encoding earlier
 373     my %command_count;      # Number of commands seen
 374     my %seen_pod_cmd;       # true if have =pod earlier
 375     my %warned_encoding;    # true if already have warned about =encoding
 376                             # problems
 377
 378     sub DESTROY {
 379         my $addr = Scalar::Util::refaddr $_[0];
 380         delete $command_count{$addr};
 381         delete $current_indent{$addr};
 382         delete $filename{$addr};
 383         delete $in_begin{$addr};
 384         delete $indents{$addr};
 385         delete $in_NAME{$addr};
 386         delete $linkable_item{$addr};
 387         delete $linkable_nodes{$addr};
 388         delete $seen_encoding_cmd{$addr};
 389         delete $seen_pod_cmd{$addr};
 390         delete $skip{$addr};
 391         delete $warned_encoding{$addr};
 392         return;
 393     }
 394
 395     sub new {
 396         my $class = shift;
 397         my $filename = shift;
 398
 399         my $self = $class->SUPER::new(-quiet => 1,
 400                                      -warnings => $Warnings_Level);
 401         my $addr = Scalar::Util::refaddr $self;
 402         $command_count{$addr} = 0;
 403         $current_indent{$addr} = 0;
 404         $filename{$addr} = $filename;
 405         $in_begin{$addr} = 0;
 406         $in_NAME{$addr} = 0;
 407         $linkable_item{$addr} = 0;
 408         $seen_encoding_cmd{$addr} = 0;
 409         $seen_pod_cmd{$addr} = 0;
 410         $warned_encoding{$addr} = 0;
 411         return $self;
 412     }
 413
 414     # re's for messages that Pod::Checker outputs
 415     my $location = qr/ \b (?:in|at|on|near) \s+ /xi;
 416     my $optional_location = qr/ (?: $location )? /xi;
 417     my $line_reference = qr/ [('"]? $optional_location \b line \s+
 418                              (?: \d+ | EOF | \Q???\E | - )
 419                              [)'"]? /xi;
 420
 421     sub poderror {  # Called to register a potential problem
 422
 423         # This adds an extra field to the parent hash, 'parameter'.  It is
 424         # used to extract the variable parts of a message leaving just the
 425         # constant skeleton.  This in turn allows the message to be
 426         # categorized better, so that it shows up as a single type in our
 427         # database, with the specifics of each occurrence not being stored with
 428         # it.
 429
 430         my $self = shift;
 431         my $opts = shift;
 432
 433         my $addr = Scalar::Util::refaddr $self;
 434         return if $skip{$addr};
 435
 436         # Input can be a string or hash.  If a string, parse it to separate
 437         # out the line number and convert to a hash for easier further
 438         # processing
 439         my $message;
 440         if (ref $opts ne 'HASH') {
 441             $message = join "", $opts, @_;
 442             my $line_number;
 443             if ($message =~ s/\s*($line_reference)//) {
 444                 ($line_number = $1) =~ s/\s*$optional_location//;
 445             }
 446             else {
 447                 $line_number = '???';
 448             }
 449             $opts = { -msg => $message, -line => $line_number };
 450         } else {
 451             $message = $opts->{'-msg'};
 452
 453         }
 454
 455         $message =~ s/^\d+\s+//;
 456         return if main::suppressed($message);
 457
 458         $self->SUPER::poderror($opts, @_);
 459
 460         $opts->{parameter} = "" unless $opts->{parameter};
 461
 462         # The variable parts of the message tend to be enclosed in '...',
 463         # "....", or (...).  Extract them and put them in an extra field,
 464         # 'parameter'.  This is trickier because the matching delimiter to a
 465         # '(' is its mirror, and not itself.  Text::Balanced could be used
 466         # instead.
 467         while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
 468             my $delimiter = $1;
 469             my $start = $-[0];
 470             $delimiter = ')' if $delimiter eq '(';
 471
 472             # If there is no ending delimiter, don't consider it to be a
 473             # variable part.  Most likely it is a contraction like "Don't"
 474             last unless $message =~ m/\G .+? \Q$delimiter/xg;
 475
 476             my $length = $+[0] - $start;
 477
 478             # Get the part up through the closing delimiter
 479             my $special = substr($message, $start, $length);
 480             $special =~ s/^\s+//;   # No leading whitespace
 481
 482             # And add that variable part to the parameter, while removing it
 483             # from the message.  This isn't a foolproof way of finding the
 484             # variable part.  For example '(s)' can occur in e.g.,
 485             # 'paragraph(s)'
 486             if ($special ne '(s)') {
 487                 substr($message, $start, $length) = "";
 488                 pos $message = $start;
 489                 $opts->{-msg} = $message;
 490                 $opts->{parameter} .= " " if $opts->{parameter};
 491                 $opts->{parameter} .= $special;
 492             }
 493         }
 494
 495         # Extract any additional line number given.  This is often the
 496         # beginning location of something whereas the main line number gives
 497         # the ending one.
 498         if ($message =~ /( $line_reference )/xi) {
 499             my $line_ref = $1;
 500             while ($message =~ s/\s*\Q$line_ref//) {
 501                 $opts->{-msg} = $message;
 502                 $opts->{parameter} .= " " if $opts->{parameter};
 503                 $opts->{parameter} .= $line_ref;
 504             }
 505         }
 506
 507         Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
 508         push @{$problems{$filename{$addr}}{$message}}, $opts;
 509         #push @{$problems{$self->get_filename}{$message}}, $opts;
 510     }
 511
 512     sub check_encoding {    # Does it need an =encoding statement?
 513         my ($self, $paragraph, $line_num, $pod_para) = @_;
 514
 515         # Do nothing if there is an =encoding in the file, or if the line
 516         # doesn't require an =encoding, or have already warned.
 517         my $addr = Scalar::Util::refaddr $self;
 518         return if $seen_encoding_cmd{$addr}
 519                     || $warned_encoding{$addr}
 520                     || $paragraph !~ /\P{ASCII}/;
 521
 522         $warned_encoding{$addr} = 1;
 523         my ($file, $line) = $pod_para->file_line;
 524         $self->poderror({ -line => $line, -file => $file,
 525                           -msg => $need_encoding
 526                         });
 527         return;
 528     }
 529
 530     sub verbatim {
 531         my ($self, $paragraph, $line_num, $pod_para) = @_;
 532         $self->check_encoding($paragraph, $line_num, $pod_para);
 533
 534         $self->SUPER::verbatim($paragraph, $line_num, $pod_para);
 535
 536         # Pick up the name, since the parent class doesn't in verbatim
 537         # NAMEs; so treat as non-verbatim.  The parent class only allows one
 538         # paragraph in a NAME section, so if there is an extra blank line, it
 539         # will trigger a message, but such a blank line is harmless, so skip
 540         # in that case.
 541         if ($in_NAME{Scalar::Util::refaddr $self} && $paragraph =~ /\S/) {
 542             $self->textblock($paragraph, $line_num, $pod_para);
 543         }
 544
 545         my @lines = split /^/, $paragraph;
 546         for my $i (0 .. @lines - 1) {
 547             $lines[$i] =~ s/\s+$//;
 548             my $indent = $self->get_current_indent;
 549             my $exceeds = length(Text::Tabs::expand($lines[$i]))
 550                           + $indent - $MAX_LINE_LENGTH;
 551             next unless $exceeds > 0;
 552             my ($file, $line) = $pod_para->file_line;
 553             $self->poderror({ -line => $line + $i, -file => $file,
 554                 -msg => $line_length,
 555                 parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)",
 556             });
 557         }
 558     }
 559
 560     sub textblock {
 561         my ($self, $paragraph, $line_num, $pod_para) = @_;
 562         $self->check_encoding($paragraph, $line_num, $pod_para);
 563
 564         $self->SUPER::textblock($paragraph, $line_num, $pod_para);
 565
 566         my ($file, $line) = $pod_para->file_line;
 567         my $addr = Scalar::Util::refaddr $self;
 568         if ($in_NAME{$addr}) {
 569             if (! $self->name) {
 570                 my $text = $self->interpolate($paragraph, $line_num);
 571                 if ($text =~ /^\s*(\S+?)\s*$/) {
 572                     $self->name($1);
 573                     $self->poderror({ -line => $line, -file => $file,
 574                         -msg => $missing_name_description,
 575                         parameter => $1});
 576                 }
 577             }
 578         }
 579         $paragraph = join " ", split /^/, $paragraph;
 580
 581         # Matches something that looks like a file name, but is enclosed in
 582         # C<...>
 583         my $C_path_re = qr{ \b ( C<
 584                                 # exclude regexes and 'OS/2'
 585                                 (?! (?: (?: s | qr | m) / ) | OS/2 > )
 586                                 \w+ (?: / \w+ )+ > (?: \. \w+ )? )
 587                           }x;
 588
 589         # If looks like a reference to other documentation by containing the
 590         # word 'See' and then a likely pod directive, warn.
 591
 592         while ($paragraph =~ m{ \b See \s+ ( ( [^L] ) <
 593                                 ( [^<]*? )  # The not-< excludes nested C<L<...
 594                                 > ) }ixg) {
 595             my $construct = $1;
 596             my $type = $2;
 597             my $interior = $3;
 598             if ($interior !~ /$non_pods/
 599                 && $construct !~ /$C_path_re/g) {
 600                 $self->poderror({ -line => $line, -file => $file,
 601                     -msg => $see_not_linked,
 602                     parameter => $construct
 603                 });
 604             }
 605         }
 606         while ($paragraph =~ m/$C_path_re/g) {
 607             my $construct = $1;
 608             $self->poderror({ -line => $line, -file => $file,
 609                 -msg => $C_with_slash,
 610                 parameter => $construct
 611             });
 612         }
 613         return;
 614     }
 615
 616     sub command {
 617         my ($self, $cmd, $paragraph, $line_num, $pod_para) = @_;
 618         my $addr = Scalar::Util::refaddr $self;
 619         if ($cmd eq "pod") {
 620             $seen_pod_cmd{$addr}++;
 621         }
 622         elsif ($cmd eq "encoding") {
 623             my ($file, $line) = $pod_para->file_line;
 624             $seen_encoding_cmd{$addr} = 1;
 625             if ($command_count{$addr} != 1 && $seen_pod_cmd{$addr}) {
 626                 $self->poderror({ -line => $line, -file => $file,
 627                                   -msg => $encoding_first
 628                                 });
 629             }
 630         }
 631         $self->check_encoding($paragraph, $line_num, $pod_para);
 632
 633         # Pod::Check treats all =items as linkable, but the bullet and
 634         # numbered lists really aren't.  So keep our own list.  This has to be
 635         # processed before SUPER is called so that the list is started before
 636         # the rest of it gets parsed.
 637         if ($cmd eq 'item') { # Not linkable if item begins with * or a digit
 638             $linkable_item{$addr} = ($paragraph !~ / ^ \s*
 639                                                    (?: [*]
 640                                                    | \d+ \.? (?: \$ | \s+ )
 641                                                    )/x)
 642                                   ? 1
 643                                   : 0;
 644
 645         }
 646         $self->SUPER::command($cmd, $paragraph, $line_num, $pod_para);
 647
 648         $command_count{$addr}++;
 649
 650         $in_NAME{$addr} = 0;    # Will change to 1 below if necessary
 651         $in_begin{$addr} = 0;   # ibid
 652         if ($cmd eq 'over') {
 653             my $text = $self->interpolate($paragraph, $line_num);
 654             my $indent = 4; # default
 655             $indent = $1 if $text && $text =~ /^\s*(\d+)\s*$/;
 656             push @{$indents{$addr}}, $indent;
 657             $current_indent{$addr} += $indent;
 658         }
 659         elsif ($cmd eq 'back') {
 660             if (@{$indents{$addr}}) {
 661                 $current_indent{$addr} -= pop @{$indents{$addr}};
 662             }
 663             else {
 664                  # =back without corresponding =over, but should have
 665                  # warned already
 666                 $current_indent{$addr} = 0;
 667             }
 668         }
 669         elsif ($cmd =~ /^head/) {
 670             if (! $in_begin{$addr}) {
 671
 672                 # If a particular formatter, then this command doesn't really
 673                 # apply
 674                 $current_indent{$addr} = 0;
 675                 undef @{$indents{$addr}};
 676             }
 677
 678             my $text = $self->interpolate($paragraph, $line_num);
 679             $in_NAME{$addr} = 1 if $cmd eq 'head1'
 680                                    && $text && $text =~ /^NAME\b/;
 681         }
 682         elsif ($cmd eq 'begin') {
 683             $in_begin{$addr} = 1;
 684         }
 685
 686         return;
 687     }
 688
 689     sub hyperlink {
 690         my $self = shift;
 691
 692         # If the hyperlink is to an interior node of another page, save it
 693         # so that we can see if we need to parse normally skipped files.
 694         $has_referred_to_node{$_[0][1]{'-page'}} = 1
 695                             if $_[0] && $_[0][1]{'-page'} && $_[0][1]{'-node'};
 696         return $self->SUPER::hyperlink($_[0]);
 697     }
 698
 699     sub node {
 700         my $self = shift;
 701         my $text = $_[0];
 702         if($text) {
 703             $text =~ s/\s+$//s; # strip trailing whitespace
 704             $text =~ s/\s+/ /gs; # collapse whitespace
 705             my $addr = Scalar::Util::refaddr $self;
 706             push(@{$linkable_nodes{$addr}}, $text) if
 707                                     ! $current_indent{$addr}
 708                                     || $linkable_item{$addr};
 709         }
 710         return $self->SUPER::node($_[0]);
 711     }
 712
 713     sub get_current_indent {
 714         return $INDENT + $current_indent{Scalar::Util::refaddr $_[0]};
 715     }
 716
 717     sub get_filename {
 718         return $filename{Scalar::Util::refaddr $_[0]};
 719     }
 720
 721     sub linkable_nodes {
 722         my $linkables = $linkable_nodes{Scalar::Util::refaddr $_[0]};
 723         return undef unless $linkables;
 724         return @$linkables;
 725     }
 726
 727     sub get_skip {
 728         return $skip{Scalar::Util::refaddr $_[0]} // 0;
 729     }
 730
 731     sub set_skip {
 732         my $self = shift;
 733         $skip{Scalar::Util::refaddr $self} = shift;
 734
 735         # If skipping, no need to keep the problems for it
 736         delete $problems{$self->get_filename};
 737         return;
 738     }
 739 }
 740
 741 package Tie_Array_to_FH {  # So printing actually goes to an array
 742
 743     my %array;
 744
 745     sub TIEHANDLE {
 746         my $class = shift;
 747         my $array_ref = shift;
 748
 749         my $self = bless \do{ my $anonymous_scalar }, $class;
 750         $array{Scalar::Util::refaddr $self} = $array_ref;
 751
 752         return $self;
 753     }
 754
 755     sub PRINT {
 756         my $self = shift;
 757         push @{$array{Scalar::Util::refaddr $self}}, @_;
 758         return 1;
 759     }
 760 }
 761
 762
 763 my %filename_to_checker; # Map a filename to it's pod checker object
 764 my %id_to_checker;      # Map a checksum to it's pod checker object
 765 my %nodes;              # key is filename, values are nodes in that file.
 766 my %nodes_first_word;   # same, but value is first word of each node
 767 my %valid_modules;      # List of modules known to exist outside us.
 768 my %digests;            # checksums of files, whose names are the keys
 769 my %filename_to_pod;    # Map a filename to its pod NAME
 770 my %files_with_unknown_issues;
 771 my %files_with_fixes;
 772
 773 my $data_fh;
 774 open($data_fh, $known_issues) || die "Can't open $known_issues";
 775
 776 my %counts; # For --counts param, count of each issue type
 777 my %suppressed_files;   # Files with at least one issue type to suppress
 778
 779 while (<$data_fh>) {    # Read the data base
 780     chomp;
 781     next if /^\s*(?:#|$)/;  # Skip comment and empty lines
 782     if (/\t/) {
 783         next if $show_all;
 784
 785         # Keep track of counts of each issue type for each file
 786         my ($filename, $message, $count) = split /\t/;
 787         $known_problems{$filename}{$message} = $count;
 788
 789         if ($show_counts) {
 790             if ($count < 0) {   # -1 means to suppress this issue type
 791                 $suppressed_files{$filename} = $filename;
 792             }
 793             else {
 794                 $counts{$message} += $count;
 795             }
 796         }
 797     }
 798     else {  # Lines without a tab are modules known to be valid
 799         $valid_modules{$_} = 1
 800     }
 801 }
 802 close $data_fh;
 803
 804 if ($show_counts) {
 805     my $total = 0;
 806     foreach my $message (sort keys %counts) {
 807         $total += $counts{$message};
 808         note(Text::Tabs::expand("$counts{$message}\t$message"));
 809     }
 810     note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
 811     if (%suppressed_files) {
 812         note("\nFiles that have all messages of at least one type suppressed:");
 813         note(join ",", keys %suppressed_files);
 814     }
 815     exit 0;
 816 }
 817
 818
 819 my %excluded_files = (
 820                         "lib/unicore/mktables" => 1,
 821                         "Porting/perldelta_template.pod" => 1,
 822                         "autodoc.pl" => 1,
 823                         "configpm" => 1,
 824                         "miniperl" => 1,
 825                         "perl" => 1,
 826
 827                         # It would be nice if we didn't have to skip this,
 828                         # but the errors in it are too variable.
 829                         "pod/perltoc.pod" => 1,
 830                     );
 831
 832 # Convert to more generic form.
 833 foreach my $file (keys %excluded_files) {
 834     $excluded_files{canonicalize($excluded_files{$file})}
 835                                                     = $excluded_files{$file};
 836 }
 837
 838 # re to match files that are to be parsed only if there is an internal link
 839 # to them.  It does not include cpan, as whether those are parsed depends
 840 # on a switch.  Currently, only the stable perldelta.pod's are included.
 841 # These all have characters between 'perl' and 'delta'.  (Actually the
 842 # currently developed one matches as well, but is a duplicate of
 843 # perldelta.pod, so can be skipped, so fine for it to match this.
 844 my $only_for_interior_links_re = qr/ \b perl \d+ delta \. pod \b/x;
 845
 846 { # Closure
 847     my $first_time = 1;
 848
 849     sub output_thanks ($$$$) {  # Called when an issue has been fixed
 850         my $filename = shift;
 851         my $original_count = shift;
 852         my $current_count = shift;
 853         my $message = shift;
 854
 855         $files_with_fixes{$filename} = 1;
 856         my $return;
 857         my $fixed_count = $original_count - $current_count;
 858         my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
 859         my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
 860         my $diff;
 861         if ($message) {
 862             $diff = <<EOF;
 863 There were $original_count occurrences (now $current_count) in this pod of type
 864 "$message",
 865 EOF
 866         } else {
 867             $diff = <<EOF;
 868 There are no longer any problems found in this pod!
 869 EOF
 870         }
 871
 872         if ($first_time) {
 873             $first_time = 0;
 874             $return = <<EOF;
 875 Thanks for fixing $a_problem!
 876 $diff
 877 Now you must teach $0 that this was fixed.
 878 EOF
 879         }
 880         else {
 881             $return = <<EOF
 882 Thanks for fixing $another_problem.
 883 $diff
 884 EOF
 885         }
 886
 887         return $return;
 888     }
 889 }
 890
 891 sub my_safer_print {    # print, with error checking for outputting to db
 892     my ($fh, @lines) = @_;
 893
 894     if (! print $fh @lines) {
 895         my $save_error = $!;
 896         close($fh);
 897         die "Write failure: $save_error";
 898     }
 899 }
 900
 901 sub extract_pod {   # Extracts just the pod from a file
 902     my $filename = shift;
 903
 904     my @pod;
 905
 906     # Arrange for the output of Pod::Parser to be collected in an array we can
 907     # look at instead of being printed
 908     tie *ALREADY_FH, 'Tie_Array_to_FH', \@pod;
 909     open my $in_fh, '<', $filename
 910         or die "Can't open '$filename': $!\n";
 911
 912     my $parser = Pod::Parser->new();
 913     $parser->parse_from_filehandle($in_fh, *ALREADY_FH);
 914     close $in_fh;
 915
 916     return join "", @pod
 917 }
 918
 919 my $digest = Digest->new($digest_type);
 920
 921 sub is_pod_file {
 922     if (-d $_) {
 923         # Don't look at files in directories that are for tests, nor those
 924         # beginning with a dot
 925         if ($_ eq 't' || $_ =~ /^\../) {
 926             $File::Find::prune = 1;
 927         }
 928         return;
 929     }
 930
 931     return if $_ =~ /^\./;           # No hidden Unix files
 932     return if $_ =~ $non_pods;
 933
 934     my $filename = $File::Find::name;
 935
 936     # Assumes that the path separator is exactly one character.
 937     $filename =~ s/^\..//;
 938
 939     return if $excluded_files{canonicalize($filename)};
 940
 941     open my $candidate, '<', $_
 942         or die "Can't open '$File::Find::name': $!\n";
 943     my @contents = <$candidate>;
 944     close $candidate;
 945
 946     # If the file is a .pm or .pod, having any initial '=' on a line is
 947     # grounds for testing it.  Otherwise, require a head1 NAME line to view it
 948     # as a potential pod
 949     my $i;
 950     my $found = "";
 951     for ($i = 0; $i < @contents; $i++) {
 952         next unless $contents[$i] =~ /^=/;
 953         if ($filename =~ /\.(?:pm|pod)/) {
 954             $found = 'found_some_pod_line';
 955             last;
 956         }
 957         elsif ($contents[$i] =~ /^=head1 +NAME/) {
 958             $found = 'found_NAME';
 959             last;
 960         }
 961     }
 962     if ($found) {
 963         # Here, we know that the file is a pod.  Add it to the list of files
 964         # to check and create a checker object for it.
 965
 966         push @files, $filename;
 967         my $checker = My::Pod::Checker->new($filename);
 968         $filename_to_checker{$filename} = $checker;
 969
 970         # In order to detect duplicate pods and only analyze them once, we
 971         # compute checksums for the file, so don't have to do an exact
 972         # compare.  Note that if the pod is just part of the file, the
 973         # checksums can differ for the same pod.  That special case is handled
 974         # later, since if the checksums of the whole file are the same, that
 975         # case won't even come up.  We don't need the checksums for files that
 976         # we parse only if there is a link to its interior, but we do need its
 977         # NAME, which is also retrieved in the code below.
 978         if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ )
 979                             | $only_for_interior_links_re
 980                         /x) {
 981             $digest->add(@contents);
 982             $digests{$filename} = $digest->digest;
 983
 984             # lib files aren't analyzed if they are duplicates of files copied
 985             # there from some other directory.  But to determine this, we need
 986             # to know their NAMEs.  We might as well find the NAME now while
 987             # the file is open.  Similarly, cpan files aren't analyzed unless
 988             # we're analyzing all of them, or this particular file is linked
 989             # to by a file we are analyzing, and thus we will want to verify
 990             # that the target exists in it.  We need to know at least the NAME
 991             # to see if it's worth analyzing, or so we can determine if a lib
 992             # file is a copy of a cpan one.
 993             if ($filename =~ m{ (?: ^ (?: cpan | lib ) / )
 994                                 | $only_for_interior_links_re
 995                                 }x) {
 996                 if ($found eq 'found_some_pod_line') {
 997                     for (;  $i < @contents; $i++) {
 998                         next if $contents[$i] !~ /^=head1/;
 999                         $found = 'found_NAME'
1000                                         if $contents[$i] =~ /^=head1 +NAME/;
1001                         last;
1002                     }
1003                 }
1004                 if ($found eq 'found_NAME') {
1005                     $i++;   # The NAME starts on a later line
1006
1007                     # Skip empty lines
1008                     while ($contents[$i] !~ /\S/) { $i++ }
1009
1010                     # The NAME is the first non-spaces on the line up to a
1011                     # comma, dash or end of line.  Otherwise, it's invalid and
1012                     # this pod doesn't have a legal name that we're smart
1013                     # enough to find currently.  But the  parser will later
1014                     # find it if it thinks there is a legal name, and set the
1015                     # name
1016                     if ($contents[$i] =~ /^ \s* ( \S+?) \s* (?: [,-] | $ )/x) {
1017                         my $name = $1;
1018                         $checker->name($name);
1019                         $id_to_checker{$name} = $checker
1020                                                 if $filename =~ m{^cpan/};
1021                     }
1022                 }
1023                 elsif ($filename =~ m{^cpan/}) {
1024                     $id_to_checker{$digests{$filename}} = $checker;
1025                 }
1026             }
1027         }
1028     }
1029 } # End of is_pod_file()
1030
1031 # Start of real code that isn't processing the command line.
1032 # Here, @files contains list of files on the command line.  If have any of
1033 # these, unconditionally test them, and show all the errors, even the known
1034 # ones, and, since not testing other pods, don't do cross-pod link tests.
1035 # (Could add extra code to do cross-pod tests for the ones in the list.)
1036 if ($has_input_files) {
1037     undef %known_problems;
1038     $do_upstream_cpan = 1;  # In case one of the inputs is from cpan
1039
1040 }
1041 else { # No input files -- go find all the possibilities.
1042     if ($regen) {
1043         $copy_fh = open_new($known_issues);
1044         note("Regenerating $known_issues, please be patient...");
1045         print $copy_fh <<END;
1046 # This file is the data file for $0.
1047 # There are three types of lines.
1048 # Comment lines are white-space only or begin with a '#', like this one.  Any
1049 #   changes you make to the comment lines will be lost when the file is
1050 #   regen'd.
1051 # Lines without tab characters are simply NAMES of pods that the program knows
1052 #   will have links to them and the program does not check if those links are
1053 #   valid.
1054 # All other lines should have three fields, each separated by a tab.  The
1055 #   first field is the name of a pod; the second field is an error message
1056 #   generated by this program; and the third field is a count of how many
1057 #   known instances of that message there are in the pod.  -1 means that the
1058 #   program can expect any number of this type of message.
1059 END
1060     }
1061
1062     # Move to the directory above us, but have to adjust @INC to account for
1063     # that.
1064     s{^\.\./lib$}{lib} for @INC;
1065     chdir File::Spec->updir;
1066
1067     # And look in this directory and all its subdirectories
1068     find( \&is_pod_file, '.');
1069
1070     # Add ourselves to the test
1071     push @files, "t/porting/podcheck.t";
1072 }
1073
1074 # Now we know how many tests there will be.
1075 plan (tests => scalar @files) if ! $regen;
1076
1077
1078  # Sort file names so we get consistent results, and to put cpan last,
1079  # preceeded by the ones that we don't generally parse.  This is because both
1080  # these classes are generally parsed only if there is a link to the interior
1081  # of them, and we have to parse all others first to guarantee that they don't
1082  # have such a link. 'lib' files come just before these, as some of these are
1083  # duplicates of others.  We already have figured this out when gathering the
1084  # data as a special case for all such files, but this, while unnecessary,
1085  # puts the derived file last in the output.  'readme' files come before those,
1086  # as those also could be duplicates of others, which are considered the
1087  # primary ones.  These currently aren't figured out when gathering data, so
1088  # are done here.
1089  @files = sort { if ($a =~ /^cpan/) {
1090                     return 1 if $b !~ /^cpan/;
1091                     return $a cmp $b;
1092                 }
1093                 elsif ($b =~ /^cpan/) {
1094                     return -1;
1095                 }
1096                 elsif ($a =~ /$only_for_interior_links_re/) {
1097                     return 1 if $b !~ /$only_for_interior_links_re/;
1098                     return $a cmp $b;
1099                 }
1100                 elsif ($b =~ /$only_for_interior_links_re/) {
1101                     return -1;
1102                 }
1103                 elsif ($a =~ /^lib/) {
1104                     return 1 if $b !~ /^lib/;
1105                     return $a cmp $b;
1106                 }
1107                 elsif ($b =~ /^lib/) {
1108                     return -1;
1109                 } elsif ($a =~ /\breadme\b/i) {
1110                     return 1 if $b !~ /\breadme\b/i;
1111                     return $a cmp $b;
1112                 }
1113                 elsif ($b =~ /\breadme\b/i) {
1114                     return -1;
1115                 }
1116                 else {
1117                     return lc $a cmp lc $b;
1118                 }
1119             }
1120             @files;
1121
1122 # Now go through all the files and parse them
1123 foreach my $filename (@files) {
1124     my $parsed = 0;
1125     note("parsing $filename") if DEBUG;
1126
1127     # We may have already figured out some things in the process of generating
1128     # the file list.  If so, have a $checker object already.  But if not,
1129     # generate one now.
1130     my $checker = $filename_to_checker{$filename};
1131     if (! $checker) {
1132         $checker = My::Pod::Checker->new($filename);
1133         $filename_to_checker{$filename} = $checker;
1134     }
1135
1136     # We have set the name in the checker object if there is a possibility
1137     # that no further parsing is necessary, but otherwise do the parsing now.
1138     if (! $checker->name) {
1139         $parsed = 1;
1140         $checker->parse_from_file($filename, undef);
1141     }
1142
1143     if ($checker->num_errors() < 0) {   # Returns negative if not a pod
1144         $checker->set_skip("$filename is not a pod");
1145     }
1146     else {
1147
1148         # Here, is a pod.  See if it is one that has already been tested,
1149         # or should be tested under another directory.  Use either its NAME
1150         # if it has one, or a checksum if not.
1151         my $name = $checker->name;
1152         my $id;
1153
1154         if ($name) {
1155             $id = $name;
1156         }
1157         else {
1158             my $digest = Digest->new($digest_type);
1159             $digest->add(extract_pod($filename));
1160             $id = $digest->digest;
1161         }
1162
1163         # If there is a match for this pod with something that we've already
1164         # processed, don't process it, and output why.
1165         my $prior_checker;
1166         if (defined ($prior_checker = $id_to_checker{$id})
1167             && $prior_checker != $checker)  # Could have defined the checker
1168                                             # earlier without pursuing it
1169         {
1170
1171             # If the pods are identical, then it's just a copy, and isn't an
1172             # error.  First use the checksums we have already computed to see
1173             # if the entire files are identical, which means that the pods are
1174             # identical too.
1175             my $prior_filename = $prior_checker->get_filename;
1176             my $same = (! $name
1177                         || ($digests{$prior_filename}
1178                             && $digests{$filename}
1179                             && $digests{$prior_filename} eq $digests{$filename}));
1180
1181             # If they differ, it could be that the files differ for some
1182             # reason, but the pods they contain are identical.  Extract the
1183             # pods and do the comparisons on just those.
1184             if (! $same && $name) {
1185                 $same = extract_pod($prior_filename) eq extract_pod($filename);
1186             }
1187
1188             if ($same) {
1189                 $checker->set_skip("The pod of $filename is a duplicate of "
1190                                     . "the pod for $prior_filename");
1191             } elsif ($prior_filename =~ /\breadme\b/i) {
1192                 $checker->set_skip("$prior_filename is a README apparently for $filename");
1193             } elsif ($filename =~ /\breadme\b/i) {
1194                 $checker->set_skip("$filename is a README apparently for $prior_filename");
1195             } else { # Here have two pods with identical names that differ
1196                 $prior_checker->poderror(
1197                         { -msg => $duplicate_name,
1198                             -line => "???",
1199                             parameter => "'$filename' also has NAME '$name'"
1200                         });
1201                 $checker->poderror(
1202                     { -msg => $duplicate_name,
1203                         -line => "???",
1204                         parameter => "'$prior_filename' also has NAME '$name'"
1205                     });
1206
1207                 # Changing the names helps later.
1208                 $prior_checker->name("$name version arbitrarily numbered 1");
1209                 $checker->name("$name version arbitrarily numbered 2");
1210             }
1211
1212             # In any event, don't process this pod that has the same name as
1213             # another.
1214             next;
1215         }
1216
1217         # A unique pod.
1218         $id_to_checker{$id} = $checker;
1219
1220         my $parsed_for_links = ", but parsed for its interior links";
1221         if ((! $do_upstream_cpan && $filename =~ /^cpan/)
1222              || $filename =~ $only_for_interior_links_re)
1223         {
1224             if ($filename =~ /^cpan/) {
1225                 $checker->set_skip("CPAN is upstream for $filename");
1226             }
1227             elsif ($filename =~ /perl\d+delta/) {
1228                 $checker->set_skip("$filename is a stable perldelta");
1229             }
1230             else {
1231                 croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
1232             }
1233
1234             if ($name && $has_referred_to_node{$name}) {
1235                 $checker->set_skip($checker->get_skip() . $parsed_for_links);
1236             }
1237         }
1238
1239         # Need a name in order to process it, because not meaningful
1240         # otherwise, and also can't test links to this without a name.
1241         if (!defined $name) {
1242             $checker->poderror( { -msg => $no_name,
1243                                   -line => '???'
1244                                 });
1245             next;
1246         }
1247
1248         # For skipped files, just get its NAME
1249         my $skip;
1250         if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
1251         {
1252             $checker->node($name) if $name;
1253         }
1254         else {
1255             $checker->parse_from_file($filename, undef) if ! $parsed;
1256         }
1257
1258         # Go through everything in the file that could be an anchor that
1259         # could be a link target.  Count how many there are of the same name.
1260         foreach my $node ($checker->linkable_nodes) {
1261             next if ! $node;        # Can be empty is like '=item *'
1262             if (exists $nodes{$name}{$node}) {
1263                 $nodes{$name}{$node}++;
1264             }
1265             else {
1266                 $nodes{$name}{$node} = 1;
1267             }
1268
1269             # Experiments have shown that cpan search can figure out the
1270             # target of a link even if the exact wording is incorrect, as long
1271             # as the first word is.  This happens frequently in perlfunc.pod,
1272             # where the link will be just to the function, but the target
1273             # entry also includes parameters to the function.
1274             my $first_word = $node;
1275             if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
1276                 $nodes_first_word{$name}{$first_word} = $node;
1277             }
1278         }
1279         $filename_to_pod{$filename} = $name;
1280     }
1281 }
1282
1283 # Here, all files have been parsed, and all links and link targets are stored.
1284 # Now go through the files again and see which don't have matches.
1285 if (! $has_input_files) {
1286     foreach my $filename (@files) {
1287         next if $filename_to_checker{$filename}->get_skip;
1288         my $checker = $filename_to_checker{$filename};
1289         foreach my $link ($checker->hyperlink) {
1290             my $linked_to_page = $link->[1]->page;
1291             next unless $linked_to_page;   # intra-file checks are handled by std
1292                                            # Pod::Checker
1293
1294             # Initialize the potential message.
1295             my %problem = ( -msg => $broken_link,
1296                             -line => $link->[0],
1297                             parameter => "to \"$linked_to_page\"",
1298                         );
1299
1300             # See if we have found the linked-to_file in our parse
1301             if (exists $nodes{$linked_to_page}) {
1302                 my $node = $link->[1]->node;
1303
1304                 # If link is only to the page-level, already have it
1305                 next if ! $node;
1306
1307                 # Transform pod language to what we are expecting
1308                 $node =~ s,E<sol>,/,g;
1309                 $node =~ s/E<verbar>/|/g;
1310
1311                 # If link is to a node that exists in the file, is ok
1312                 if ($nodes{$linked_to_page}{$node}) {
1313
1314                     # But if the page has multiple targets with the same name,
1315                     # it's ambiguous which one this should be to.
1316                     if ($nodes{$linked_to_page}{$node} > 1) {
1317                         $problem{-msg} = $multiple_targets;
1318                         $problem{parameter} = "in $linked_to_page that $node could be pointing to";
1319                         $checker->poderror(\%problem);
1320                     }
1321                 } elsif (! $nodes_first_word{$linked_to_page}{$node}) {
1322
1323                     # Here the link target was not found, either exactly or to
1324                     # the first word.  Is an error.
1325                     $problem{parameter} =~ s,"$,/$node",;
1326                     $checker->poderror(\%problem);
1327                 }
1328
1329             } # Linked-to-file not in parse; maybe is in exception list
1330             elsif (! exists $valid_modules{$link->[1]->page}) {
1331
1332                 # Here, is a link to a target that we can't find.  Check if
1333                 # there is an internal link on the page with the target name.
1334                 # If so, it could be that they just forgot the initial '/'
1335                 if ($filename_to_pod{$filename}
1336                     && $nodes{$filename_to_pod{$filename}}{$linked_to_page})
1337                 {
1338                     $problem{-msg} =  $broken_internal_link;
1339                 }
1340                 $checker->poderror(\%problem);
1341             }
1342         }
1343     }
1344 }
1345
1346 # If regenerating the data file, start with the modules for which we don't
1347 # check targets
1348 if ($regen) {
1349     foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
1350         my_safer_print($copy_fh, $_, "\n");
1351     }
1352 }
1353
1354 # Now ready to output the messages.
1355 foreach my $filename (@files) {
1356     my $test_name = "POD of $filename";
1357     my $canonical = canonicalize($filename);
1358     SKIP: {
1359         my $skip = $filename_to_checker{$filename}->get_skip // "";
1360
1361         if ($regen) {
1362             foreach my $message ( sort keys %{$problems{$filename}}) {
1363                 my $count;
1364
1365                 # Preserve a negative setting.
1366                 if ($known_problems{$canonical}{$message}
1367                     && $known_problems{$canonical}{$message} < 0)
1368                 {
1369                     $count = $known_problems{$canonical}{$message};
1370                 }
1371                 else {
1372                     $count = @{$problems{$filename}{$message}};
1373                 }
1374                 my_safer_print($copy_fh, canonicalize($filename) . "\t$message\t$count\n");
1375             }
1376             next;
1377         }
1378
1379         skip($skip, 1) if $skip;
1380         my @diagnostics;
1381         my $indent = '  ';
1382
1383         my $total_known = 0;
1384         foreach my $message ( sort keys %{$problems{$filename}}) {
1385             $known_problems{$canonical}{$message} = 0
1386                                     if ! $known_problems{$canonical}{$message};
1387             my $diagnostic = "";
1388             my $problem_count = scalar @{$problems{$filename}{$message}};
1389             $total_known += $problem_count;
1390             next if $known_problems{$canonical}{$message} < 0;
1391             if ($problem_count > $known_problems{$canonical}{$message}) {
1392
1393                 # Here we are about to output all the messages for this type,
1394                 # subtract back this number we previously added in.
1395                 $total_known -= $problem_count;
1396
1397                 $diagnostic .= $indent . $message;
1398                 if ($problem_count > 2) {
1399                     $diagnostic .= "  ($problem_count occurrences)";
1400                 }
1401                 foreach my $problem (@{$problems{$filename}{$message}}) {
1402                     $diagnostic .= " " if $problem_count == 1;
1403                     $diagnostic .= "\n$indent$indent";
1404                     $diagnostic .= "$problem->{parameter}" if $problem->{parameter};
1405                     $diagnostic .= " near line $problem->{-line}";
1406                     $diagnostic .= " $problem->{comment}" if $problem->{comment};
1407                 }
1408                 $diagnostic .= "\n";
1409                 $files_with_unknown_issues{$filename} = 1;
1410             } elsif ($problem_count < $known_problems{$canonical}{$message}) {
1411                $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
1412             }
1413             push @diagnostics, $diagnostic if $diagnostic;
1414         }
1415
1416         # The above loop has output messages where there are current potential
1417         # issues.  But it misses where there were some that have been entirely
1418         # fixed.  For those, we need to look through the old issues
1419         foreach my $message ( sort keys %{$known_problems{$canonical}}) {
1420             next if $problems{$filename}{$message};
1421             next if ! $known_problems{$canonical}{$message};
1422             next if $known_problems{$canonical}{$message} < 0; # Preserve negs
1423             my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
1424             push @diagnostics, $diagnostic if $diagnostic;
1425         }
1426
1427         my $output = "POD of $filename";
1428         $output .= ", excluding $total_known not shown known potential problems"
1429                                                                 if $total_known;
1430         ok(@diagnostics == 0, $output);
1431         if (@diagnostics) {
1432             note(join "", @diagnostics,
1433                                 "See end of this test output for your options");
1434         }
1435     }
1436 }
1437
1438 my $how_to = <<EOF;
1439    run this test script by hand, using the following formula (on
1440    Un*x-like machines):
1441         cd t
1442         ./perl -I../lib porting/podcheck.t --regen
1443 EOF
1444
1445 if (%files_with_unknown_issues) {
1446     my $were_count_files = scalar keys %files_with_unknown_issues;
1447     $were_count_files = ($were_count_files == 1)
1448                         ? "was $were_count_files file"
1449                         : "were $were_count_files files";
1450     my $message = <<EOF;
1451
1452 HOW TO GET THIS .t TO PASS
1453
1454 There $were_count_files that had new potential problems identified.  To get
1455 this .t to pass, do the following:
1456
1457 1) If a problem is about a link to an unknown module that you know exists,
1458    simply edit the file,
1459    $known_issues
1460    and add anywhere a line that contains just the module's name.
1461    (Don't do this for a module that you aren't sure about; instead treat
1462    as another type of issue and follow the instructions below.)
1463
1464 2) For other issues, decide if each should be fixed now or not.  Fix the
1465    ones you decided to, and rerun this test to verify that the fixes
1466    worked.
1467
1468 3) If there remain potential problems that you don't plan to fix right
1469    now (or aren't really problems),
1470 $how_to
1471    That should cause all current potential problems to be accepted by the
1472    program, so that the next time it runs, they won't be flagged.
1473 EOF
1474     if (%files_with_fixes) {
1475         $message .= "   This step will also take care of the files that have fixes in them\n";
1476     }
1477
1478     $message .= <<EOF;
1479    For a few files, such as perltoc, certain issues will always be
1480    expected, and more of the same will be added over time.  For those,
1481    before you do the regen, you can edit
1482    $known_issues
1483    and find the entry for the module's file and specific error message,
1484    and change the count of known potential problems to -1.
1485 EOF
1486
1487     note($message);
1488 } elsif (%files_with_fixes) {
1489     note(<<EOF
1490 To teach this test script that the potential problems have been fixed,
1491 $how_to
1492 EOF
1493     );
1494 }
1495
1496 if ($regen) {
1497     chdir $original_dir || die "Can't change directories to $original_dir";
1498     close_and_rename($copy_fh);
1499 }