This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Refactor podcheck.t to slurp files into scalars, instead of an array of lines.
[perl5.git] / t / porting / podcheck.t
CommitLineData
a67b1afa 1#!/usr/bin/perl -w
f7b649f0 2
a67b1afa 3use strict;
16384ac1
KW
4use warnings;
5use feature 'unicode_strings';
6
7use Carp;
8use Digest;
9use File::Find;
10use File::Spec;
11use Scalar::Util;
12use Text::Tabs;
13
14BEGIN {
15 require '../regen/regen_lib.pl';
16}
17
18sub DEBUG { 0 };
19
20=pod
21
22=head1 NAME
23
24podcheck.t - Look for possible problems in the Perl pods
25
26=head1 SYNOPSIS
27
28 cd t
29 ./perl -I../lib porting/podcheck.t [--show_all] [--cpan] [--counts]
30 [ FILE ...]
31 ./perl -I../lib porting/podcheck.t --regen
32
33=head1 DESCRIPTION
34
35podcheck.t is an extension of Pod::Checker. It looks for pod errors and
36potential errors in the files given as arguments, or if none specified, in all
37pods in the distribution workspace, except those in the cpan directory (unless
38C<--cpan> is specified). It does additional checking beyond that done by
39Pod::Checker, and keeps a database of known potential problems, and will
40fail a pod only if the number of such problems differs from that given in the
41database. It also suppresses the C<(section) deprecated> message from
42Pod::Checker, since specifying the man page section number is quite proper to do.
43
44The additional checks it makes are:
45
46=over
47
48=item Cross-pod link checking
49
50Pod::Checker verifies that links to an internal target in a pod are not
51broken. podcheck.t extends that (when called without FILE arguments) to
52external links. It does this by gathering up all the possible targets in the
53workspace, and cross-checking them. The database has a list of known targets
54outside the workspace, so podcheck.t will not raise a warning for
55using those. It also checks that a non-broken link points to just one target.
56(The destination pod could have two targets with the same name.)
57
58=item An internal link that isn't so specified
59
60If a link is broken, but there is an existing internal target of the same
61name, it is likely that the internal target was meant, and the C<"/"> is
62missing from the C<LE<lt>E<gt>> pod command.
63
64=item Verbatim paragraphs that wrap in an 80 column window
65
66It's annoying to have lines wrap when displaying pod documentation in a
67terminal window. This checks that all such lines fit, and for those that
68don't, it tells you how much needs to be cut in order to fit. However,
69if you're fixing these, keep in mind that some terminal/pager combinations
70require really a maximum of 79 or 78 columns to display properly.
71
72Often, the easiest thing to do to gain space for these is to lower the indent
73to just one space.
74
75=item Missing or duplicate NAME or missing NAME short description
76
77A pod can't be linked to unless it has a unique name.
78And a NAME should have a dash and short description after it.
79
80=item =encoding statement issues
81
82This indicates if an C<=encoding> statement should be present, or moved to the
83front of the pod.
84
85=item Items that perhaps should be links
86
87There are mentions of apparent files in the pods that perhaps should be links
88instead, using C<LE<lt>...E<gt>>
89
90=item Items that perhaps should be C<FE<lt>...E<gt>>
91
92What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
93C<FE<lt>...E<gt>> mark-up instead.
94
95=back
96
97A number of issues raised by podcheck.t and by the base Pod::Checker are not
98really problems, but merely potential problems. After inspecting them and
99deciding that they aren't real problems, it is possible to shut up this program
100about them, unlike base Pod::Checker. To do this, call podcheck.t with the
101C<--regen> option to regenerate the database. This tells it that all existing
102issues are to not be mentioned again.
103
104This isn't fool-proof. The database merely keeps track of the number of these
105potential problems of each type for each pod. If a new problem of a given
106type is introduced into the pod, podcheck.t will spit out all of them. You
107then have to figure out which is the new one, and should it be changed or not.
108But doing it this way insulates the database from having to keep track of line
109numbers of problems, which may change, or the exact wording of each problem
110which might also change without affecting whether it is a problem or not.
111
112Also, if the count of potential problems of a given type for a pod decreases,
113the database must be regenerated so that it knows the new number. The program
114gives instructions when this happens.
115
116There is currently no check that modules listed as valid in the data base
117actually are. Thus any errors introduced there will remain there.
118
119=head1 OPTIONS
120
121=over
122
123=item --regen
124
125Regenerate the data base used by podcheck.t to include all the existing
126potential problems. Future runs of the program will not then flag any of
127these.
128
129=item --cpan
130
131Normally, all pods in the cpan directory are skipped, except to make sure that
132any blead-upstream links to such pods are valid.
133This option will cause cpan upstream pods to be checked.
134
135=item --show_all
136
137Normally, if the number of potential problems of a given type found for a
138pod matches the expected value in the database, they will not be displayed.
139This option forces the database to be ignored during the run, so all potential
140problems are displayed and will fail their respective pod test. Specifying
141any particular FILES to operate on automatically selects this option.
142
143=item --counts
144
145Instead of testing, this just dumps the counts of the occurrences of the
146various types of potential problems in the data base.
147
148=back
149
150=head1 FILES
151
152The database is stored in F<t/porting/known_pod_issues.dat>
153
154=head1 SEE ALSO
155
156L<Pod::Checker>
157
158=cut
159
160#####################################################
161# HOW IT WORKS (in general)
162#
163# If not called with specific files to check, the directory structure is
164# examined for files that have pods in them. Files that might not have to be
165# fully parsed (e.g. in cpan) are parsed enough at this time to find their
166# pod's NAME, and to get a checksum.
167#
168# Those kinds of files are sorted last, but otherwise the pods are parsed with
169# the package coded here, My::Pod::Checker, which is an extension to
170# Pod::Checker that adds some tests and suppresses others that aren't
171# appropriate. The latter module has no provision for capturing diagnostics,
172# so a package, Tie_Array_to_FH, is used to force them to be placed into an
173# array instead of printed.
174#
175# Parsing the files builds up a list of links. The files are gone through
176# again, doing cross-link checking and outputting all saved-up problems with
177# each pod.
178#
179# Sorting the files last that potentially don't need to be fully parsed allows
180# us to not parse them unless there is a link to an internal anchor in them
181# from something that we have already parsed. Keeping checksums allows us to
182# not parse copies of other pods.
183#
184#####################################################
185
186# 1 => Exclude low priority messages that aren't likely to be problems, and
187# has many false positives; higher numbers give more messages.
188my $Warnings_Level = 200;
189
190# To see if two pods with the same NAME are actually copies of the same pod,
191# which is not an error, it uses a checksum to save work.
192my $digest_type = "SHA-1";
193
194my $original_dir = File::Spec->rel2abs(File::Spec->curdir);
195my $data_dir = File::Spec->catdir($original_dir, 'porting');
196my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
197my $copy_fh;
198
199my $MAX_LINE_LENGTH = 80; # 80 columns
200my $INDENT = 8; # default nroff indent
201
202# Our warning messages. Better not have [('"] in them, as those are used as
203# delimiters for variable parts of the messages by poderror.
204my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
205my $broken_link = "Apparent broken link";
206my $broken_internal_link = "Apparent internal link is missing its forward slash";
207my $see_not_linked = "? Should you be using L<...> instead of";
208my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of";
209my $multiple_targets = "There is more than one target";
210my $duplicate_name = "Pod NAME already used";
211my $need_encoding = "Should have =encoding statement because have non-ASCII";
212my $encoding_first = "=encoding must be first command (if present)";
213my $no_name = "There is no NAME";
214my $missing_name_description = "The NAME should have a dash and short description after it";
215
216# objects, tests, etc can't be pods, so don't look for them.
217my $non_pods = qr/\.(?:[achot]|zip|gz|bz2|jar|tar|tgz|PL|so)$/;
218
219
220# Pod::Checker messages to suppress
221my @suppressed_messages = (
222 "(section) in", # Checker is wrong to flag this
223 "multiple occurrence of link target", # We catch independently the ones
224 # that are real problems.
225 "unescaped <>",
226);
227
228sub suppressed {
229 # Returns bool as to if input message is one that is to be suppressed
230
231 my $message = shift;
232 return grep { $message =~ /^\Q$_/i } @suppressed_messages;
233}
234
235{ # Closure to contain a simple subset of test.pl. This is to get rid of the
236 # unnecessary 'failed at' messages that would otherwise be output pointing
237 # to a particular line in this file.
a67b1afa 238
16384ac1
KW
239 my $current_test = 0;
240 my $planned;
241
242 sub plan {
243 my %plan = @_;
244 $planned = $plan{tests};
245 print "1..$planned\n";
246 return;
247 }
248
249 sub ok {
250 my $success = shift;
251 my $message = shift;
252
253 chomp $message;
254
255 $current_test++;
256 print "not " unless $success;
257 print "ok $current_test - $message\n";
258 return;
259 }
260
261 sub skip {
262 my $why = shift;
263 my $n = @_ ? shift : 1;
264 for (1..$n) {
265 $current_test++;
266 print "ok $current_test # skip $why\n";
267 }
268 no warnings 'exiting';
269 last SKIP;
270 }
271
272 sub note {
273 my $message = shift;
274
275 chomp $message;
276
277 print $message =~ s/^/# /mgr;
278 print "\n";
279 return;
280 }
281
282 END {
283 if ($planned && $planned != $current_test) {
284 print STDERR
285 "# Looks like you planned $planned tests but ran $current_test.\n";
286 }
287 }
288}
289
b3fdb838
KW
290# This is to get this to work across multiple file systems, including those
291# that are not case sensitive. The db is stored in lower case, and all file
292# name comparisons are done that way.
293sub canonicalize($) {
294 return lc File::Spec->canonpath(shift);
295}
296
16384ac1
KW
297
298# List of known potential problems by pod and type.
299my %known_problems;
300
301# Pods given by the keys contain an interior node that is referred to from
302# outside it.
303my %has_referred_to_node;
304
305my $show_counts = 0;
306my $regen = 0;
307my $show_all = 0;
308
309# Assume that are to skip anything in /cpan
310my $do_upstream_cpan = 0;
311
312while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
313 my $arg = shift @ARGV;
314
315 $arg =~ s/^--/-/; # Treat '--' the same as a single '-'
316 if ($arg eq '-regen') {
317 $regen = 1;
318 }
319 elsif ($arg eq '-cpan') {
320 $do_upstream_cpan = 1;
321 }
322 elsif ($arg eq '-show_all') {
323 $show_all = 1;
324 }
325 elsif ($arg eq '-counts') {
326 $show_counts = 1;
327 }
328 else {
329 die <<EOF;
330Unknown option '$arg'
331
332Usage: $0 [ --regen | --cpan | --show_all ] [ FILE ... ]\n"
333 --cpan -> Include files in the cpan subdirectory.
334 --regen -> Regenerate the data file for $0
335 --show_all -> Show all known potential problems
336 --counts -> Don't test, but give summary counts of the currently
337 existing database
338EOF
339 }
340}
341
342my @files = @ARGV;
343
344if (($regen + $show_all + $show_counts + $do_upstream_cpan) > 1) {
345 croak "--regen, --show_all, --cpan, and --counts are mutually exclusive";
346}
347
348my $has_input_files = @files;
349
350if ($has_input_files && ($regen || $show_counts || $do_upstream_cpan)) {
351 croak "--regen, --counts and --cpan can't be used since using specific files";
352}
353
354our %problems; # potential problems found in this run
355
356package My::Pod::Checker { # Extend Pod::Checker
a67b1afa
MM
357 use parent 'Pod::Checker';
358
16384ac1
KW
359 # Uses inside out hash to protect from typos
360 # For new fields, remember to add to destructor DESTROY()
361 my %indents; # Stack of indents from =over's in effect for
362 # current line
363 my %current_indent; # Current line's indent
364 my %filename; # The pod is store in this file
365 my %skip; # is SKIP set for this pod
366 my %in_NAME; # true if within NAME section
367 my %in_begin; # true if within =begin section
368 my %linkable_item; # Bool: if the latest =item is linkable. It isn't
369 # for bullet and number lists
370 my %linkable_nodes; # Pod::Checker adds all =items to its node list,
371 # but not all =items are linkable to
372 my %seen_encoding_cmd; # true if have =encoding earlier
373 my %command_count; # Number of commands seen
374 my %seen_pod_cmd; # true if have =pod earlier
375 my %warned_encoding; # true if already have warned about =encoding
376 # problems
377
378 sub DESTROY {
379 my $addr = Scalar::Util::refaddr $_[0];
380 delete $command_count{$addr};
381 delete $current_indent{$addr};
382 delete $filename{$addr};
383 delete $in_begin{$addr};
384 delete $indents{$addr};
385 delete $in_NAME{$addr};
386 delete $linkable_item{$addr};
387 delete $linkable_nodes{$addr};
388 delete $seen_encoding_cmd{$addr};
389 delete $seen_pod_cmd{$addr};
390 delete $skip{$addr};
391 delete $warned_encoding{$addr};
392 return;
393 }
394
395 sub new {
396 my $class = shift;
397 my $filename = shift;
398
399 my $self = $class->SUPER::new(-quiet => 1,
400 -warnings => $Warnings_Level);
401 my $addr = Scalar::Util::refaddr $self;
402 $command_count{$addr} = 0;
403 $current_indent{$addr} = 0;
404 $filename{$addr} = $filename;
405 $in_begin{$addr} = 0;
406 $in_NAME{$addr} = 0;
407 $linkable_item{$addr} = 0;
408 $seen_encoding_cmd{$addr} = 0;
409 $seen_pod_cmd{$addr} = 0;
410 $warned_encoding{$addr} = 0;
411 return $self;
412 }
413
414 # re's for messages that Pod::Checker outputs
415 my $location = qr/ \b (?:in|at|on|near) \s+ /xi;
416 my $optional_location = qr/ (?: $location )? /xi;
3cb68c65 417 my $line_reference = qr/ [('"]? $optional_location \b line \s+
16384ac1
KW
418 (?: \d+ | EOF | \Q???\E | - )
419 [)'"]? /xi;
420
421 sub poderror { # Called to register a potential problem
422
423 # This adds an extra field to the parent hash, 'parameter'. It is
424 # used to extract the variable parts of a message leaving just the
425 # constant skeleton. This in turn allows the message to be
426 # categorized better, so that it shows up as a single type in our
427 # database, with the specifics of each occurrence not being stored with
428 # it.
429
430 my $self = shift;
431 my $opts = shift;
432
433 my $addr = Scalar::Util::refaddr $self;
434 return if $skip{$addr};
435
436 # Input can be a string or hash. If a string, parse it to separate
437 # out the line number and convert to a hash for easier further
438 # processing
439 my $message;
440 if (ref $opts ne 'HASH') {
441 $message = join "", $opts, @_;
442 my $line_number;
443 if ($message =~ s/\s*($line_reference)//) {
444 ($line_number = $1) =~ s/\s*$optional_location//;
445 }
446 else {
447 $line_number = '???';
448 }
449 $opts = { -msg => $message, -line => $line_number };
450 } else {
451 $message = $opts->{'-msg'};
452
453 }
454
455 $message =~ s/^\d+\s+//;
456 return if main::suppressed($message);
457
458 $self->SUPER::poderror($opts, @_);
459
460 $opts->{parameter} = "" unless $opts->{parameter};
461
462 # The variable parts of the message tend to be enclosed in '...',
463 # "....", or (...). Extract them and put them in an extra field,
464 # 'parameter'. This is trickier because the matching delimiter to a
465 # '(' is its mirror, and not itself. Text::Balanced could be used
466 # instead.
467 while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
468 my $delimiter = $1;
469 my $start = $-[0];
470 $delimiter = ')' if $delimiter eq '(';
471
472 # If there is no ending delimiter, don't consider it to be a
473 # variable part. Most likely it is a contraction like "Don't"
474 last unless $message =~ m/\G .+? \Q$delimiter/xg;
475
476 my $length = $+[0] - $start;
477
478 # Get the part up through the closing delimiter
479 my $special = substr($message, $start, $length);
480 $special =~ s/^\s+//; # No leading whitespace
481
482 # And add that variable part to the parameter, while removing it
483 # from the message. This isn't a foolproof way of finding the
484 # variable part. For example '(s)' can occur in e.g.,
485 # 'paragraph(s)'
486 if ($special ne '(s)') {
487 substr($message, $start, $length) = "";
488 pos $message = $start;
489 $opts->{-msg} = $message;
490 $opts->{parameter} .= " " if $opts->{parameter};
491 $opts->{parameter} .= $special;
492 }
493 }
494
495 # Extract any additional line number given. This is often the
496 # beginning location of something whereas the main line number gives
497 # the ending one.
498 if ($message =~ /( $line_reference )/xi) {
499 my $line_ref = $1;
500 while ($message =~ s/\s*\Q$line_ref//) {
501 $opts->{-msg} = $message;
502 $opts->{parameter} .= " " if $opts->{parameter};
503 $opts->{parameter} .= $line_ref;
504 }
505 }
506
b3fdb838 507 Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
16384ac1
KW
508 push @{$problems{$filename{$addr}}{$message}}, $opts;
509 #push @{$problems{$self->get_filename}{$message}}, $opts;
510 }
511
512 sub check_encoding { # Does it need an =encoding statement?
513 my ($self, $paragraph, $line_num, $pod_para) = @_;
514
515 # Do nothing if there is an =encoding in the file, or if the line
516 # doesn't require an =encoding, or have already warned.
517 my $addr = Scalar::Util::refaddr $self;
518 return if $seen_encoding_cmd{$addr}
519 || $warned_encoding{$addr}
520 || $paragraph !~ /\P{ASCII}/;
521
522 $warned_encoding{$addr} = 1;
523 my ($file, $line) = $pod_para->file_line;
524 $self->poderror({ -line => $line, -file => $file,
525 -msg => $need_encoding
526 });
527 return;
528 }
529
530 sub verbatim {
531 my ($self, $paragraph, $line_num, $pod_para) = @_;
532 $self->check_encoding($paragraph, $line_num, $pod_para);
533
534 $self->SUPER::verbatim($paragraph, $line_num, $pod_para);
535
536 # Pick up the name, since the parent class doesn't in verbatim
537 # NAMEs; so treat as non-verbatim. The parent class only allows one
538 # paragraph in a NAME section, so if there is an extra blank line, it
539 # will trigger a message, but such a blank line is harmless, so skip
540 # in that case.
541 if ($in_NAME{Scalar::Util::refaddr $self} && $paragraph =~ /\S/) {
542 $self->textblock($paragraph, $line_num, $pod_para);
543 }
544
545 my @lines = split /^/, $paragraph;
546 for my $i (0 .. @lines - 1) {
547 $lines[$i] =~ s/\s+$//;
548 my $indent = $self->get_current_indent;
549 my $exceeds = length(Text::Tabs::expand($lines[$i]))
550 + $indent - $MAX_LINE_LENGTH;
551 next unless $exceeds > 0;
552 my ($file, $line) = $pod_para->file_line;
553 $self->poderror({ -line => $line + $i, -file => $file,
554 -msg => $line_length,
555 parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)",
556 });
557 }
558 }
559
560 sub textblock {
561 my ($self, $paragraph, $line_num, $pod_para) = @_;
562 $self->check_encoding($paragraph, $line_num, $pod_para);
563
564 $self->SUPER::textblock($paragraph, $line_num, $pod_para);
565
566 my ($file, $line) = $pod_para->file_line;
567 my $addr = Scalar::Util::refaddr $self;
568 if ($in_NAME{$addr}) {
569 if (! $self->name) {
570 my $text = $self->interpolate($paragraph, $line_num);
571 if ($text =~ /^\s*(\S+?)\s*$/) {
572 $self->name($1);
573 $self->poderror({ -line => $line, -file => $file,
574 -msg => $missing_name_description,
575 parameter => $1});
576 }
577 }
578 }
579 $paragraph = join " ", split /^/, $paragraph;
580
581 # Matches something that looks like a file name, but is enclosed in
582 # C<...>
583 my $C_path_re = qr{ \b ( C<
584 # exclude regexes and 'OS/2'
585 (?! (?: (?: s | qr | m) / ) | OS/2 > )
586 \w+ (?: / \w+ )+ > (?: \. \w+ )? )
587 }x;
588
589 # If looks like a reference to other documentation by containing the
590 # word 'See' and then a likely pod directive, warn.
591
592 while ($paragraph =~ m{ \b See \s+ ( ( [^L] ) <
593 ( [^<]*? ) # The not-< excludes nested C<L<...
594 > ) }ixg) {
595 my $construct = $1;
596 my $type = $2;
597 my $interior = $3;
598 if ($interior !~ /$non_pods/
599 && $construct !~ /$C_path_re/g) {
600 $self->poderror({ -line => $line, -file => $file,
601 -msg => $see_not_linked,
602 parameter => $construct
603 });
604 }
605 }
606 while ($paragraph =~ m/$C_path_re/g) {
607 my $construct = $1;
608 $self->poderror({ -line => $line, -file => $file,
609 -msg => $C_with_slash,
610 parameter => $construct
611 });
612 }
613 return;
614 }
615
616 sub command {
617 my ($self, $cmd, $paragraph, $line_num, $pod_para) = @_;
618 my $addr = Scalar::Util::refaddr $self;
619 if ($cmd eq "pod") {
620 $seen_pod_cmd{$addr}++;
621 }
622 elsif ($cmd eq "encoding") {
623 my ($file, $line) = $pod_para->file_line;
624 $seen_encoding_cmd{$addr} = 1;
625 if ($command_count{$addr} != 1 && $seen_pod_cmd{$addr}) {
626 $self->poderror({ -line => $line, -file => $file,
627 -msg => $encoding_first
628 });
629 }
630 }
631 $self->check_encoding($paragraph, $line_num, $pod_para);
632
633 # Pod::Check treats all =items as linkable, but the bullet and
634 # numbered lists really aren't. So keep our own list. This has to be
635 # processed before SUPER is called so that the list is started before
636 # the rest of it gets parsed.
637 if ($cmd eq 'item') { # Not linkable if item begins with * or a digit
638 $linkable_item{$addr} = ($paragraph !~ / ^ \s*
639 (?: [*]
640 | \d+ \.? (?: \$ | \s+ )
641 )/x)
642 ? 1
643 : 0;
a67b1afa 644
16384ac1
KW
645 }
646 $self->SUPER::command($cmd, $paragraph, $line_num, $pod_para);
647
648 $command_count{$addr}++;
649
650 $in_NAME{$addr} = 0; # Will change to 1 below if necessary
651 $in_begin{$addr} = 0; # ibid
652 if ($cmd eq 'over') {
653 my $text = $self->interpolate($paragraph, $line_num);
654 my $indent = 4; # default
655 $indent = $1 if $text && $text =~ /^\s*(\d+)\s*$/;
656 push @{$indents{$addr}}, $indent;
657 $current_indent{$addr} += $indent;
658 }
659 elsif ($cmd eq 'back') {
660 if (@{$indents{$addr}}) {
661 $current_indent{$addr} -= pop @{$indents{$addr}};
662 }
663 else {
664 # =back without corresponding =over, but should have
665 # warned already
666 $current_indent{$addr} = 0;
667 }
668 }
669 elsif ($cmd =~ /^head/) {
670 if (! $in_begin{$addr}) {
671
672 # If a particular formatter, then this command doesn't really
673 # apply
674 $current_indent{$addr} = 0;
675 undef @{$indents{$addr}};
676 }
677
678 my $text = $self->interpolate($paragraph, $line_num);
679 $in_NAME{$addr} = 1 if $cmd eq 'head1'
680 && $text && $text =~ /^NAME\b/;
681 }
682 elsif ($cmd eq 'begin') {
683 $in_begin{$addr} = 1;
684 }
685
686 return;
687 }
688
689 sub hyperlink {
690 my $self = shift;
691
692 # If the hyperlink is to an interior node of another page, save it
693 # so that we can see if we need to parse normally skipped files.
694 $has_referred_to_node{$_[0][1]{'-page'}} = 1
695 if $_[0] && $_[0][1]{'-page'} && $_[0][1]{'-node'};
696 return $self->SUPER::hyperlink($_[0]);
697 }
698
699 sub node {
700 my $self = shift;
701 my $text = $_[0];
702 if($text) {
703 $text =~ s/\s+$//s; # strip trailing whitespace
704 $text =~ s/\s+/ /gs; # collapse whitespace
705 my $addr = Scalar::Util::refaddr $self;
706 push(@{$linkable_nodes{$addr}}, $text) if
707 ! $current_indent{$addr}
708 || $linkable_item{$addr};
709 }
710 return $self->SUPER::node($_[0]);
711 }
712
713 sub get_current_indent {
714 return $INDENT + $current_indent{Scalar::Util::refaddr $_[0]};
715 }
716
717 sub get_filename {
718 return $filename{Scalar::Util::refaddr $_[0]};
719 }
720
721 sub linkable_nodes {
722 my $linkables = $linkable_nodes{Scalar::Util::refaddr $_[0]};
723 return undef unless $linkables;
724 return @$linkables;
725 }
726
727 sub get_skip {
728 return $skip{Scalar::Util::refaddr $_[0]} // 0;
729 }
730
731 sub set_skip {
732 my $self = shift;
733 $skip{Scalar::Util::refaddr $self} = shift;
734
735 # If skipping, no need to keep the problems for it
736 delete $problems{$self->get_filename};
737 return;
738 }
739}
740
741package Tie_Array_to_FH { # So printing actually goes to an array
742
743 my %array;
744
745 sub TIEHANDLE {
746 my $class = shift;
747 my $array_ref = shift;
748
749 my $self = bless \do{ my $anonymous_scalar }, $class;
750 $array{Scalar::Util::refaddr $self} = $array_ref;
751
752 return $self;
753 }
754
755 sub PRINT {
a67b1afa 756 my $self = shift;
16384ac1
KW
757 push @{$array{Scalar::Util::refaddr $self}}, @_;
758 return 1;
759 }
a67b1afa
MM
760}
761
69f6a9a1 762
16384ac1
KW
763my %filename_to_checker; # Map a filename to it's pod checker object
764my %id_to_checker; # Map a checksum to it's pod checker object
765my %nodes; # key is filename, values are nodes in that file.
766my %nodes_first_word; # same, but value is first word of each node
767my %valid_modules; # List of modules known to exist outside us.
768my %digests; # checksums of files, whose names are the keys
769my %filename_to_pod; # Map a filename to its pod NAME
770my %files_with_unknown_issues;
771my %files_with_fixes;
69f6a9a1 772
16384ac1
KW
773my $data_fh;
774open($data_fh, $known_issues) || die "Can't open $known_issues";
69f6a9a1 775
16384ac1
KW
776my %counts; # For --counts param, count of each issue type
777my %suppressed_files; # Files with at least one issue type to suppress
778
779while (<$data_fh>) { # Read the data base
69f6a9a1 780 chomp;
16384ac1
KW
781 next if /^\s*(?:#|$)/; # Skip comment and empty lines
782 if (/\t/) {
783 next if $show_all;
784
785 # Keep track of counts of each issue type for each file
786 my ($filename, $message, $count) = split /\t/;
787 $known_problems{$filename}{$message} = $count;
788
789 if ($show_counts) {
790 if ($count < 0) { # -1 means to suppress this issue type
791 $suppressed_files{$filename} = $filename;
792 }
793 else {
794 $counts{$message} += $count;
795 }
796 }
797 }
798 else { # Lines without a tab are modules known to be valid
799 $valid_modules{$_} = 1
800 }
801}
802close $data_fh;
803
804if ($show_counts) {
805 my $total = 0;
806 foreach my $message (sort keys %counts) {
807 $total += $counts{$message};
808 note(Text::Tabs::expand("$counts{$message}\t$message"));
809 }
810 note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
811 if (%suppressed_files) {
812 note("\nFiles that have all messages of at least one type suppressed:");
813 note(join ",", keys %suppressed_files);
814 }
815 exit 0;
816}
817
818
819my %excluded_files = (
820 "lib/unicore/mktables" => 1,
821 "Porting/perldelta_template.pod" => 1,
822 "autodoc.pl" => 1,
823 "configpm" => 1,
824 "miniperl" => 1,
825 "perl" => 1,
e678c294
KW
826
827 # It would be nice if we didn't have to skip this,
828 # but the errors in it are too variable.
829 "pod/perltoc.pod" => 1,
16384ac1
KW
830 );
831
b3fdb838
KW
832# Convert to more generic form.
833foreach my $file (keys %excluded_files) {
834 $excluded_files{canonicalize($excluded_files{$file})}
835 = $excluded_files{$file};
836}
837
16384ac1
KW
838# re to match files that are to be parsed only if there is an internal link
839# to them. It does not include cpan, as whether those are parsed depends
840# on a switch. Currently, only the stable perldelta.pod's are included.
841# These all have characters between 'perl' and 'delta'. (Actually the
842# currently developed one matches as well, but is a duplicate of
843# perldelta.pod, so can be skipped, so fine for it to match this.
844my $only_for_interior_links_re = qr/ \b perl \d+ delta \. pod \b/x;
845
846{ # Closure
847 my $first_time = 1;
848
849 sub output_thanks ($$$$) { # Called when an issue has been fixed
850 my $filename = shift;
851 my $original_count = shift;
852 my $current_count = shift;
853 my $message = shift;
854
855 $files_with_fixes{$filename} = 1;
856 my $return;
857 my $fixed_count = $original_count - $current_count;
858 my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
859 my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
860 my $diff;
861 if ($message) {
862 $diff = <<EOF;
863There were $original_count occurrences (now $current_count) in this pod of type
864"$message",
865EOF
866 } else {
867 $diff = <<EOF;
868There are no longer any problems found in this pod!
869EOF
870 }
871
872 if ($first_time) {
873 $first_time = 0;
874 $return = <<EOF;
875Thanks for fixing $a_problem!
876$diff
877Now you must teach $0 that this was fixed.
878EOF
879 }
880 else {
881 $return = <<EOF
882Thanks for fixing $another_problem.
883$diff
884EOF
885 }
886
887 return $return;
888 }
889}
890
891sub my_safer_print { # print, with error checking for outputting to db
892 my ($fh, @lines) = @_;
893
894 if (! print $fh @lines) {
895 my $save_error = $!;
896 close($fh);
897 die "Write failure: $save_error";
898 }
899}
900
901sub extract_pod { # Extracts just the pod from a file
902 my $filename = shift;
903
904 my @pod;
905
906 # Arrange for the output of Pod::Parser to be collected in an array we can
907 # look at instead of being printed
908 tie *ALREADY_FH, 'Tie_Array_to_FH', \@pod;
909 open my $in_fh, '<', $filename
910 or die "Can't open '$filename': $!\n";
911
912 my $parser = Pod::Parser->new();
913 $parser->parse_from_filehandle($in_fh, *ALREADY_FH);
914 close $in_fh;
915
916 return join "", @pod
917}
918
919my $digest = Digest->new($digest_type);
920
921sub is_pod_file {
922 if (-d $_) {
923 # Don't look at files in directories that are for tests, nor those
924 # beginning with a dot
925 if ($_ eq 't' || $_ =~ /^\../) {
926 $File::Find::prune = 1;
927 }
928 return;
929 }
930
931 return if $_ =~ /^\./; # No hidden Unix files
932 return if $_ =~ $non_pods;
933
934 my $filename = $File::Find::name;
935
936 # Assumes that the path separator is exactly one character.
937 $filename =~ s/^\..//;
938
b3fdb838 939 return if $excluded_files{canonicalize($filename)};
16384ac1 940
144a708b
NC
941 my $contents = do {
942 local $/;
943 open my $candidate, '<', $_
944 or die "Can't open '$File::Find::name': $!\n";
945 <$candidate>;
946 };
16384ac1
KW
947
948 # If the file is a .pm or .pod, having any initial '=' on a line is
949 # grounds for testing it. Otherwise, require a head1 NAME line to view it
950 # as a potential pod
144a708b
NC
951 if ($filename =~ /\.(?:pm|pod)/) {
952 return unless $contents =~ /^=/m;
953 } else {
954 return unless $contents =~ /^=head1 +NAME/m;
16384ac1 955 }
144a708b
NC
956
957 # Here, we know that the file is a pod. Add it to the list of files
958 # to check and create a checker object for it.
959
960 push @files, $filename;
961 my $checker = My::Pod::Checker->new($filename);
962 $filename_to_checker{$filename} = $checker;
963
964 # In order to detect duplicate pods and only analyze them once, we
965 # compute checksums for the file, so don't have to do an exact
966 # compare. Note that if the pod is just part of the file, the
967 # checksums can differ for the same pod. That special case is handled
968 # later, since if the checksums of the whole file are the same, that
969 # case won't even come up. We don't need the checksums for files that
970 # we parse only if there is a link to its interior, but we do need its
971 # NAME, which is also retrieved in the code below.
972
973 if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ )
974 | $only_for_interior_links_re
975 /x) {
976 $digest->add($contents);
977 $digests{$filename} = $digest->digest;
978
979 # lib files aren't analyzed if they are duplicates of files copied
980 # there from some other directory. But to determine this, we need
981 # to know their NAMEs. We might as well find the NAME now while
982 # the file is open. Similarly, cpan files aren't analyzed unless
983 # we're analyzing all of them, or this particular file is linked
984 # to by a file we are analyzing, and thus we will want to verify
985 # that the target exists in it. We need to know at least the NAME
986 # to see if it's worth analyzing, or so we can determine if a lib
987 # file is a copy of a cpan one.
988 if ($filename =~ m{ (?: ^ (?: cpan | lib ) / )
16384ac1 989 | $only_for_interior_links_re
144a708b
NC
990 }x) {
991 if ($contents =~ /^=head1 +NAME.*/mg) {
992 # The NAME is the first non-spaces on the line up to a
993 # comma, dash or end of line. Otherwise, it's invalid and
994 # this pod doesn't have a legal name that we're smart
995 # enough to find currently. But the parser will later
996 # find it if it thinks there is a legal name, and set the
997 # name
998 if ($contents =~ /\G # continue from the line after =head1
999 \s* # ignore any empty lines
1000 ^ \s* ( \S+?) \s* (?: [,-] | $ )/mx) {
1001 my $name = $1;
1002 $checker->name($name);
1003 $id_to_checker{$name} = $checker
1004 if $filename =~ m{^cpan/};
16384ac1
KW
1005 }
1006 }
144a708b
NC
1007 elsif ($filename =~ m{^cpan/}) {
1008 $id_to_checker{$digests{$filename}} = $checker;
1009 }
16384ac1
KW
1010 }
1011 }
1012} # End of is_pod_file()
1013
1014# Start of real code that isn't processing the command line.
1015# Here, @files contains list of files on the command line. If have any of
1016# these, unconditionally test them, and show all the errors, even the known
1017# ones, and, since not testing other pods, don't do cross-pod link tests.
1018# (Could add extra code to do cross-pod tests for the ones in the list.)
1019if ($has_input_files) {
1020 undef %known_problems;
1021 $do_upstream_cpan = 1; # In case one of the inputs is from cpan
1022
1023}
1024else { # No input files -- go find all the possibilities.
1025 if ($regen) {
1026 $copy_fh = open_new($known_issues);
1027 note("Regenerating $known_issues, please be patient...");
1028 print $copy_fh <<END;
1029# This file is the data file for $0.
1030# There are three types of lines.
1031# Comment lines are white-space only or begin with a '#', like this one. Any
1032# changes you make to the comment lines will be lost when the file is
1033# regen'd.
1034# Lines without tab characters are simply NAMES of pods that the program knows
1035# will have links to them and the program does not check if those links are
1036# valid.
1037# All other lines should have three fields, each separated by a tab. The
1038# first field is the name of a pod; the second field is an error message
1039# generated by this program; and the third field is a count of how many
1040# known instances of that message there are in the pod. -1 means that the
1041# program can expect any number of this type of message.
1042END
1043 }
1044
1045 # Move to the directory above us, but have to adjust @INC to account for
1046 # that.
1047 s{^\.\./lib$}{lib} for @INC;
1048 chdir File::Spec->updir;
1049
1050 # And look in this directory and all its subdirectories
1051 find( \&is_pod_file, '.');
1052
1053 # Add ourselves to the test
b3fdb838 1054 push @files, "t/porting/podcheck.t";
16384ac1
KW
1055}
1056
1057# Now we know how many tests there will be.
1058plan (tests => scalar @files) if ! $regen;
1059
1060
1061 # Sort file names so we get consistent results, and to put cpan last,
1062 # preceeded by the ones that we don't generally parse. This is because both
1063 # these classes are generally parsed only if there is a link to the interior
1064 # of them, and we have to parse all others first to guarantee that they don't
1065 # have such a link. 'lib' files come just before these, as some of these are
1066 # duplicates of others. We already have figured this out when gathering the
1067 # data as a special case for all such files, but this, while unnecessary,
1068 # puts the derived file last in the output. 'readme' files come before those,
1069 # as those also could be duplicates of others, which are considered the
1070 # primary ones. These currently aren't figured out when gathering data, so
1071 # are done here.
1072 @files = sort { if ($a =~ /^cpan/) {
1073 return 1 if $b !~ /^cpan/;
1074 return $a cmp $b;
1075 }
1076 elsif ($b =~ /^cpan/) {
1077 return -1;
1078 }
1079 elsif ($a =~ /$only_for_interior_links_re/) {
1080 return 1 if $b !~ /$only_for_interior_links_re/;
1081 return $a cmp $b;
1082 }
1083 elsif ($b =~ /$only_for_interior_links_re/) {
1084 return -1;
1085 }
1086 elsif ($a =~ /^lib/) {
1087 return 1 if $b !~ /^lib/;
1088 return $a cmp $b;
1089 }
1090 elsif ($b =~ /^lib/) {
1091 return -1;
1092 } elsif ($a =~ /\breadme\b/i) {
1093 return 1 if $b !~ /\breadme\b/i;
1094 return $a cmp $b;
1095 }
1096 elsif ($b =~ /\breadme\b/i) {
1097 return -1;
1098 }
1099 else {
1100 return lc $a cmp lc $b;
1101 }
1102 }
1103 @files;
1104
1105# Now go through all the files and parse them
1106foreach my $filename (@files) {
1107 my $parsed = 0;
1108 note("parsing $filename") if DEBUG;
1109
1110 # We may have already figured out some things in the process of generating
1111 # the file list. If so, have a $checker object already. But if not,
1112 # generate one now.
1113 my $checker = $filename_to_checker{$filename};
1114 if (! $checker) {
1115 $checker = My::Pod::Checker->new($filename);
1116 $filename_to_checker{$filename} = $checker;
1117 }
1118
1119 # We have set the name in the checker object if there is a possibility
1120 # that no further parsing is necessary, but otherwise do the parsing now.
1121 if (! $checker->name) {
1122 $parsed = 1;
1123 $checker->parse_from_file($filename, undef);
1124 }
1125
1126 if ($checker->num_errors() < 0) { # Returns negative if not a pod
1127 $checker->set_skip("$filename is not a pod");
1128 }
1129 else {
1130
1131 # Here, is a pod. See if it is one that has already been tested,
1132 # or should be tested under another directory. Use either its NAME
1133 # if it has one, or a checksum if not.
1134 my $name = $checker->name;
1135 my $id;
1136
1137 if ($name) {
1138 $id = $name;
1139 }
1140 else {
1141 my $digest = Digest->new($digest_type);
1142 $digest->add(extract_pod($filename));
1143 $id = $digest->digest;
1144 }
1145
1146 # If there is a match for this pod with something that we've already
1147 # processed, don't process it, and output why.
1148 my $prior_checker;
1149 if (defined ($prior_checker = $id_to_checker{$id})
1150 && $prior_checker != $checker) # Could have defined the checker
1151 # earlier without pursuing it
1152 {
1153
1154 # If the pods are identical, then it's just a copy, and isn't an
1155 # error. First use the checksums we have already computed to see
1156 # if the entire files are identical, which means that the pods are
1157 # identical too.
1158 my $prior_filename = $prior_checker->get_filename;
1159 my $same = (! $name
1160 || ($digests{$prior_filename}
1161 && $digests{$filename}
1162 && $digests{$prior_filename} eq $digests{$filename}));
1163
1164 # If they differ, it could be that the files differ for some
1165 # reason, but the pods they contain are identical. Extract the
1166 # pods and do the comparisons on just those.
1167 if (! $same && $name) {
1168 $same = extract_pod($prior_filename) eq extract_pod($filename);
1169 }
1170
1171 if ($same) {
1172 $checker->set_skip("The pod of $filename is a duplicate of "
1173 . "the pod for $prior_filename");
1174 } elsif ($prior_filename =~ /\breadme\b/i) {
1175 $checker->set_skip("$prior_filename is a README apparently for $filename");
1176 } elsif ($filename =~ /\breadme\b/i) {
1177 $checker->set_skip("$filename is a README apparently for $prior_filename");
1178 } else { # Here have two pods with identical names that differ
1179 $prior_checker->poderror(
1180 { -msg => $duplicate_name,
1181 -line => "???",
1182 parameter => "'$filename' also has NAME '$name'"
1183 });
1184 $checker->poderror(
1185 { -msg => $duplicate_name,
1186 -line => "???",
1187 parameter => "'$prior_filename' also has NAME '$name'"
1188 });
1189
1190 # Changing the names helps later.
1191 $prior_checker->name("$name version arbitrarily numbered 1");
1192 $checker->name("$name version arbitrarily numbered 2");
1193 }
1194
1195 # In any event, don't process this pod that has the same name as
1196 # another.
1197 next;
1198 }
1199
1200 # A unique pod.
1201 $id_to_checker{$id} = $checker;
1202
1203 my $parsed_for_links = ", but parsed for its interior links";
1204 if ((! $do_upstream_cpan && $filename =~ /^cpan/)
1205 || $filename =~ $only_for_interior_links_re)
1206 {
1207 if ($filename =~ /^cpan/) {
1208 $checker->set_skip("CPAN is upstream for $filename");
1209 }
1210 elsif ($filename =~ /perl\d+delta/) {
1211 $checker->set_skip("$filename is a stable perldelta");
1212 }
1213 else {
1214 croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
1215 }
1216
1217 if ($name && $has_referred_to_node{$name}) {
1218 $checker->set_skip($checker->get_skip() . $parsed_for_links);
1219 }
1220 }
1221
1222 # Need a name in order to process it, because not meaningful
1223 # otherwise, and also can't test links to this without a name.
1224 if (!defined $name) {
1225 $checker->poderror( { -msg => $no_name,
1226 -line => '???'
1227 });
1228 next;
1229 }
1230
1231 # For skipped files, just get its NAME
1232 my $skip;
1233 if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
1234 {
1235 $checker->node($name) if $name;
1236 }
1237 else {
1238 $checker->parse_from_file($filename, undef) if ! $parsed;
1239 }
1240
1241 # Go through everything in the file that could be an anchor that
1242 # could be a link target. Count how many there are of the same name.
1243 foreach my $node ($checker->linkable_nodes) {
1244 next if ! $node; # Can be empty is like '=item *'
1245 if (exists $nodes{$name}{$node}) {
1246 $nodes{$name}{$node}++;
1247 }
1248 else {
1249 $nodes{$name}{$node} = 1;
1250 }
1251
1252 # Experiments have shown that cpan search can figure out the
1253 # target of a link even if the exact wording is incorrect, as long
1254 # as the first word is. This happens frequently in perlfunc.pod,
1255 # where the link will be just to the function, but the target
1256 # entry also includes parameters to the function.
1257 my $first_word = $node;
1258 if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
1259 $nodes_first_word{$name}{$first_word} = $node;
1260 }
1261 }
1262 $filename_to_pod{$filename} = $name;
1263 }
1264}
1265
1266# Here, all files have been parsed, and all links and link targets are stored.
1267# Now go through the files again and see which don't have matches.
1268if (! $has_input_files) {
1269 foreach my $filename (@files) {
1270 next if $filename_to_checker{$filename}->get_skip;
1271 my $checker = $filename_to_checker{$filename};
1272 foreach my $link ($checker->hyperlink) {
1273 my $linked_to_page = $link->[1]->page;
1274 next unless $linked_to_page; # intra-file checks are handled by std
1275 # Pod::Checker
1276
1277 # Initialize the potential message.
1278 my %problem = ( -msg => $broken_link,
1279 -line => $link->[0],
1280 parameter => "to \"$linked_to_page\"",
1281 );
1282
1283 # See if we have found the linked-to_file in our parse
1284 if (exists $nodes{$linked_to_page}) {
1285 my $node = $link->[1]->node;
1286
1287 # If link is only to the page-level, already have it
1288 next if ! $node;
1289
1290 # Transform pod language to what we are expecting
1291 $node =~ s,E<sol>,/,g;
1292 $node =~ s/E<verbar>/|/g;
1293
1294 # If link is to a node that exists in the file, is ok
1295 if ($nodes{$linked_to_page}{$node}) {
1296
1297 # But if the page has multiple targets with the same name,
1298 # it's ambiguous which one this should be to.
1299 if ($nodes{$linked_to_page}{$node} > 1) {
1300 $problem{-msg} = $multiple_targets;
1301 $problem{parameter} = "in $linked_to_page that $node could be pointing to";
1302 $checker->poderror(\%problem);
1303 }
1304 } elsif (! $nodes_first_word{$linked_to_page}{$node}) {
1305
1306 # Here the link target was not found, either exactly or to
1307 # the first word. Is an error.
1308 $problem{parameter} =~ s,"$,/$node",;
1309 $checker->poderror(\%problem);
1310 }
1311
1312 } # Linked-to-file not in parse; maybe is in exception list
1313 elsif (! exists $valid_modules{$link->[1]->page}) {
1314
1315 # Here, is a link to a target that we can't find. Check if
1316 # there is an internal link on the page with the target name.
1317 # If so, it could be that they just forgot the initial '/'
1318 if ($filename_to_pod{$filename}
1319 && $nodes{$filename_to_pod{$filename}}{$linked_to_page})
1320 {
1321 $problem{-msg} = $broken_internal_link;
1322 }
1323 $checker->poderror(\%problem);
1324 }
1325 }
1326 }
1327}
1328
1329# If regenerating the data file, start with the modules for which we don't
1330# check targets
1331if ($regen) {
1332 foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
1333 my_safer_print($copy_fh, $_, "\n");
1334 }
1335}
1336
1337# Now ready to output the messages.
1338foreach my $filename (@files) {
1339 my $test_name = "POD of $filename";
b3fdb838 1340 my $canonical = canonicalize($filename);
16384ac1
KW
1341 SKIP: {
1342 my $skip = $filename_to_checker{$filename}->get_skip // "";
1343
1344 if ($regen) {
1345 foreach my $message ( sort keys %{$problems{$filename}}) {
1346 my $count;
1347
1348 # Preserve a negative setting.
b3fdb838
KW
1349 if ($known_problems{$canonical}{$message}
1350 && $known_problems{$canonical}{$message} < 0)
16384ac1 1351 {
b3fdb838 1352 $count = $known_problems{$canonical}{$message};
16384ac1
KW
1353 }
1354 else {
1355 $count = @{$problems{$filename}{$message}};
1356 }
b3fdb838 1357 my_safer_print($copy_fh, canonicalize($filename) . "\t$message\t$count\n");
16384ac1
KW
1358 }
1359 next;
1360 }
1361
1362 skip($skip, 1) if $skip;
1363 my @diagnostics;
1364 my $indent = ' ';
1365
16384ac1
KW
1366 my $total_known = 0;
1367 foreach my $message ( sort keys %{$problems{$filename}}) {
b3fdb838
KW
1368 $known_problems{$canonical}{$message} = 0
1369 if ! $known_problems{$canonical}{$message};
16384ac1
KW
1370 my $diagnostic = "";
1371 my $problem_count = scalar @{$problems{$filename}{$message}};
1372 $total_known += $problem_count;
b3fdb838
KW
1373 next if $known_problems{$canonical}{$message} < 0;
1374 if ($problem_count > $known_problems{$canonical}{$message}) {
16384ac1
KW
1375
1376 # Here we are about to output all the messages for this type,
1377 # subtract back this number we previously added in.
1378 $total_known -= $problem_count;
1379
1380 $diagnostic .= $indent . $message;
1381 if ($problem_count > 2) {
1382 $diagnostic .= " ($problem_count occurrences)";
1383 }
1384 foreach my $problem (@{$problems{$filename}{$message}}) {
1385 $diagnostic .= " " if $problem_count == 1;
1386 $diagnostic .= "\n$indent$indent";
1387 $diagnostic .= "$problem->{parameter}" if $problem->{parameter};
1388 $diagnostic .= " near line $problem->{-line}";
1389 $diagnostic .= " $problem->{comment}" if $problem->{comment};
1390 }
1391 $diagnostic .= "\n";
1392 $files_with_unknown_issues{$filename} = 1;
b3fdb838
KW
1393 } elsif ($problem_count < $known_problems{$canonical}{$message}) {
1394 $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
16384ac1
KW
1395 }
1396 push @diagnostics, $diagnostic if $diagnostic;
1397 }
1398
09ea063a
KW
1399 # The above loop has output messages where there are current potential
1400 # issues. But it misses where there were some that have been entirely
1401 # fixed. For those, we need to look through the old issues
b3fdb838 1402 foreach my $message ( sort keys %{$known_problems{$canonical}}) {
09ea063a 1403 next if $problems{$filename}{$message};
b3fdb838
KW
1404 next if ! $known_problems{$canonical}{$message};
1405 next if $known_problems{$canonical}{$message} < 0; # Preserve negs
1406 my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
09ea063a
KW
1407 push @diagnostics, $diagnostic if $diagnostic;
1408 }
1409
16384ac1
KW
1410 my $output = "POD of $filename";
1411 $output .= ", excluding $total_known not shown known potential problems"
1412 if $total_known;
1413 ok(@diagnostics == 0, $output);
1414 if (@diagnostics) {
1415 note(join "", @diagnostics,
1416 "See end of this test output for your options");
1417 }
1418 }
1419}
1420
1421my $how_to = <<EOF;
1422 run this test script by hand, using the following formula (on
1423 Un*x-like machines):
1424 cd t
1425 ./perl -I../lib porting/podcheck.t --regen
1426EOF
1427
1428if (%files_with_unknown_issues) {
1429 my $were_count_files = scalar keys %files_with_unknown_issues;
1430 $were_count_files = ($were_count_files == 1)
1431 ? "was $were_count_files file"
1432 : "were $were_count_files files";
1433 my $message = <<EOF;
1434
1435HOW TO GET THIS .t TO PASS
1436
1437There $were_count_files that had new potential problems identified. To get
1438this .t to pass, do the following:
1439
14401) If a problem is about a link to an unknown module that you know exists,
1441 simply edit the file,
1442 $known_issues
1443 and add anywhere a line that contains just the module's name.
1444 (Don't do this for a module that you aren't sure about; instead treat
1445 as another type of issue and follow the instructions below.)
1446
14472) For other issues, decide if each should be fixed now or not. Fix the
1448 ones you decided to, and rerun this test to verify that the fixes
1449 worked.
1450
14513) If there remain potential problems that you don't plan to fix right
1452 now (or aren't really problems),
1453$how_to
1454 That should cause all current potential problems to be accepted by the
1455 program, so that the next time it runs, they won't be flagged.
1456EOF
1457 if (%files_with_fixes) {
1458 $message .= " This step will also take care of the files that have fixes in them\n";
1459 }
1460
1461 $message .= <<EOF;
1462 For a few files, such as perltoc, certain issues will always be
1463 expected, and more of the same will be added over time. For those,
1464 before you do the regen, you can edit
1465 $known_issues
1466 and find the entry for the module's file and specific error message,
1467 and change the count of known potential problems to -1.
1468EOF
1469
1470 note($message);
1471} elsif (%files_with_fixes) {
1472 note(<<EOF
1473To teach this test script that the potential problems have been fixed,
1474$how_to
1475EOF
1476 );
1477}
1478
1479if ($regen) {
1480 chdir $original_dir || die "Can't change directories to $original_dir";
1481 close_and_rename($copy_fh);
1482}