This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
podcheck.t: decode verbatim paragraphs before checking length
[perl5.git] / t / porting / podcheck.t
CommitLineData
a67b1afa 1#!/usr/bin/perl -w
f7b649f0 2
a67b1afa 3use strict;
16384ac1
KW
4use warnings;
5use feature 'unicode_strings';
6
7use Carp;
8use Digest;
9use File::Find;
10use File::Spec;
11use Scalar::Util;
12use Text::Tabs;
13
14BEGIN {
15 require '../regen/regen_lib.pl';
16}
17
18sub DEBUG { 0 };
19
20=pod
21
22=head1 NAME
23
24podcheck.t - Look for possible problems in the Perl pods
25
26=head1 SYNOPSIS
27
28 cd t
29 ./perl -I../lib porting/podcheck.t [--show_all] [--cpan] [--counts]
30 [ FILE ...]
31 ./perl -I../lib porting/podcheck.t --regen
32
33=head1 DESCRIPTION
34
35podcheck.t is an extension of Pod::Checker. It looks for pod errors and
36potential errors in the files given as arguments, or if none specified, in all
37pods in the distribution workspace, except those in the cpan directory (unless
38C<--cpan> is specified). It does additional checking beyond that done by
39Pod::Checker, and keeps a database of known potential problems, and will
40fail a pod only if the number of such problems differs from that given in the
41database. It also suppresses the C<(section) deprecated> message from
42Pod::Checker, since specifying the man page section number is quite proper to do.
43
44The additional checks it makes are:
45
46=over
47
48=item Cross-pod link checking
49
50Pod::Checker verifies that links to an internal target in a pod are not
51broken. podcheck.t extends that (when called without FILE arguments) to
52external links. It does this by gathering up all the possible targets in the
53workspace, and cross-checking them. The database has a list of known targets
54outside the workspace, so podcheck.t will not raise a warning for
55using those. It also checks that a non-broken link points to just one target.
56(The destination pod could have two targets with the same name.)
57
58=item An internal link that isn't so specified
59
60If a link is broken, but there is an existing internal target of the same
61name, it is likely that the internal target was meant, and the C<"/"> is
62missing from the C<LE<lt>E<gt>> pod command.
63
64=item Verbatim paragraphs that wrap in an 80 column window
65
66It's annoying to have lines wrap when displaying pod documentation in a
67terminal window. This checks that all such lines fit, and for those that
68don't, it tells you how much needs to be cut in order to fit. However,
69if you're fixing these, keep in mind that some terminal/pager combinations
70require really a maximum of 79 or 78 columns to display properly.
71
72Often, the easiest thing to do to gain space for these is to lower the indent
73to just one space.
74
75=item Missing or duplicate NAME or missing NAME short description
76
77A pod can't be linked to unless it has a unique name.
78And a NAME should have a dash and short description after it.
79
80=item =encoding statement issues
81
82This indicates if an C<=encoding> statement should be present, or moved to the
83front of the pod.
84
85=item Items that perhaps should be links
86
87There are mentions of apparent files in the pods that perhaps should be links
88instead, using C<LE<lt>...E<gt>>
89
90=item Items that perhaps should be C<FE<lt>...E<gt>>
91
92What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
93C<FE<lt>...E<gt>> mark-up instead.
94
95=back
96
97A number of issues raised by podcheck.t and by the base Pod::Checker are not
98really problems, but merely potential problems. After inspecting them and
99deciding that they aren't real problems, it is possible to shut up this program
100about them, unlike base Pod::Checker. To do this, call podcheck.t with the
101C<--regen> option to regenerate the database. This tells it that all existing
102issues are to not be mentioned again.
103
104This isn't fool-proof. The database merely keeps track of the number of these
105potential problems of each type for each pod. If a new problem of a given
106type is introduced into the pod, podcheck.t will spit out all of them. You
107then have to figure out which is the new one, and should it be changed or not.
108But doing it this way insulates the database from having to keep track of line
109numbers of problems, which may change, or the exact wording of each problem
110which might also change without affecting whether it is a problem or not.
111
112Also, if the count of potential problems of a given type for a pod decreases,
113the database must be regenerated so that it knows the new number. The program
114gives instructions when this happens.
115
116There is currently no check that modules listed as valid in the data base
117actually are. Thus any errors introduced there will remain there.
118
119=head1 OPTIONS
120
121=over
122
123=item --regen
124
125Regenerate the data base used by podcheck.t to include all the existing
126potential problems. Future runs of the program will not then flag any of
127these.
128
129=item --cpan
130
131Normally, all pods in the cpan directory are skipped, except to make sure that
132any blead-upstream links to such pods are valid.
133This option will cause cpan upstream pods to be checked.
134
135=item --show_all
136
137Normally, if the number of potential problems of a given type found for a
138pod matches the expected value in the database, they will not be displayed.
139This option forces the database to be ignored during the run, so all potential
140problems are displayed and will fail their respective pod test. Specifying
141any particular FILES to operate on automatically selects this option.
142
143=item --counts
144
145Instead of testing, this just dumps the counts of the occurrences of the
146various types of potential problems in the data base.
147
148=back
149
150=head1 FILES
151
152The database is stored in F<t/porting/known_pod_issues.dat>
153
154=head1 SEE ALSO
155
156L<Pod::Checker>
157
158=cut
159
160#####################################################
161# HOW IT WORKS (in general)
162#
163# If not called with specific files to check, the directory structure is
164# examined for files that have pods in them. Files that might not have to be
165# fully parsed (e.g. in cpan) are parsed enough at this time to find their
166# pod's NAME, and to get a checksum.
167#
168# Those kinds of files are sorted last, but otherwise the pods are parsed with
169# the package coded here, My::Pod::Checker, which is an extension to
170# Pod::Checker that adds some tests and suppresses others that aren't
171# appropriate. The latter module has no provision for capturing diagnostics,
172# so a package, Tie_Array_to_FH, is used to force them to be placed into an
173# array instead of printed.
174#
175# Parsing the files builds up a list of links. The files are gone through
176# again, doing cross-link checking and outputting all saved-up problems with
177# each pod.
178#
179# Sorting the files last that potentially don't need to be fully parsed allows
180# us to not parse them unless there is a link to an internal anchor in them
181# from something that we have already parsed. Keeping checksums allows us to
182# not parse copies of other pods.
183#
184#####################################################
185
186# 1 => Exclude low priority messages that aren't likely to be problems, and
187# has many false positives; higher numbers give more messages.
188my $Warnings_Level = 200;
189
190# To see if two pods with the same NAME are actually copies of the same pod,
191# which is not an error, it uses a checksum to save work.
192my $digest_type = "SHA-1";
193
194my $original_dir = File::Spec->rel2abs(File::Spec->curdir);
195my $data_dir = File::Spec->catdir($original_dir, 'porting');
196my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
197my $copy_fh;
198
199my $MAX_LINE_LENGTH = 80; # 80 columns
200my $INDENT = 8; # default nroff indent
201
202# Our warning messages. Better not have [('"] in them, as those are used as
203# delimiters for variable parts of the messages by poderror.
204my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
205my $broken_link = "Apparent broken link";
206my $broken_internal_link = "Apparent internal link is missing its forward slash";
207my $see_not_linked = "? Should you be using L<...> instead of";
208my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of";
209my $multiple_targets = "There is more than one target";
210my $duplicate_name = "Pod NAME already used";
211my $need_encoding = "Should have =encoding statement because have non-ASCII";
212my $encoding_first = "=encoding must be first command (if present)";
213my $no_name = "There is no NAME";
214my $missing_name_description = "The NAME should have a dash and short description after it";
215
7f8d58fb 216# objects, tests, etc can't be pods, so don't look for them. Also skip
a71e7b2c
KW
217# files output by the patch program. Could also ignore most of .gitignore
218# files, but not all, so don't.
219my $non_pods = qr/ (?: \.
220 (?: [achot] | zip | gz | bz2 | jar | tar | tgz | PL | so
221 | orig | rej | patch # Patch program output
222 | sw[op] | \#.* # Editor droppings
223 )
224 $
225 ) | ~$ # Another editor dropping
226 /x;
16384ac1
KW
227
228
229# Pod::Checker messages to suppress
230my @suppressed_messages = (
231 "(section) in", # Checker is wrong to flag this
232 "multiple occurrence of link target", # We catch independently the ones
233 # that are real problems.
234 "unescaped <>",
235);
236
237sub suppressed {
238 # Returns bool as to if input message is one that is to be suppressed
239
240 my $message = shift;
241 return grep { $message =~ /^\Q$_/i } @suppressed_messages;
242}
243
244{ # Closure to contain a simple subset of test.pl. This is to get rid of the
245 # unnecessary 'failed at' messages that would otherwise be output pointing
246 # to a particular line in this file.
a67b1afa 247
16384ac1
KW
248 my $current_test = 0;
249 my $planned;
250
251 sub plan {
252 my %plan = @_;
253 $planned = $plan{tests};
254 print "1..$planned\n";
255 return;
256 }
257
258 sub ok {
259 my $success = shift;
260 my $message = shift;
261
262 chomp $message;
263
264 $current_test++;
265 print "not " unless $success;
266 print "ok $current_test - $message\n";
267 return;
268 }
269
270 sub skip {
271 my $why = shift;
272 my $n = @_ ? shift : 1;
273 for (1..$n) {
274 $current_test++;
275 print "ok $current_test # skip $why\n";
276 }
277 no warnings 'exiting';
278 last SKIP;
279 }
280
281 sub note {
282 my $message = shift;
283
284 chomp $message;
285
286 print $message =~ s/^/# /mgr;
287 print "\n";
288 return;
289 }
290
291 END {
292 if ($planned && $planned != $current_test) {
293 print STDERR
294 "# Looks like you planned $planned tests but ran $current_test.\n";
295 }
296 }
297}
298
b3fdb838 299# This is to get this to work across multiple file systems, including those
bf8144e1
KW
300# that are not case sensitive. The db is stored in lower case, Un*x style,
301# and all file name comparisons are done that way.
b3fdb838 302sub canonicalize($) {
bf8144e1
KW
303 my $input = shift;
304 my ($volume, $directories, $file)
305 = File::Spec->splitpath(File::Spec->canonpath($input));
306 # Assumes $volume is constant for everything in this directory structure
307 $directories = "" if ! $directories;
308 $file = "" if ! $file;
309 my $output = lc join '/', File::Spec->splitdir($directories), $file;
310 $output =~ s! / /+ !/!gx; # Multiple slashes => single slash
311 return $output;
b3fdb838
KW
312}
313
16384ac1
KW
314
315# List of known potential problems by pod and type.
316my %known_problems;
317
318# Pods given by the keys contain an interior node that is referred to from
319# outside it.
320my %has_referred_to_node;
321
322my $show_counts = 0;
323my $regen = 0;
324my $show_all = 0;
325
326# Assume that are to skip anything in /cpan
327my $do_upstream_cpan = 0;
328
329while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
330 my $arg = shift @ARGV;
331
332 $arg =~ s/^--/-/; # Treat '--' the same as a single '-'
333 if ($arg eq '-regen') {
334 $regen = 1;
335 }
336 elsif ($arg eq '-cpan') {
337 $do_upstream_cpan = 1;
338 }
339 elsif ($arg eq '-show_all') {
340 $show_all = 1;
341 }
342 elsif ($arg eq '-counts') {
343 $show_counts = 1;
344 }
345 else {
346 die <<EOF;
347Unknown option '$arg'
348
349Usage: $0 [ --regen | --cpan | --show_all ] [ FILE ... ]\n"
350 --cpan -> Include files in the cpan subdirectory.
351 --regen -> Regenerate the data file for $0
352 --show_all -> Show all known potential problems
353 --counts -> Don't test, but give summary counts of the currently
354 existing database
355EOF
356 }
357}
358
359my @files = @ARGV;
360
361if (($regen + $show_all + $show_counts + $do_upstream_cpan) > 1) {
362 croak "--regen, --show_all, --cpan, and --counts are mutually exclusive";
363}
364
365my $has_input_files = @files;
366
367if ($has_input_files && ($regen || $show_counts || $do_upstream_cpan)) {
368 croak "--regen, --counts and --cpan can't be used since using specific files";
369}
370
371our %problems; # potential problems found in this run
372
373package My::Pod::Checker { # Extend Pod::Checker
a67b1afa
MM
374 use parent 'Pod::Checker';
375
16384ac1
KW
376 # Uses inside out hash to protect from typos
377 # For new fields, remember to add to destructor DESTROY()
378 my %indents; # Stack of indents from =over's in effect for
379 # current line
380 my %current_indent; # Current line's indent
381 my %filename; # The pod is store in this file
382 my %skip; # is SKIP set for this pod
383 my %in_NAME; # true if within NAME section
384 my %in_begin; # true if within =begin section
385 my %linkable_item; # Bool: if the latest =item is linkable. It isn't
386 # for bullet and number lists
387 my %linkable_nodes; # Pod::Checker adds all =items to its node list,
388 # but not all =items are linkable to
389 my %seen_encoding_cmd; # true if have =encoding earlier
390 my %command_count; # Number of commands seen
391 my %seen_pod_cmd; # true if have =pod earlier
392 my %warned_encoding; # true if already have warned about =encoding
393 # problems
394
395 sub DESTROY {
396 my $addr = Scalar::Util::refaddr $_[0];
397 delete $command_count{$addr};
398 delete $current_indent{$addr};
399 delete $filename{$addr};
400 delete $in_begin{$addr};
401 delete $indents{$addr};
402 delete $in_NAME{$addr};
403 delete $linkable_item{$addr};
404 delete $linkable_nodes{$addr};
405 delete $seen_encoding_cmd{$addr};
406 delete $seen_pod_cmd{$addr};
407 delete $skip{$addr};
408 delete $warned_encoding{$addr};
409 return;
410 }
411
412 sub new {
413 my $class = shift;
414 my $filename = shift;
415
416 my $self = $class->SUPER::new(-quiet => 1,
417 -warnings => $Warnings_Level);
418 my $addr = Scalar::Util::refaddr $self;
419 $command_count{$addr} = 0;
420 $current_indent{$addr} = 0;
421 $filename{$addr} = $filename;
422 $in_begin{$addr} = 0;
423 $in_NAME{$addr} = 0;
424 $linkable_item{$addr} = 0;
425 $seen_encoding_cmd{$addr} = 0;
426 $seen_pod_cmd{$addr} = 0;
427 $warned_encoding{$addr} = 0;
428 return $self;
429 }
430
431 # re's for messages that Pod::Checker outputs
432 my $location = qr/ \b (?:in|at|on|near) \s+ /xi;
433 my $optional_location = qr/ (?: $location )? /xi;
3cb68c65 434 my $line_reference = qr/ [('"]? $optional_location \b line \s+
16384ac1
KW
435 (?: \d+ | EOF | \Q???\E | - )
436 [)'"]? /xi;
437
438 sub poderror { # Called to register a potential problem
439
440 # This adds an extra field to the parent hash, 'parameter'. It is
441 # used to extract the variable parts of a message leaving just the
442 # constant skeleton. This in turn allows the message to be
443 # categorized better, so that it shows up as a single type in our
444 # database, with the specifics of each occurrence not being stored with
445 # it.
446
447 my $self = shift;
448 my $opts = shift;
449
450 my $addr = Scalar::Util::refaddr $self;
451 return if $skip{$addr};
452
453 # Input can be a string or hash. If a string, parse it to separate
454 # out the line number and convert to a hash for easier further
455 # processing
456 my $message;
457 if (ref $opts ne 'HASH') {
458 $message = join "", $opts, @_;
459 my $line_number;
460 if ($message =~ s/\s*($line_reference)//) {
461 ($line_number = $1) =~ s/\s*$optional_location//;
462 }
463 else {
464 $line_number = '???';
465 }
466 $opts = { -msg => $message, -line => $line_number };
467 } else {
468 $message = $opts->{'-msg'};
469
470 }
471
472 $message =~ s/^\d+\s+//;
473 return if main::suppressed($message);
474
475 $self->SUPER::poderror($opts, @_);
476
477 $opts->{parameter} = "" unless $opts->{parameter};
478
479 # The variable parts of the message tend to be enclosed in '...',
480 # "....", or (...). Extract them and put them in an extra field,
481 # 'parameter'. This is trickier because the matching delimiter to a
482 # '(' is its mirror, and not itself. Text::Balanced could be used
483 # instead.
484 while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
485 my $delimiter = $1;
486 my $start = $-[0];
487 $delimiter = ')' if $delimiter eq '(';
488
489 # If there is no ending delimiter, don't consider it to be a
490 # variable part. Most likely it is a contraction like "Don't"
491 last unless $message =~ m/\G .+? \Q$delimiter/xg;
492
493 my $length = $+[0] - $start;
494
495 # Get the part up through the closing delimiter
496 my $special = substr($message, $start, $length);
497 $special =~ s/^\s+//; # No leading whitespace
498
499 # And add that variable part to the parameter, while removing it
500 # from the message. This isn't a foolproof way of finding the
501 # variable part. For example '(s)' can occur in e.g.,
502 # 'paragraph(s)'
503 if ($special ne '(s)') {
504 substr($message, $start, $length) = "";
505 pos $message = $start;
506 $opts->{-msg} = $message;
507 $opts->{parameter} .= " " if $opts->{parameter};
508 $opts->{parameter} .= $special;
509 }
510 }
511
512 # Extract any additional line number given. This is often the
513 # beginning location of something whereas the main line number gives
514 # the ending one.
515 if ($message =~ /( $line_reference )/xi) {
516 my $line_ref = $1;
517 while ($message =~ s/\s*\Q$line_ref//) {
518 $opts->{-msg} = $message;
519 $opts->{parameter} .= " " if $opts->{parameter};
520 $opts->{parameter} .= $line_ref;
521 }
522 }
523
b3fdb838 524 Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
16384ac1
KW
525 push @{$problems{$filename{$addr}}{$message}}, $opts;
526 #push @{$problems{$self->get_filename}{$message}}, $opts;
527 }
528
529 sub check_encoding { # Does it need an =encoding statement?
530 my ($self, $paragraph, $line_num, $pod_para) = @_;
531
532 # Do nothing if there is an =encoding in the file, or if the line
533 # doesn't require an =encoding, or have already warned.
534 my $addr = Scalar::Util::refaddr $self;
535 return if $seen_encoding_cmd{$addr}
536 || $warned_encoding{$addr}
537 || $paragraph !~ /\P{ASCII}/;
538
539 $warned_encoding{$addr} = 1;
540 my ($file, $line) = $pod_para->file_line;
541 $self->poderror({ -line => $line, -file => $file,
542 -msg => $need_encoding
543 });
544 return;
545 }
546
547 sub verbatim {
548 my ($self, $paragraph, $line_num, $pod_para) = @_;
549 $self->check_encoding($paragraph, $line_num, $pod_para);
550
551 $self->SUPER::verbatim($paragraph, $line_num, $pod_para);
552
2592f3b8
DG
553 my $addr = Scalar::Util::refaddr $self;
554
16384ac1
KW
555 # Pick up the name, since the parent class doesn't in verbatim
556 # NAMEs; so treat as non-verbatim. The parent class only allows one
557 # paragraph in a NAME section, so if there is an extra blank line, it
558 # will trigger a message, but such a blank line is harmless, so skip
559 # in that case.
2592f3b8 560 if ($in_NAME{$addr} && $paragraph =~ /\S/) {
16384ac1
KW
561 $self->textblock($paragraph, $line_num, $pod_para);
562 }
563
564 my @lines = split /^/, $paragraph;
565 for my $i (0 .. @lines - 1) {
2592f3b8
DG
566 if ( my $encoding = $seen_encoding_cmd{$addr} ) {
567 require Encode;
568 $lines[$i] = Encode::decode($encoding, $lines[$i]);
569 }
16384ac1
KW
570 $lines[$i] =~ s/\s+$//;
571 my $indent = $self->get_current_indent;
572 my $exceeds = length(Text::Tabs::expand($lines[$i]))
573 + $indent - $MAX_LINE_LENGTH;
574 next unless $exceeds > 0;
575 my ($file, $line) = $pod_para->file_line;
576 $self->poderror({ -line => $line + $i, -file => $file,
577 -msg => $line_length,
578 parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)",
579 });
580 }
581 }
582
583 sub textblock {
584 my ($self, $paragraph, $line_num, $pod_para) = @_;
585 $self->check_encoding($paragraph, $line_num, $pod_para);
586
587 $self->SUPER::textblock($paragraph, $line_num, $pod_para);
588
589 my ($file, $line) = $pod_para->file_line;
590 my $addr = Scalar::Util::refaddr $self;
591 if ($in_NAME{$addr}) {
592 if (! $self->name) {
593 my $text = $self->interpolate($paragraph, $line_num);
594 if ($text =~ /^\s*(\S+?)\s*$/) {
595 $self->name($1);
596 $self->poderror({ -line => $line, -file => $file,
597 -msg => $missing_name_description,
598 parameter => $1});
599 }
600 }
601 }
602 $paragraph = join " ", split /^/, $paragraph;
603
604 # Matches something that looks like a file name, but is enclosed in
605 # C<...>
606 my $C_path_re = qr{ \b ( C<
607 # exclude regexes and 'OS/2'
608 (?! (?: (?: s | qr | m) / ) | OS/2 > )
609 \w+ (?: / \w+ )+ > (?: \. \w+ )? )
610 }x;
611
612 # If looks like a reference to other documentation by containing the
613 # word 'See' and then a likely pod directive, warn.
614
615 while ($paragraph =~ m{ \b See \s+ ( ( [^L] ) <
616 ( [^<]*? ) # The not-< excludes nested C<L<...
617 > ) }ixg) {
618 my $construct = $1;
619 my $type = $2;
620 my $interior = $3;
621 if ($interior !~ /$non_pods/
622 && $construct !~ /$C_path_re/g) {
623 $self->poderror({ -line => $line, -file => $file,
624 -msg => $see_not_linked,
625 parameter => $construct
626 });
627 }
628 }
629 while ($paragraph =~ m/$C_path_re/g) {
630 my $construct = $1;
631 $self->poderror({ -line => $line, -file => $file,
632 -msg => $C_with_slash,
633 parameter => $construct
634 });
635 }
636 return;
637 }
638
639 sub command {
640 my ($self, $cmd, $paragraph, $line_num, $pod_para) = @_;
641 my $addr = Scalar::Util::refaddr $self;
642 if ($cmd eq "pod") {
643 $seen_pod_cmd{$addr}++;
644 }
645 elsif ($cmd eq "encoding") {
646 my ($file, $line) = $pod_para->file_line;
2592f3b8 647 $seen_encoding_cmd{$addr} = $paragraph; # for later decoding
16384ac1
KW
648 if ($command_count{$addr} != 1 && $seen_pod_cmd{$addr}) {
649 $self->poderror({ -line => $line, -file => $file,
650 -msg => $encoding_first
651 });
652 }
653 }
654 $self->check_encoding($paragraph, $line_num, $pod_para);
655
656 # Pod::Check treats all =items as linkable, but the bullet and
657 # numbered lists really aren't. So keep our own list. This has to be
658 # processed before SUPER is called so that the list is started before
659 # the rest of it gets parsed.
660 if ($cmd eq 'item') { # Not linkable if item begins with * or a digit
661 $linkable_item{$addr} = ($paragraph !~ / ^ \s*
662 (?: [*]
663 | \d+ \.? (?: \$ | \s+ )
664 )/x)
665 ? 1
666 : 0;
a67b1afa 667
16384ac1
KW
668 }
669 $self->SUPER::command($cmd, $paragraph, $line_num, $pod_para);
670
671 $command_count{$addr}++;
672
673 $in_NAME{$addr} = 0; # Will change to 1 below if necessary
674 $in_begin{$addr} = 0; # ibid
675 if ($cmd eq 'over') {
676 my $text = $self->interpolate($paragraph, $line_num);
677 my $indent = 4; # default
678 $indent = $1 if $text && $text =~ /^\s*(\d+)\s*$/;
679 push @{$indents{$addr}}, $indent;
680 $current_indent{$addr} += $indent;
681 }
682 elsif ($cmd eq 'back') {
683 if (@{$indents{$addr}}) {
684 $current_indent{$addr} -= pop @{$indents{$addr}};
685 }
686 else {
687 # =back without corresponding =over, but should have
688 # warned already
689 $current_indent{$addr} = 0;
690 }
691 }
692 elsif ($cmd =~ /^head/) {
693 if (! $in_begin{$addr}) {
694
695 # If a particular formatter, then this command doesn't really
696 # apply
697 $current_indent{$addr} = 0;
698 undef @{$indents{$addr}};
699 }
700
701 my $text = $self->interpolate($paragraph, $line_num);
702 $in_NAME{$addr} = 1 if $cmd eq 'head1'
703 && $text && $text =~ /^NAME\b/;
704 }
705 elsif ($cmd eq 'begin') {
706 $in_begin{$addr} = 1;
707 }
708
709 return;
710 }
711
712 sub hyperlink {
713 my $self = shift;
714
715 # If the hyperlink is to an interior node of another page, save it
716 # so that we can see if we need to parse normally skipped files.
717 $has_referred_to_node{$_[0][1]{'-page'}} = 1
718 if $_[0] && $_[0][1]{'-page'} && $_[0][1]{'-node'};
719 return $self->SUPER::hyperlink($_[0]);
720 }
721
722 sub node {
723 my $self = shift;
724 my $text = $_[0];
725 if($text) {
726 $text =~ s/\s+$//s; # strip trailing whitespace
727 $text =~ s/\s+/ /gs; # collapse whitespace
728 my $addr = Scalar::Util::refaddr $self;
729 push(@{$linkable_nodes{$addr}}, $text) if
730 ! $current_indent{$addr}
731 || $linkable_item{$addr};
732 }
733 return $self->SUPER::node($_[0]);
734 }
735
736 sub get_current_indent {
737 return $INDENT + $current_indent{Scalar::Util::refaddr $_[0]};
738 }
739
740 sub get_filename {
741 return $filename{Scalar::Util::refaddr $_[0]};
742 }
743
744 sub linkable_nodes {
745 my $linkables = $linkable_nodes{Scalar::Util::refaddr $_[0]};
746 return undef unless $linkables;
747 return @$linkables;
748 }
749
750 sub get_skip {
751 return $skip{Scalar::Util::refaddr $_[0]} // 0;
752 }
753
754 sub set_skip {
755 my $self = shift;
756 $skip{Scalar::Util::refaddr $self} = shift;
757
758 # If skipping, no need to keep the problems for it
759 delete $problems{$self->get_filename};
760 return;
761 }
762}
763
764package Tie_Array_to_FH { # So printing actually goes to an array
765
766 my %array;
767
768 sub TIEHANDLE {
769 my $class = shift;
770 my $array_ref = shift;
771
772 my $self = bless \do{ my $anonymous_scalar }, $class;
773 $array{Scalar::Util::refaddr $self} = $array_ref;
774
775 return $self;
776 }
777
778 sub PRINT {
a67b1afa 779 my $self = shift;
16384ac1
KW
780 push @{$array{Scalar::Util::refaddr $self}}, @_;
781 return 1;
782 }
a67b1afa
MM
783}
784
69f6a9a1 785
16384ac1
KW
786my %filename_to_checker; # Map a filename to it's pod checker object
787my %id_to_checker; # Map a checksum to it's pod checker object
788my %nodes; # key is filename, values are nodes in that file.
789my %nodes_first_word; # same, but value is first word of each node
790my %valid_modules; # List of modules known to exist outside us.
791my %digests; # checksums of files, whose names are the keys
792my %filename_to_pod; # Map a filename to its pod NAME
793my %files_with_unknown_issues;
794my %files_with_fixes;
69f6a9a1 795
16384ac1 796my $data_fh;
be39e7f8 797open $data_fh, '<:bytes', $known_issues or die "Can't open $known_issues";
69f6a9a1 798
16384ac1
KW
799my %counts; # For --counts param, count of each issue type
800my %suppressed_files; # Files with at least one issue type to suppress
801
802while (<$data_fh>) { # Read the data base
69f6a9a1 803 chomp;
16384ac1
KW
804 next if /^\s*(?:#|$)/; # Skip comment and empty lines
805 if (/\t/) {
806 next if $show_all;
807
808 # Keep track of counts of each issue type for each file
809 my ($filename, $message, $count) = split /\t/;
810 $known_problems{$filename}{$message} = $count;
811
812 if ($show_counts) {
813 if ($count < 0) { # -1 means to suppress this issue type
814 $suppressed_files{$filename} = $filename;
815 }
816 else {
817 $counts{$message} += $count;
818 }
819 }
820 }
821 else { # Lines without a tab are modules known to be valid
822 $valid_modules{$_} = 1
823 }
824}
825close $data_fh;
826
827if ($show_counts) {
828 my $total = 0;
829 foreach my $message (sort keys %counts) {
830 $total += $counts{$message};
831 note(Text::Tabs::expand("$counts{$message}\t$message"));
832 }
833 note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
834 if (%suppressed_files) {
835 note("\nFiles that have all messages of at least one type suppressed:");
836 note(join ",", keys %suppressed_files);
837 }
838 exit 0;
839}
840
841
842my %excluded_files = (
843 "lib/unicore/mktables" => 1,
844 "Porting/perldelta_template.pod" => 1,
845 "autodoc.pl" => 1,
846 "configpm" => 1,
847 "miniperl" => 1,
848 "perl" => 1,
e678c294
KW
849
850 # It would be nice if we didn't have to skip this,
851 # but the errors in it are too variable.
852 "pod/perltoc.pod" => 1,
16384ac1
KW
853 );
854
b3fdb838
KW
855# Convert to more generic form.
856foreach my $file (keys %excluded_files) {
857 $excluded_files{canonicalize($excluded_files{$file})}
858 = $excluded_files{$file};
859}
860
16384ac1
KW
861# re to match files that are to be parsed only if there is an internal link
862# to them. It does not include cpan, as whether those are parsed depends
863# on a switch. Currently, only the stable perldelta.pod's are included.
864# These all have characters between 'perl' and 'delta'. (Actually the
865# currently developed one matches as well, but is a duplicate of
866# perldelta.pod, so can be skipped, so fine for it to match this.
867my $only_for_interior_links_re = qr/ \b perl \d+ delta \. pod \b/x;
868
869{ # Closure
870 my $first_time = 1;
871
872 sub output_thanks ($$$$) { # Called when an issue has been fixed
873 my $filename = shift;
874 my $original_count = shift;
875 my $current_count = shift;
876 my $message = shift;
877
878 $files_with_fixes{$filename} = 1;
879 my $return;
880 my $fixed_count = $original_count - $current_count;
881 my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
882 my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
883 my $diff;
884 if ($message) {
885 $diff = <<EOF;
886There were $original_count occurrences (now $current_count) in this pod of type
887"$message",
888EOF
889 } else {
890 $diff = <<EOF;
891There are no longer any problems found in this pod!
892EOF
893 }
894
895 if ($first_time) {
896 $first_time = 0;
897 $return = <<EOF;
898Thanks for fixing $a_problem!
899$diff
900Now you must teach $0 that this was fixed.
901EOF
902 }
903 else {
904 $return = <<EOF
905Thanks for fixing $another_problem.
906$diff
907EOF
908 }
909
910 return $return;
911 }
912}
913
914sub my_safer_print { # print, with error checking for outputting to db
915 my ($fh, @lines) = @_;
916
917 if (! print $fh @lines) {
918 my $save_error = $!;
919 close($fh);
920 die "Write failure: $save_error";
921 }
922}
923
924sub extract_pod { # Extracts just the pod from a file
925 my $filename = shift;
926
927 my @pod;
928
929 # Arrange for the output of Pod::Parser to be collected in an array we can
930 # look at instead of being printed
931 tie *ALREADY_FH, 'Tie_Array_to_FH', \@pod;
be39e7f8 932 open my $in_fh, '<:bytes', $filename
763df156
KW
933
934 # The file should already have been opened once to get here, so if
77b8b9ad
KW
935 # fails, just die. It's possible that a transitory file containing a
936 # pod would get here, but not bothering to add code for that very
937 # unlikely event.
16384ac1
KW
938 or die "Can't open '$filename': $!\n";
939
940 my $parser = Pod::Parser->new();
941 $parser->parse_from_filehandle($in_fh, *ALREADY_FH);
942 close $in_fh;
943
944 return join "", @pod
945}
946
947my $digest = Digest->new($digest_type);
948
949sub is_pod_file {
c1dcaaab
KW
950 # If $_ is a pod file, add it to the lists and do other prep work.
951
16384ac1
KW
952 if (-d $_) {
953 # Don't look at files in directories that are for tests, nor those
954 # beginning with a dot
955 if ($_ eq 't' || $_ =~ /^\../) {
956 $File::Find::prune = 1;
957 }
958 return;
959 }
960
961 return if $_ =~ /^\./; # No hidden Unix files
962 return if $_ =~ $non_pods;
963
964 my $filename = $File::Find::name;
965
966 # Assumes that the path separator is exactly one character.
967 $filename =~ s/^\..//;
968
b3fdb838 969 return if $excluded_files{canonicalize($filename)};
16384ac1 970
144a708b
NC
971 my $contents = do {
972 local $/;
763df156
KW
973 my $candidate;
974 if (! open $candidate, '<:bytes', $_) {
975
77b8b9ad
KW
976 # If a transitory file was found earlier, the open could fail
977 # legitimately and we just skip the file; also skip it if it is a
978 # broken symbolic link, as it is probably just a build problem;
979 # certainly not a file that we would want to check the pod of.
980 # Otherwise fail it here and no reason to process it further.
981 # (But the test count will be off too)
982 ok(0, "Can't open '$filename': $!")
983 if -e $filename && ! -l $filename;
763df156
KW
984 return;
985 }
144a708b
NC
986 <$candidate>;
987 };
16384ac1
KW
988
989 # If the file is a .pm or .pod, having any initial '=' on a line is
990 # grounds for testing it. Otherwise, require a head1 NAME line to view it
991 # as a potential pod
144a708b
NC
992 if ($filename =~ /\.(?:pm|pod)/) {
993 return unless $contents =~ /^=/m;
994 } else {
995 return unless $contents =~ /^=head1 +NAME/m;
16384ac1 996 }
144a708b
NC
997
998 # Here, we know that the file is a pod. Add it to the list of files
999 # to check and create a checker object for it.
1000
1001 push @files, $filename;
1002 my $checker = My::Pod::Checker->new($filename);
1003 $filename_to_checker{$filename} = $checker;
1004
1005 # In order to detect duplicate pods and only analyze them once, we
1006 # compute checksums for the file, so don't have to do an exact
1007 # compare. Note that if the pod is just part of the file, the
1008 # checksums can differ for the same pod. That special case is handled
1009 # later, since if the checksums of the whole file are the same, that
1010 # case won't even come up. We don't need the checksums for files that
1011 # we parse only if there is a link to its interior, but we do need its
1012 # NAME, which is also retrieved in the code below.
1013
1014 if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ )
1015 | $only_for_interior_links_re
1016 /x) {
1017 $digest->add($contents);
1018 $digests{$filename} = $digest->digest;
1019
1020 # lib files aren't analyzed if they are duplicates of files copied
1021 # there from some other directory. But to determine this, we need
1022 # to know their NAMEs. We might as well find the NAME now while
1023 # the file is open. Similarly, cpan files aren't analyzed unless
1024 # we're analyzing all of them, or this particular file is linked
1025 # to by a file we are analyzing, and thus we will want to verify
1026 # that the target exists in it. We need to know at least the NAME
1027 # to see if it's worth analyzing, or so we can determine if a lib
1028 # file is a copy of a cpan one.
1029 if ($filename =~ m{ (?: ^ (?: cpan | lib ) / )
16384ac1 1030 | $only_for_interior_links_re
144a708b
NC
1031 }x) {
1032 if ($contents =~ /^=head1 +NAME.*/mg) {
1033 # The NAME is the first non-spaces on the line up to a
1034 # comma, dash or end of line. Otherwise, it's invalid and
1035 # this pod doesn't have a legal name that we're smart
1036 # enough to find currently. But the parser will later
1037 # find it if it thinks there is a legal name, and set the
1038 # name
1039 if ($contents =~ /\G # continue from the line after =head1
1040 \s* # ignore any empty lines
1041 ^ \s* ( \S+?) \s* (?: [,-] | $ )/mx) {
1042 my $name = $1;
1043 $checker->name($name);
1044 $id_to_checker{$name} = $checker
1045 if $filename =~ m{^cpan/};
16384ac1
KW
1046 }
1047 }
144a708b
NC
1048 elsif ($filename =~ m{^cpan/}) {
1049 $id_to_checker{$digests{$filename}} = $checker;
1050 }
16384ac1
KW
1051 }
1052 }
c1dcaaab
KW
1053
1054 return;
16384ac1
KW
1055} # End of is_pod_file()
1056
1057# Start of real code that isn't processing the command line.
1058# Here, @files contains list of files on the command line. If have any of
1059# these, unconditionally test them, and show all the errors, even the known
1060# ones, and, since not testing other pods, don't do cross-pod link tests.
1061# (Could add extra code to do cross-pod tests for the ones in the list.)
1062if ($has_input_files) {
1063 undef %known_problems;
1064 $do_upstream_cpan = 1; # In case one of the inputs is from cpan
1065
1066}
1067else { # No input files -- go find all the possibilities.
1068 if ($regen) {
1069 $copy_fh = open_new($known_issues);
1070 note("Regenerating $known_issues, please be patient...");
1071 print $copy_fh <<END;
1072# This file is the data file for $0.
1073# There are three types of lines.
1074# Comment lines are white-space only or begin with a '#', like this one. Any
1075# changes you make to the comment lines will be lost when the file is
1076# regen'd.
1077# Lines without tab characters are simply NAMES of pods that the program knows
1078# will have links to them and the program does not check if those links are
1079# valid.
1080# All other lines should have three fields, each separated by a tab. The
1081# first field is the name of a pod; the second field is an error message
1082# generated by this program; and the third field is a count of how many
1083# known instances of that message there are in the pod. -1 means that the
1084# program can expect any number of this type of message.
1085END
1086 }
1087
1088 # Move to the directory above us, but have to adjust @INC to account for
1089 # that.
1090 s{^\.\./lib$}{lib} for @INC;
1091 chdir File::Spec->updir;
1092
1093 # And look in this directory and all its subdirectories
1094 find( \&is_pod_file, '.');
1095
1096 # Add ourselves to the test
b3fdb838 1097 push @files, "t/porting/podcheck.t";
16384ac1
KW
1098}
1099
1100# Now we know how many tests there will be.
1101plan (tests => scalar @files) if ! $regen;
1102
1103
1104 # Sort file names so we get consistent results, and to put cpan last,
1105 # preceeded by the ones that we don't generally parse. This is because both
1106 # these classes are generally parsed only if there is a link to the interior
1107 # of them, and we have to parse all others first to guarantee that they don't
1108 # have such a link. 'lib' files come just before these, as some of these are
1109 # duplicates of others. We already have figured this out when gathering the
1110 # data as a special case for all such files, but this, while unnecessary,
1111 # puts the derived file last in the output. 'readme' files come before those,
1112 # as those also could be duplicates of others, which are considered the
1113 # primary ones. These currently aren't figured out when gathering data, so
1114 # are done here.
1115 @files = sort { if ($a =~ /^cpan/) {
1116 return 1 if $b !~ /^cpan/;
1117 return $a cmp $b;
1118 }
1119 elsif ($b =~ /^cpan/) {
1120 return -1;
1121 }
1122 elsif ($a =~ /$only_for_interior_links_re/) {
1123 return 1 if $b !~ /$only_for_interior_links_re/;
1124 return $a cmp $b;
1125 }
1126 elsif ($b =~ /$only_for_interior_links_re/) {
1127 return -1;
1128 }
1129 elsif ($a =~ /^lib/) {
1130 return 1 if $b !~ /^lib/;
1131 return $a cmp $b;
1132 }
1133 elsif ($b =~ /^lib/) {
1134 return -1;
1135 } elsif ($a =~ /\breadme\b/i) {
1136 return 1 if $b !~ /\breadme\b/i;
1137 return $a cmp $b;
1138 }
1139 elsif ($b =~ /\breadme\b/i) {
1140 return -1;
1141 }
1142 else {
1143 return lc $a cmp lc $b;
1144 }
1145 }
1146 @files;
1147
1148# Now go through all the files and parse them
1149foreach my $filename (@files) {
1150 my $parsed = 0;
1151 note("parsing $filename") if DEBUG;
1152
1153 # We may have already figured out some things in the process of generating
1154 # the file list. If so, have a $checker object already. But if not,
1155 # generate one now.
1156 my $checker = $filename_to_checker{$filename};
1157 if (! $checker) {
1158 $checker = My::Pod::Checker->new($filename);
1159 $filename_to_checker{$filename} = $checker;
1160 }
1161
1162 # We have set the name in the checker object if there is a possibility
1163 # that no further parsing is necessary, but otherwise do the parsing now.
1164 if (! $checker->name) {
1165 $parsed = 1;
1166 $checker->parse_from_file($filename, undef);
1167 }
1168
1169 if ($checker->num_errors() < 0) { # Returns negative if not a pod
1170 $checker->set_skip("$filename is not a pod");
1171 }
1172 else {
1173
1174 # Here, is a pod. See if it is one that has already been tested,
1175 # or should be tested under another directory. Use either its NAME
1176 # if it has one, or a checksum if not.
1177 my $name = $checker->name;
1178 my $id;
1179
1180 if ($name) {
1181 $id = $name;
1182 }
1183 else {
1184 my $digest = Digest->new($digest_type);
1185 $digest->add(extract_pod($filename));
1186 $id = $digest->digest;
1187 }
1188
1189 # If there is a match for this pod with something that we've already
1190 # processed, don't process it, and output why.
1191 my $prior_checker;
1192 if (defined ($prior_checker = $id_to_checker{$id})
1193 && $prior_checker != $checker) # Could have defined the checker
1194 # earlier without pursuing it
1195 {
1196
1197 # If the pods are identical, then it's just a copy, and isn't an
1198 # error. First use the checksums we have already computed to see
1199 # if the entire files are identical, which means that the pods are
1200 # identical too.
1201 my $prior_filename = $prior_checker->get_filename;
1202 my $same = (! $name
1203 || ($digests{$prior_filename}
1204 && $digests{$filename}
1205 && $digests{$prior_filename} eq $digests{$filename}));
1206
1207 # If they differ, it could be that the files differ for some
1208 # reason, but the pods they contain are identical. Extract the
1209 # pods and do the comparisons on just those.
1210 if (! $same && $name) {
1211 $same = extract_pod($prior_filename) eq extract_pod($filename);
1212 }
1213
1214 if ($same) {
1215 $checker->set_skip("The pod of $filename is a duplicate of "
1216 . "the pod for $prior_filename");
1217 } elsif ($prior_filename =~ /\breadme\b/i) {
1218 $checker->set_skip("$prior_filename is a README apparently for $filename");
1219 } elsif ($filename =~ /\breadme\b/i) {
1220 $checker->set_skip("$filename is a README apparently for $prior_filename");
1221 } else { # Here have two pods with identical names that differ
1222 $prior_checker->poderror(
1223 { -msg => $duplicate_name,
1224 -line => "???",
1225 parameter => "'$filename' also has NAME '$name'"
1226 });
1227 $checker->poderror(
1228 { -msg => $duplicate_name,
1229 -line => "???",
1230 parameter => "'$prior_filename' also has NAME '$name'"
1231 });
1232
1233 # Changing the names helps later.
1234 $prior_checker->name("$name version arbitrarily numbered 1");
1235 $checker->name("$name version arbitrarily numbered 2");
1236 }
1237
1238 # In any event, don't process this pod that has the same name as
1239 # another.
1240 next;
1241 }
1242
1243 # A unique pod.
1244 $id_to_checker{$id} = $checker;
1245
1246 my $parsed_for_links = ", but parsed for its interior links";
1247 if ((! $do_upstream_cpan && $filename =~ /^cpan/)
1248 || $filename =~ $only_for_interior_links_re)
1249 {
1250 if ($filename =~ /^cpan/) {
1251 $checker->set_skip("CPAN is upstream for $filename");
1252 }
1253 elsif ($filename =~ /perl\d+delta/) {
1254 $checker->set_skip("$filename is a stable perldelta");
1255 }
1256 else {
1257 croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
1258 }
1259
1260 if ($name && $has_referred_to_node{$name}) {
1261 $checker->set_skip($checker->get_skip() . $parsed_for_links);
1262 }
1263 }
1264
1265 # Need a name in order to process it, because not meaningful
1266 # otherwise, and also can't test links to this without a name.
1267 if (!defined $name) {
1268 $checker->poderror( { -msg => $no_name,
1269 -line => '???'
1270 });
1271 next;
1272 }
1273
1274 # For skipped files, just get its NAME
1275 my $skip;
1276 if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
1277 {
1278 $checker->node($name) if $name;
1279 }
1280 else {
1281 $checker->parse_from_file($filename, undef) if ! $parsed;
1282 }
1283
1284 # Go through everything in the file that could be an anchor that
1285 # could be a link target. Count how many there are of the same name.
1286 foreach my $node ($checker->linkable_nodes) {
1287 next if ! $node; # Can be empty is like '=item *'
1288 if (exists $nodes{$name}{$node}) {
1289 $nodes{$name}{$node}++;
1290 }
1291 else {
1292 $nodes{$name}{$node} = 1;
1293 }
1294
1295 # Experiments have shown that cpan search can figure out the
1296 # target of a link even if the exact wording is incorrect, as long
1297 # as the first word is. This happens frequently in perlfunc.pod,
1298 # where the link will be just to the function, but the target
1299 # entry also includes parameters to the function.
1300 my $first_word = $node;
1301 if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
1302 $nodes_first_word{$name}{$first_word} = $node;
1303 }
1304 }
1305 $filename_to_pod{$filename} = $name;
1306 }
1307}
1308
1309# Here, all files have been parsed, and all links and link targets are stored.
1310# Now go through the files again and see which don't have matches.
1311if (! $has_input_files) {
1312 foreach my $filename (@files) {
1313 next if $filename_to_checker{$filename}->get_skip;
1314 my $checker = $filename_to_checker{$filename};
1315 foreach my $link ($checker->hyperlink) {
1316 my $linked_to_page = $link->[1]->page;
1317 next unless $linked_to_page; # intra-file checks are handled by std
1318 # Pod::Checker
1319
1320 # Initialize the potential message.
1321 my %problem = ( -msg => $broken_link,
1322 -line => $link->[0],
1323 parameter => "to \"$linked_to_page\"",
1324 );
1325
1326 # See if we have found the linked-to_file in our parse
1327 if (exists $nodes{$linked_to_page}) {
1328 my $node = $link->[1]->node;
1329
1330 # If link is only to the page-level, already have it
1331 next if ! $node;
1332
1333 # Transform pod language to what we are expecting
1334 $node =~ s,E<sol>,/,g;
1335 $node =~ s/E<verbar>/|/g;
1336
1337 # If link is to a node that exists in the file, is ok
1338 if ($nodes{$linked_to_page}{$node}) {
1339
1340 # But if the page has multiple targets with the same name,
1341 # it's ambiguous which one this should be to.
1342 if ($nodes{$linked_to_page}{$node} > 1) {
1343 $problem{-msg} = $multiple_targets;
1344 $problem{parameter} = "in $linked_to_page that $node could be pointing to";
1345 $checker->poderror(\%problem);
1346 }
1347 } elsif (! $nodes_first_word{$linked_to_page}{$node}) {
1348
1349 # Here the link target was not found, either exactly or to
1350 # the first word. Is an error.
1351 $problem{parameter} =~ s,"$,/$node",;
1352 $checker->poderror(\%problem);
1353 }
1354
1355 } # Linked-to-file not in parse; maybe is in exception list
1356 elsif (! exists $valid_modules{$link->[1]->page}) {
1357
1358 # Here, is a link to a target that we can't find. Check if
1359 # there is an internal link on the page with the target name.
1360 # If so, it could be that they just forgot the initial '/'
1361 if ($filename_to_pod{$filename}
1362 && $nodes{$filename_to_pod{$filename}}{$linked_to_page})
1363 {
1364 $problem{-msg} = $broken_internal_link;
1365 }
1366 $checker->poderror(\%problem);
1367 }
1368 }
1369 }
1370}
1371
1372# If regenerating the data file, start with the modules for which we don't
1373# check targets
1374if ($regen) {
1375 foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
1376 my_safer_print($copy_fh, $_, "\n");
1377 }
1378}
1379
1380# Now ready to output the messages.
1381foreach my $filename (@files) {
1382 my $test_name = "POD of $filename";
b3fdb838 1383 my $canonical = canonicalize($filename);
16384ac1
KW
1384 SKIP: {
1385 my $skip = $filename_to_checker{$filename}->get_skip // "";
1386
1387 if ($regen) {
1388 foreach my $message ( sort keys %{$problems{$filename}}) {
1389 my $count;
1390
1391 # Preserve a negative setting.
b3fdb838
KW
1392 if ($known_problems{$canonical}{$message}
1393 && $known_problems{$canonical}{$message} < 0)
16384ac1 1394 {
b3fdb838 1395 $count = $known_problems{$canonical}{$message};
16384ac1
KW
1396 }
1397 else {
1398 $count = @{$problems{$filename}{$message}};
1399 }
b3fdb838 1400 my_safer_print($copy_fh, canonicalize($filename) . "\t$message\t$count\n");
16384ac1
KW
1401 }
1402 next;
1403 }
1404
1405 skip($skip, 1) if $skip;
1406 my @diagnostics;
1407 my $indent = ' ';
1408
16384ac1
KW
1409 my $total_known = 0;
1410 foreach my $message ( sort keys %{$problems{$filename}}) {
b3fdb838
KW
1411 $known_problems{$canonical}{$message} = 0
1412 if ! $known_problems{$canonical}{$message};
16384ac1
KW
1413 my $diagnostic = "";
1414 my $problem_count = scalar @{$problems{$filename}{$message}};
1415 $total_known += $problem_count;
b3fdb838
KW
1416 next if $known_problems{$canonical}{$message} < 0;
1417 if ($problem_count > $known_problems{$canonical}{$message}) {
16384ac1
KW
1418
1419 # Here we are about to output all the messages for this type,
1420 # subtract back this number we previously added in.
1421 $total_known -= $problem_count;
1422
1423 $diagnostic .= $indent . $message;
1424 if ($problem_count > 2) {
1425 $diagnostic .= " ($problem_count occurrences)";
1426 }
1427 foreach my $problem (@{$problems{$filename}{$message}}) {
1428 $diagnostic .= " " if $problem_count == 1;
1429 $diagnostic .= "\n$indent$indent";
1430 $diagnostic .= "$problem->{parameter}" if $problem->{parameter};
1431 $diagnostic .= " near line $problem->{-line}";
1432 $diagnostic .= " $problem->{comment}" if $problem->{comment};
1433 }
1434 $diagnostic .= "\n";
1435 $files_with_unknown_issues{$filename} = 1;
b3fdb838
KW
1436 } elsif ($problem_count < $known_problems{$canonical}{$message}) {
1437 $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
16384ac1
KW
1438 }
1439 push @diagnostics, $diagnostic if $diagnostic;
1440 }
1441
09ea063a
KW
1442 # The above loop has output messages where there are current potential
1443 # issues. But it misses where there were some that have been entirely
1444 # fixed. For those, we need to look through the old issues
b3fdb838 1445 foreach my $message ( sort keys %{$known_problems{$canonical}}) {
09ea063a 1446 next if $problems{$filename}{$message};
b3fdb838
KW
1447 next if ! $known_problems{$canonical}{$message};
1448 next if $known_problems{$canonical}{$message} < 0; # Preserve negs
1449 my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
09ea063a
KW
1450 push @diagnostics, $diagnostic if $diagnostic;
1451 }
1452
16384ac1
KW
1453 my $output = "POD of $filename";
1454 $output .= ", excluding $total_known not shown known potential problems"
1455 if $total_known;
1456 ok(@diagnostics == 0, $output);
1457 if (@diagnostics) {
1458 note(join "", @diagnostics,
1459 "See end of this test output for your options");
1460 }
1461 }
1462}
1463
1464my $how_to = <<EOF;
1465 run this test script by hand, using the following formula (on
1466 Un*x-like machines):
1467 cd t
1468 ./perl -I../lib porting/podcheck.t --regen
1469EOF
1470
1471if (%files_with_unknown_issues) {
1472 my $were_count_files = scalar keys %files_with_unknown_issues;
1473 $were_count_files = ($were_count_files == 1)
1474 ? "was $were_count_files file"
1475 : "were $were_count_files files";
1476 my $message = <<EOF;
1477
1478HOW TO GET THIS .t TO PASS
1479
1480There $were_count_files that had new potential problems identified. To get
1481this .t to pass, do the following:
1482
14831) If a problem is about a link to an unknown module that you know exists,
1484 simply edit the file,
1485 $known_issues
1486 and add anywhere a line that contains just the module's name.
1487 (Don't do this for a module that you aren't sure about; instead treat
1488 as another type of issue and follow the instructions below.)
1489
14902) For other issues, decide if each should be fixed now or not. Fix the
1491 ones you decided to, and rerun this test to verify that the fixes
1492 worked.
1493
14943) If there remain potential problems that you don't plan to fix right
1495 now (or aren't really problems),
1496$how_to
1497 That should cause all current potential problems to be accepted by the
1498 program, so that the next time it runs, they won't be flagged.
1499EOF
1500 if (%files_with_fixes) {
1501 $message .= " This step will also take care of the files that have fixes in them\n";
1502 }
1503
1504 $message .= <<EOF;
1505 For a few files, such as perltoc, certain issues will always be
1506 expected, and more of the same will be added over time. For those,
1507 before you do the regen, you can edit
1508 $known_issues
1509 and find the entry for the module's file and specific error message,
1510 and change the count of known potential problems to -1.
1511EOF
1512
1513 note($message);
1514} elsif (%files_with_fixes) {
1515 note(<<EOF
1516To teach this test script that the potential problems have been fixed,
1517$how_to
1518EOF
1519 );
1520}
1521
1522if ($regen) {
1523 chdir $original_dir || die "Can't change directories to $original_dir";
1524 close_and_rename($copy_fh);
1525}