This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
podcheck.t: Special case perldelta
[perl5.git] / t / porting / podcheck.t
CommitLineData
a67b1afa 1#!/usr/bin/perl -w
f7b649f0 2
a67b1afa 3use strict;
16384ac1
KW
4use warnings;
5use feature 'unicode_strings';
6
7use Carp;
8use Digest;
9use File::Find;
10use File::Spec;
11use Scalar::Util;
12use Text::Tabs;
13
14BEGIN {
15 require '../regen/regen_lib.pl';
16}
17
18sub DEBUG { 0 };
19
20=pod
21
22=head1 NAME
23
24podcheck.t - Look for possible problems in the Perl pods
25
26=head1 SYNOPSIS
27
28 cd t
29 ./perl -I../lib porting/podcheck.t [--show_all] [--cpan] [--counts]
30 [ FILE ...]
31 ./perl -I../lib porting/podcheck.t --regen
32
33=head1 DESCRIPTION
34
35podcheck.t is an extension of Pod::Checker. It looks for pod errors and
36potential errors in the files given as arguments, or if none specified, in all
37pods in the distribution workspace, except those in the cpan directory (unless
38C<--cpan> is specified). It does additional checking beyond that done by
39Pod::Checker, and keeps a database of known potential problems, and will
40fail a pod only if the number of such problems differs from that given in the
41database. It also suppresses the C<(section) deprecated> message from
42Pod::Checker, since specifying the man page section number is quite proper to do.
43
44The additional checks it makes are:
45
46=over
47
48=item Cross-pod link checking
49
50Pod::Checker verifies that links to an internal target in a pod are not
51broken. podcheck.t extends that (when called without FILE arguments) to
52external links. It does this by gathering up all the possible targets in the
53workspace, and cross-checking them. The database has a list of known targets
54outside the workspace, so podcheck.t will not raise a warning for
55using those. It also checks that a non-broken link points to just one target.
56(The destination pod could have two targets with the same name.)
57
58=item An internal link that isn't so specified
59
60If a link is broken, but there is an existing internal target of the same
61name, it is likely that the internal target was meant, and the C<"/"> is
62missing from the C<LE<lt>E<gt>> pod command.
63
64=item Verbatim paragraphs that wrap in an 80 column window
65
66It's annoying to have lines wrap when displaying pod documentation in a
67terminal window. This checks that all such lines fit, and for those that
68don't, it tells you how much needs to be cut in order to fit. However,
69if you're fixing these, keep in mind that some terminal/pager combinations
70require really a maximum of 79 or 78 columns to display properly.
71
72Often, the easiest thing to do to gain space for these is to lower the indent
73to just one space.
74
75=item Missing or duplicate NAME or missing NAME short description
76
77A pod can't be linked to unless it has a unique name.
78And a NAME should have a dash and short description after it.
79
80=item =encoding statement issues
81
82This indicates if an C<=encoding> statement should be present, or moved to the
83front of the pod.
84
85=item Items that perhaps should be links
86
87There are mentions of apparent files in the pods that perhaps should be links
88instead, using C<LE<lt>...E<gt>>
89
90=item Items that perhaps should be C<FE<lt>...E<gt>>
91
92What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
93C<FE<lt>...E<gt>> mark-up instead.
94
95=back
96
97A number of issues raised by podcheck.t and by the base Pod::Checker are not
98really problems, but merely potential problems. After inspecting them and
99deciding that they aren't real problems, it is possible to shut up this program
100about them, unlike base Pod::Checker. To do this, call podcheck.t with the
101C<--regen> option to regenerate the database. This tells it that all existing
102issues are to not be mentioned again.
103
104This isn't fool-proof. The database merely keeps track of the number of these
105potential problems of each type for each pod. If a new problem of a given
106type is introduced into the pod, podcheck.t will spit out all of them. You
107then have to figure out which is the new one, and should it be changed or not.
108But doing it this way insulates the database from having to keep track of line
109numbers of problems, which may change, or the exact wording of each problem
110which might also change without affecting whether it is a problem or not.
111
112Also, if the count of potential problems of a given type for a pod decreases,
113the database must be regenerated so that it knows the new number. The program
114gives instructions when this happens.
115
116There is currently no check that modules listed as valid in the data base
117actually are. Thus any errors introduced there will remain there.
118
119=head1 OPTIONS
120
121=over
122
123=item --regen
124
125Regenerate the data base used by podcheck.t to include all the existing
126potential problems. Future runs of the program will not then flag any of
127these.
128
129=item --cpan
130
131Normally, all pods in the cpan directory are skipped, except to make sure that
132any blead-upstream links to such pods are valid.
133This option will cause cpan upstream pods to be checked.
134
135=item --show_all
136
137Normally, if the number of potential problems of a given type found for a
138pod matches the expected value in the database, they will not be displayed.
139This option forces the database to be ignored during the run, so all potential
140problems are displayed and will fail their respective pod test. Specifying
141any particular FILES to operate on automatically selects this option.
142
143=item --counts
144
145Instead of testing, this just dumps the counts of the occurrences of the
146various types of potential problems in the data base.
147
148=back
149
150=head1 FILES
151
152The database is stored in F<t/porting/known_pod_issues.dat>
153
154=head1 SEE ALSO
155
156L<Pod::Checker>
157
158=cut
159
160#####################################################
161# HOW IT WORKS (in general)
162#
163# If not called with specific files to check, the directory structure is
164# examined for files that have pods in them. Files that might not have to be
165# fully parsed (e.g. in cpan) are parsed enough at this time to find their
166# pod's NAME, and to get a checksum.
167#
168# Those kinds of files are sorted last, but otherwise the pods are parsed with
169# the package coded here, My::Pod::Checker, which is an extension to
170# Pod::Checker that adds some tests and suppresses others that aren't
171# appropriate. The latter module has no provision for capturing diagnostics,
172# so a package, Tie_Array_to_FH, is used to force them to be placed into an
173# array instead of printed.
174#
175# Parsing the files builds up a list of links. The files are gone through
176# again, doing cross-link checking and outputting all saved-up problems with
177# each pod.
178#
179# Sorting the files last that potentially don't need to be fully parsed allows
180# us to not parse them unless there is a link to an internal anchor in them
181# from something that we have already parsed. Keeping checksums allows us to
182# not parse copies of other pods.
183#
184#####################################################
185
186# 1 => Exclude low priority messages that aren't likely to be problems, and
187# has many false positives; higher numbers give more messages.
188my $Warnings_Level = 200;
189
1c01047d
KW
190# perldelta during construction may have place holder links.
191our @perldelta_ignore_links = ( "XXX", "perl5YYYdelta" );
192
16384ac1
KW
193# To see if two pods with the same NAME are actually copies of the same pod,
194# which is not an error, it uses a checksum to save work.
195my $digest_type = "SHA-1";
196
197my $original_dir = File::Spec->rel2abs(File::Spec->curdir);
198my $data_dir = File::Spec->catdir($original_dir, 'porting');
199my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
200my $copy_fh;
201
202my $MAX_LINE_LENGTH = 80; # 80 columns
203my $INDENT = 8; # default nroff indent
204
205# Our warning messages. Better not have [('"] in them, as those are used as
206# delimiters for variable parts of the messages by poderror.
207my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
208my $broken_link = "Apparent broken link";
209my $broken_internal_link = "Apparent internal link is missing its forward slash";
210my $see_not_linked = "? Should you be using L<...> instead of";
211my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of";
212my $multiple_targets = "There is more than one target";
213my $duplicate_name = "Pod NAME already used";
214my $need_encoding = "Should have =encoding statement because have non-ASCII";
215my $encoding_first = "=encoding must be first command (if present)";
216my $no_name = "There is no NAME";
217my $missing_name_description = "The NAME should have a dash and short description after it";
218
7f8d58fb 219# objects, tests, etc can't be pods, so don't look for them. Also skip
a71e7b2c
KW
220# files output by the patch program. Could also ignore most of .gitignore
221# files, but not all, so don't.
222my $non_pods = qr/ (?: \.
223 (?: [achot] | zip | gz | bz2 | jar | tar | tgz | PL | so
224 | orig | rej | patch # Patch program output
225 | sw[op] | \#.* # Editor droppings
226 )
227 $
228 ) | ~$ # Another editor dropping
229 /x;
16384ac1
KW
230
231
232# Pod::Checker messages to suppress
233my @suppressed_messages = (
234 "(section) in", # Checker is wrong to flag this
235 "multiple occurrence of link target", # We catch independently the ones
236 # that are real problems.
237 "unescaped <>",
238);
239
240sub suppressed {
241 # Returns bool as to if input message is one that is to be suppressed
242
243 my $message = shift;
244 return grep { $message =~ /^\Q$_/i } @suppressed_messages;
245}
246
247{ # Closure to contain a simple subset of test.pl. This is to get rid of the
248 # unnecessary 'failed at' messages that would otherwise be output pointing
249 # to a particular line in this file.
a67b1afa 250
16384ac1
KW
251 my $current_test = 0;
252 my $planned;
253
254 sub plan {
255 my %plan = @_;
256 $planned = $plan{tests};
257 print "1..$planned\n";
258 return;
259 }
260
261 sub ok {
262 my $success = shift;
263 my $message = shift;
264
265 chomp $message;
266
267 $current_test++;
268 print "not " unless $success;
269 print "ok $current_test - $message\n";
270 return;
271 }
272
273 sub skip {
274 my $why = shift;
275 my $n = @_ ? shift : 1;
276 for (1..$n) {
277 $current_test++;
278 print "ok $current_test # skip $why\n";
279 }
280 no warnings 'exiting';
281 last SKIP;
282 }
283
284 sub note {
285 my $message = shift;
286
287 chomp $message;
288
289 print $message =~ s/^/# /mgr;
290 print "\n";
291 return;
292 }
293
294 END {
295 if ($planned && $planned != $current_test) {
296 print STDERR
297 "# Looks like you planned $planned tests but ran $current_test.\n";
298 }
299 }
300}
301
b3fdb838 302# This is to get this to work across multiple file systems, including those
bf8144e1
KW
303# that are not case sensitive. The db is stored in lower case, Un*x style,
304# and all file name comparisons are done that way.
b3fdb838 305sub canonicalize($) {
bf8144e1
KW
306 my $input = shift;
307 my ($volume, $directories, $file)
308 = File::Spec->splitpath(File::Spec->canonpath($input));
309 # Assumes $volume is constant for everything in this directory structure
310 $directories = "" if ! $directories;
311 $file = "" if ! $file;
312 my $output = lc join '/', File::Spec->splitdir($directories), $file;
313 $output =~ s! / /+ !/!gx; # Multiple slashes => single slash
314 return $output;
b3fdb838
KW
315}
316
16384ac1
KW
317
318# List of known potential problems by pod and type.
319my %known_problems;
320
321# Pods given by the keys contain an interior node that is referred to from
322# outside it.
323my %has_referred_to_node;
324
325my $show_counts = 0;
326my $regen = 0;
327my $show_all = 0;
328
329# Assume that are to skip anything in /cpan
330my $do_upstream_cpan = 0;
331
332while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
333 my $arg = shift @ARGV;
334
335 $arg =~ s/^--/-/; # Treat '--' the same as a single '-'
336 if ($arg eq '-regen') {
337 $regen = 1;
338 }
339 elsif ($arg eq '-cpan') {
340 $do_upstream_cpan = 1;
341 }
342 elsif ($arg eq '-show_all') {
343 $show_all = 1;
344 }
345 elsif ($arg eq '-counts') {
346 $show_counts = 1;
347 }
348 else {
349 die <<EOF;
350Unknown option '$arg'
351
352Usage: $0 [ --regen | --cpan | --show_all ] [ FILE ... ]\n"
353 --cpan -> Include files in the cpan subdirectory.
354 --regen -> Regenerate the data file for $0
355 --show_all -> Show all known potential problems
356 --counts -> Don't test, but give summary counts of the currently
357 existing database
358EOF
359 }
360}
361
362my @files = @ARGV;
363
364if (($regen + $show_all + $show_counts + $do_upstream_cpan) > 1) {
365 croak "--regen, --show_all, --cpan, and --counts are mutually exclusive";
366}
367
368my $has_input_files = @files;
369
370if ($has_input_files && ($regen || $show_counts || $do_upstream_cpan)) {
371 croak "--regen, --counts and --cpan can't be used since using specific files";
372}
373
374our %problems; # potential problems found in this run
375
376package My::Pod::Checker { # Extend Pod::Checker
a67b1afa
MM
377 use parent 'Pod::Checker';
378
16384ac1
KW
379 # Uses inside out hash to protect from typos
380 # For new fields, remember to add to destructor DESTROY()
381 my %indents; # Stack of indents from =over's in effect for
382 # current line
383 my %current_indent; # Current line's indent
384 my %filename; # The pod is store in this file
385 my %skip; # is SKIP set for this pod
386 my %in_NAME; # true if within NAME section
387 my %in_begin; # true if within =begin section
388 my %linkable_item; # Bool: if the latest =item is linkable. It isn't
389 # for bullet and number lists
390 my %linkable_nodes; # Pod::Checker adds all =items to its node list,
391 # but not all =items are linkable to
392 my %seen_encoding_cmd; # true if have =encoding earlier
393 my %command_count; # Number of commands seen
394 my %seen_pod_cmd; # true if have =pod earlier
395 my %warned_encoding; # true if already have warned about =encoding
396 # problems
397
398 sub DESTROY {
399 my $addr = Scalar::Util::refaddr $_[0];
400 delete $command_count{$addr};
401 delete $current_indent{$addr};
402 delete $filename{$addr};
403 delete $in_begin{$addr};
404 delete $indents{$addr};
405 delete $in_NAME{$addr};
406 delete $linkable_item{$addr};
407 delete $linkable_nodes{$addr};
408 delete $seen_encoding_cmd{$addr};
409 delete $seen_pod_cmd{$addr};
410 delete $skip{$addr};
411 delete $warned_encoding{$addr};
412 return;
413 }
414
415 sub new {
416 my $class = shift;
417 my $filename = shift;
418
419 my $self = $class->SUPER::new(-quiet => 1,
420 -warnings => $Warnings_Level);
421 my $addr = Scalar::Util::refaddr $self;
422 $command_count{$addr} = 0;
423 $current_indent{$addr} = 0;
424 $filename{$addr} = $filename;
425 $in_begin{$addr} = 0;
426 $in_NAME{$addr} = 0;
427 $linkable_item{$addr} = 0;
428 $seen_encoding_cmd{$addr} = 0;
429 $seen_pod_cmd{$addr} = 0;
430 $warned_encoding{$addr} = 0;
431 return $self;
432 }
433
434 # re's for messages that Pod::Checker outputs
435 my $location = qr/ \b (?:in|at|on|near) \s+ /xi;
436 my $optional_location = qr/ (?: $location )? /xi;
3cb68c65 437 my $line_reference = qr/ [('"]? $optional_location \b line \s+
16384ac1
KW
438 (?: \d+ | EOF | \Q???\E | - )
439 [)'"]? /xi;
440
441 sub poderror { # Called to register a potential problem
442
443 # This adds an extra field to the parent hash, 'parameter'. It is
444 # used to extract the variable parts of a message leaving just the
445 # constant skeleton. This in turn allows the message to be
446 # categorized better, so that it shows up as a single type in our
447 # database, with the specifics of each occurrence not being stored with
448 # it.
449
450 my $self = shift;
451 my $opts = shift;
452
453 my $addr = Scalar::Util::refaddr $self;
454 return if $skip{$addr};
455
456 # Input can be a string or hash. If a string, parse it to separate
457 # out the line number and convert to a hash for easier further
458 # processing
459 my $message;
460 if (ref $opts ne 'HASH') {
461 $message = join "", $opts, @_;
462 my $line_number;
463 if ($message =~ s/\s*($line_reference)//) {
464 ($line_number = $1) =~ s/\s*$optional_location//;
465 }
466 else {
467 $line_number = '???';
468 }
469 $opts = { -msg => $message, -line => $line_number };
470 } else {
471 $message = $opts->{'-msg'};
472
473 }
474
475 $message =~ s/^\d+\s+//;
476 return if main::suppressed($message);
477
478 $self->SUPER::poderror($opts, @_);
479
480 $opts->{parameter} = "" unless $opts->{parameter};
481
482 # The variable parts of the message tend to be enclosed in '...',
483 # "....", or (...). Extract them and put them in an extra field,
484 # 'parameter'. This is trickier because the matching delimiter to a
485 # '(' is its mirror, and not itself. Text::Balanced could be used
486 # instead.
487 while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
488 my $delimiter = $1;
489 my $start = $-[0];
490 $delimiter = ')' if $delimiter eq '(';
491
492 # If there is no ending delimiter, don't consider it to be a
493 # variable part. Most likely it is a contraction like "Don't"
494 last unless $message =~ m/\G .+? \Q$delimiter/xg;
495
496 my $length = $+[0] - $start;
497
498 # Get the part up through the closing delimiter
499 my $special = substr($message, $start, $length);
500 $special =~ s/^\s+//; # No leading whitespace
501
502 # And add that variable part to the parameter, while removing it
503 # from the message. This isn't a foolproof way of finding the
504 # variable part. For example '(s)' can occur in e.g.,
505 # 'paragraph(s)'
506 if ($special ne '(s)') {
507 substr($message, $start, $length) = "";
508 pos $message = $start;
509 $opts->{-msg} = $message;
510 $opts->{parameter} .= " " if $opts->{parameter};
511 $opts->{parameter} .= $special;
512 }
513 }
514
515 # Extract any additional line number given. This is often the
516 # beginning location of something whereas the main line number gives
517 # the ending one.
518 if ($message =~ /( $line_reference )/xi) {
519 my $line_ref = $1;
520 while ($message =~ s/\s*\Q$line_ref//) {
521 $opts->{-msg} = $message;
522 $opts->{parameter} .= " " if $opts->{parameter};
523 $opts->{parameter} .= $line_ref;
524 }
525 }
526
b3fdb838 527 Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
16384ac1
KW
528 push @{$problems{$filename{$addr}}{$message}}, $opts;
529 #push @{$problems{$self->get_filename}{$message}}, $opts;
530 }
531
532 sub check_encoding { # Does it need an =encoding statement?
533 my ($self, $paragraph, $line_num, $pod_para) = @_;
534
535 # Do nothing if there is an =encoding in the file, or if the line
536 # doesn't require an =encoding, or have already warned.
537 my $addr = Scalar::Util::refaddr $self;
538 return if $seen_encoding_cmd{$addr}
539 || $warned_encoding{$addr}
540 || $paragraph !~ /\P{ASCII}/;
541
542 $warned_encoding{$addr} = 1;
543 my ($file, $line) = $pod_para->file_line;
544 $self->poderror({ -line => $line, -file => $file,
545 -msg => $need_encoding
546 });
547 return;
548 }
549
550 sub verbatim {
551 my ($self, $paragraph, $line_num, $pod_para) = @_;
552 $self->check_encoding($paragraph, $line_num, $pod_para);
553
554 $self->SUPER::verbatim($paragraph, $line_num, $pod_para);
555
2592f3b8
DG
556 my $addr = Scalar::Util::refaddr $self;
557
16384ac1
KW
558 # Pick up the name, since the parent class doesn't in verbatim
559 # NAMEs; so treat as non-verbatim. The parent class only allows one
560 # paragraph in a NAME section, so if there is an extra blank line, it
561 # will trigger a message, but such a blank line is harmless, so skip
562 # in that case.
2592f3b8 563 if ($in_NAME{$addr} && $paragraph =~ /\S/) {
16384ac1
KW
564 $self->textblock($paragraph, $line_num, $pod_para);
565 }
566
567 my @lines = split /^/, $paragraph;
568 for my $i (0 .. @lines - 1) {
2592f3b8
DG
569 if ( my $encoding = $seen_encoding_cmd{$addr} ) {
570 require Encode;
571 $lines[$i] = Encode::decode($encoding, $lines[$i]);
572 }
16384ac1
KW
573 $lines[$i] =~ s/\s+$//;
574 my $indent = $self->get_current_indent;
575 my $exceeds = length(Text::Tabs::expand($lines[$i]))
576 + $indent - $MAX_LINE_LENGTH;
577 next unless $exceeds > 0;
578 my ($file, $line) = $pod_para->file_line;
579 $self->poderror({ -line => $line + $i, -file => $file,
580 -msg => $line_length,
581 parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)",
582 });
583 }
584 }
585
586 sub textblock {
587 my ($self, $paragraph, $line_num, $pod_para) = @_;
588 $self->check_encoding($paragraph, $line_num, $pod_para);
589
590 $self->SUPER::textblock($paragraph, $line_num, $pod_para);
591
592 my ($file, $line) = $pod_para->file_line;
593 my $addr = Scalar::Util::refaddr $self;
594 if ($in_NAME{$addr}) {
595 if (! $self->name) {
596 my $text = $self->interpolate($paragraph, $line_num);
597 if ($text =~ /^\s*(\S+?)\s*$/) {
598 $self->name($1);
599 $self->poderror({ -line => $line, -file => $file,
600 -msg => $missing_name_description,
601 parameter => $1});
602 }
603 }
604 }
605 $paragraph = join " ", split /^/, $paragraph;
606
607 # Matches something that looks like a file name, but is enclosed in
608 # C<...>
609 my $C_path_re = qr{ \b ( C<
610 # exclude regexes and 'OS/2'
611 (?! (?: (?: s | qr | m) / ) | OS/2 > )
612 \w+ (?: / \w+ )+ > (?: \. \w+ )? )
613 }x;
614
615 # If looks like a reference to other documentation by containing the
616 # word 'See' and then a likely pod directive, warn.
617
618 while ($paragraph =~ m{ \b See \s+ ( ( [^L] ) <
619 ( [^<]*? ) # The not-< excludes nested C<L<...
620 > ) }ixg) {
621 my $construct = $1;
622 my $type = $2;
623 my $interior = $3;
624 if ($interior !~ /$non_pods/
625 && $construct !~ /$C_path_re/g) {
626 $self->poderror({ -line => $line, -file => $file,
627 -msg => $see_not_linked,
628 parameter => $construct
629 });
630 }
631 }
632 while ($paragraph =~ m/$C_path_re/g) {
633 my $construct = $1;
634 $self->poderror({ -line => $line, -file => $file,
635 -msg => $C_with_slash,
636 parameter => $construct
637 });
638 }
639 return;
640 }
641
642 sub command {
643 my ($self, $cmd, $paragraph, $line_num, $pod_para) = @_;
644 my $addr = Scalar::Util::refaddr $self;
645 if ($cmd eq "pod") {
646 $seen_pod_cmd{$addr}++;
647 }
648 elsif ($cmd eq "encoding") {
649 my ($file, $line) = $pod_para->file_line;
2592f3b8 650 $seen_encoding_cmd{$addr} = $paragraph; # for later decoding
16384ac1
KW
651 if ($command_count{$addr} != 1 && $seen_pod_cmd{$addr}) {
652 $self->poderror({ -line => $line, -file => $file,
653 -msg => $encoding_first
654 });
655 }
656 }
657 $self->check_encoding($paragraph, $line_num, $pod_para);
658
659 # Pod::Check treats all =items as linkable, but the bullet and
660 # numbered lists really aren't. So keep our own list. This has to be
661 # processed before SUPER is called so that the list is started before
662 # the rest of it gets parsed.
663 if ($cmd eq 'item') { # Not linkable if item begins with * or a digit
664 $linkable_item{$addr} = ($paragraph !~ / ^ \s*
665 (?: [*]
666 | \d+ \.? (?: \$ | \s+ )
667 )/x)
668 ? 1
669 : 0;
a67b1afa 670
16384ac1
KW
671 }
672 $self->SUPER::command($cmd, $paragraph, $line_num, $pod_para);
673
674 $command_count{$addr}++;
675
676 $in_NAME{$addr} = 0; # Will change to 1 below if necessary
677 $in_begin{$addr} = 0; # ibid
678 if ($cmd eq 'over') {
679 my $text = $self->interpolate($paragraph, $line_num);
680 my $indent = 4; # default
681 $indent = $1 if $text && $text =~ /^\s*(\d+)\s*$/;
682 push @{$indents{$addr}}, $indent;
683 $current_indent{$addr} += $indent;
684 }
685 elsif ($cmd eq 'back') {
686 if (@{$indents{$addr}}) {
687 $current_indent{$addr} -= pop @{$indents{$addr}};
688 }
689 else {
690 # =back without corresponding =over, but should have
691 # warned already
692 $current_indent{$addr} = 0;
693 }
694 }
695 elsif ($cmd =~ /^head/) {
696 if (! $in_begin{$addr}) {
697
698 # If a particular formatter, then this command doesn't really
699 # apply
700 $current_indent{$addr} = 0;
701 undef @{$indents{$addr}};
702 }
703
704 my $text = $self->interpolate($paragraph, $line_num);
705 $in_NAME{$addr} = 1 if $cmd eq 'head1'
706 && $text && $text =~ /^NAME\b/;
707 }
708 elsif ($cmd eq 'begin') {
709 $in_begin{$addr} = 1;
710 }
711
712 return;
713 }
714
715 sub hyperlink {
716 my $self = shift;
717
718 # If the hyperlink is to an interior node of another page, save it
719 # so that we can see if we need to parse normally skipped files.
720 $has_referred_to_node{$_[0][1]{'-page'}} = 1
721 if $_[0] && $_[0][1]{'-page'} && $_[0][1]{'-node'};
722 return $self->SUPER::hyperlink($_[0]);
723 }
724
725 sub node {
726 my $self = shift;
727 my $text = $_[0];
728 if($text) {
729 $text =~ s/\s+$//s; # strip trailing whitespace
730 $text =~ s/\s+/ /gs; # collapse whitespace
731 my $addr = Scalar::Util::refaddr $self;
732 push(@{$linkable_nodes{$addr}}, $text) if
733 ! $current_indent{$addr}
734 || $linkable_item{$addr};
735 }
736 return $self->SUPER::node($_[0]);
737 }
738
739 sub get_current_indent {
740 return $INDENT + $current_indent{Scalar::Util::refaddr $_[0]};
741 }
742
743 sub get_filename {
744 return $filename{Scalar::Util::refaddr $_[0]};
745 }
746
747 sub linkable_nodes {
748 my $linkables = $linkable_nodes{Scalar::Util::refaddr $_[0]};
749 return undef unless $linkables;
750 return @$linkables;
751 }
752
753 sub get_skip {
754 return $skip{Scalar::Util::refaddr $_[0]} // 0;
755 }
756
757 sub set_skip {
758 my $self = shift;
759 $skip{Scalar::Util::refaddr $self} = shift;
760
761 # If skipping, no need to keep the problems for it
762 delete $problems{$self->get_filename};
763 return;
764 }
765}
766
767package Tie_Array_to_FH { # So printing actually goes to an array
768
769 my %array;
770
771 sub TIEHANDLE {
772 my $class = shift;
773 my $array_ref = shift;
774
775 my $self = bless \do{ my $anonymous_scalar }, $class;
776 $array{Scalar::Util::refaddr $self} = $array_ref;
777
778 return $self;
779 }
780
781 sub PRINT {
a67b1afa 782 my $self = shift;
16384ac1
KW
783 push @{$array{Scalar::Util::refaddr $self}}, @_;
784 return 1;
785 }
a67b1afa
MM
786}
787
69f6a9a1 788
16384ac1
KW
789my %filename_to_checker; # Map a filename to it's pod checker object
790my %id_to_checker; # Map a checksum to it's pod checker object
791my %nodes; # key is filename, values are nodes in that file.
792my %nodes_first_word; # same, but value is first word of each node
793my %valid_modules; # List of modules known to exist outside us.
794my %digests; # checksums of files, whose names are the keys
795my %filename_to_pod; # Map a filename to its pod NAME
796my %files_with_unknown_issues;
797my %files_with_fixes;
69f6a9a1 798
16384ac1 799my $data_fh;
be39e7f8 800open $data_fh, '<:bytes', $known_issues or die "Can't open $known_issues";
69f6a9a1 801
16384ac1
KW
802my %counts; # For --counts param, count of each issue type
803my %suppressed_files; # Files with at least one issue type to suppress
804
805while (<$data_fh>) { # Read the data base
69f6a9a1 806 chomp;
16384ac1
KW
807 next if /^\s*(?:#|$)/; # Skip comment and empty lines
808 if (/\t/) {
809 next if $show_all;
810
811 # Keep track of counts of each issue type for each file
812 my ($filename, $message, $count) = split /\t/;
813 $known_problems{$filename}{$message} = $count;
814
815 if ($show_counts) {
816 if ($count < 0) { # -1 means to suppress this issue type
817 $suppressed_files{$filename} = $filename;
818 }
819 else {
820 $counts{$message} += $count;
821 }
822 }
823 }
824 else { # Lines without a tab are modules known to be valid
825 $valid_modules{$_} = 1
826 }
827}
828close $data_fh;
829
830if ($show_counts) {
831 my $total = 0;
832 foreach my $message (sort keys %counts) {
833 $total += $counts{$message};
834 note(Text::Tabs::expand("$counts{$message}\t$message"));
835 }
836 note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
837 if (%suppressed_files) {
838 note("\nFiles that have all messages of at least one type suppressed:");
839 note(join ",", keys %suppressed_files);
840 }
841 exit 0;
842}
843
844
845my %excluded_files = (
846 "lib/unicore/mktables" => 1,
847 "Porting/perldelta_template.pod" => 1,
848 "autodoc.pl" => 1,
849 "configpm" => 1,
850 "miniperl" => 1,
851 "perl" => 1,
e678c294
KW
852
853 # It would be nice if we didn't have to skip this,
854 # but the errors in it are too variable.
855 "pod/perltoc.pod" => 1,
16384ac1
KW
856 );
857
b3fdb838
KW
858# Convert to more generic form.
859foreach my $file (keys %excluded_files) {
03ca349a
KW
860 delete $excluded_files{$file};
861 $excluded_files{canonicalize($file)} = 1;
b3fdb838
KW
862}
863
16384ac1
KW
864# re to match files that are to be parsed only if there is an internal link
865# to them. It does not include cpan, as whether those are parsed depends
866# on a switch. Currently, only the stable perldelta.pod's are included.
867# These all have characters between 'perl' and 'delta'. (Actually the
868# currently developed one matches as well, but is a duplicate of
869# perldelta.pod, so can be skipped, so fine for it to match this.
870my $only_for_interior_links_re = qr/ \b perl \d+ delta \. pod \b/x;
871
872{ # Closure
873 my $first_time = 1;
874
875 sub output_thanks ($$$$) { # Called when an issue has been fixed
876 my $filename = shift;
877 my $original_count = shift;
878 my $current_count = shift;
879 my $message = shift;
880
881 $files_with_fixes{$filename} = 1;
882 my $return;
883 my $fixed_count = $original_count - $current_count;
884 my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
885 my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
886 my $diff;
887 if ($message) {
888 $diff = <<EOF;
889There were $original_count occurrences (now $current_count) in this pod of type
890"$message",
891EOF
892 } else {
893 $diff = <<EOF;
894There are no longer any problems found in this pod!
895EOF
896 }
897
898 if ($first_time) {
899 $first_time = 0;
900 $return = <<EOF;
901Thanks for fixing $a_problem!
902$diff
903Now you must teach $0 that this was fixed.
904EOF
905 }
906 else {
907 $return = <<EOF
908Thanks for fixing $another_problem.
909$diff
910EOF
911 }
912
913 return $return;
914 }
915}
916
917sub my_safer_print { # print, with error checking for outputting to db
918 my ($fh, @lines) = @_;
919
920 if (! print $fh @lines) {
921 my $save_error = $!;
922 close($fh);
923 die "Write failure: $save_error";
924 }
925}
926
927sub extract_pod { # Extracts just the pod from a file
928 my $filename = shift;
929
930 my @pod;
931
932 # Arrange for the output of Pod::Parser to be collected in an array we can
933 # look at instead of being printed
934 tie *ALREADY_FH, 'Tie_Array_to_FH', \@pod;
be39e7f8 935 open my $in_fh, '<:bytes', $filename
763df156
KW
936
937 # The file should already have been opened once to get here, so if
77b8b9ad
KW
938 # fails, just die. It's possible that a transitory file containing a
939 # pod would get here, but not bothering to add code for that very
940 # unlikely event.
16384ac1
KW
941 or die "Can't open '$filename': $!\n";
942
943 my $parser = Pod::Parser->new();
944 $parser->parse_from_filehandle($in_fh, *ALREADY_FH);
945 close $in_fh;
946
947 return join "", @pod
948}
949
950my $digest = Digest->new($digest_type);
951
952sub is_pod_file {
c1dcaaab
KW
953 # If $_ is a pod file, add it to the lists and do other prep work.
954
16384ac1
KW
955 if (-d $_) {
956 # Don't look at files in directories that are for tests, nor those
957 # beginning with a dot
958 if ($_ eq 't' || $_ =~ /^\../) {
959 $File::Find::prune = 1;
960 }
961 return;
962 }
963
964 return if $_ =~ /^\./; # No hidden Unix files
965 return if $_ =~ $non_pods;
966
967 my $filename = $File::Find::name;
968
969 # Assumes that the path separator is exactly one character.
970 $filename =~ s/^\..//;
971
b3fdb838 972 return if $excluded_files{canonicalize($filename)};
16384ac1 973
144a708b
NC
974 my $contents = do {
975 local $/;
763df156
KW
976 my $candidate;
977 if (! open $candidate, '<:bytes', $_) {
978
77b8b9ad
KW
979 # If a transitory file was found earlier, the open could fail
980 # legitimately and we just skip the file; also skip it if it is a
981 # broken symbolic link, as it is probably just a build problem;
982 # certainly not a file that we would want to check the pod of.
983 # Otherwise fail it here and no reason to process it further.
984 # (But the test count will be off too)
985 ok(0, "Can't open '$filename': $!")
986 if -e $filename && ! -l $filename;
763df156
KW
987 return;
988 }
144a708b
NC
989 <$candidate>;
990 };
16384ac1
KW
991
992 # If the file is a .pm or .pod, having any initial '=' on a line is
993 # grounds for testing it. Otherwise, require a head1 NAME line to view it
994 # as a potential pod
144a708b
NC
995 if ($filename =~ /\.(?:pm|pod)/) {
996 return unless $contents =~ /^=/m;
997 } else {
998 return unless $contents =~ /^=head1 +NAME/m;
16384ac1 999 }
144a708b
NC
1000
1001 # Here, we know that the file is a pod. Add it to the list of files
1002 # to check and create a checker object for it.
1003
1004 push @files, $filename;
1005 my $checker = My::Pod::Checker->new($filename);
1006 $filename_to_checker{$filename} = $checker;
1007
1008 # In order to detect duplicate pods and only analyze them once, we
1009 # compute checksums for the file, so don't have to do an exact
1010 # compare. Note that if the pod is just part of the file, the
1011 # checksums can differ for the same pod. That special case is handled
1012 # later, since if the checksums of the whole file are the same, that
1013 # case won't even come up. We don't need the checksums for files that
1014 # we parse only if there is a link to its interior, but we do need its
1015 # NAME, which is also retrieved in the code below.
1016
1017 if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ )
1018 | $only_for_interior_links_re
1019 /x) {
1020 $digest->add($contents);
1021 $digests{$filename} = $digest->digest;
1022
1023 # lib files aren't analyzed if they are duplicates of files copied
1024 # there from some other directory. But to determine this, we need
1025 # to know their NAMEs. We might as well find the NAME now while
1026 # the file is open. Similarly, cpan files aren't analyzed unless
1027 # we're analyzing all of them, or this particular file is linked
1028 # to by a file we are analyzing, and thus we will want to verify
1029 # that the target exists in it. We need to know at least the NAME
1030 # to see if it's worth analyzing, or so we can determine if a lib
1031 # file is a copy of a cpan one.
1032 if ($filename =~ m{ (?: ^ (?: cpan | lib ) / )
16384ac1 1033 | $only_for_interior_links_re
144a708b
NC
1034 }x) {
1035 if ($contents =~ /^=head1 +NAME.*/mg) {
1036 # The NAME is the first non-spaces on the line up to a
1037 # comma, dash or end of line. Otherwise, it's invalid and
1038 # this pod doesn't have a legal name that we're smart
1039 # enough to find currently. But the parser will later
1040 # find it if it thinks there is a legal name, and set the
1041 # name
1042 if ($contents =~ /\G # continue from the line after =head1
1043 \s* # ignore any empty lines
1044 ^ \s* ( \S+?) \s* (?: [,-] | $ )/mx) {
1045 my $name = $1;
1046 $checker->name($name);
1047 $id_to_checker{$name} = $checker
1048 if $filename =~ m{^cpan/};
16384ac1
KW
1049 }
1050 }
144a708b
NC
1051 elsif ($filename =~ m{^cpan/}) {
1052 $id_to_checker{$digests{$filename}} = $checker;
1053 }
16384ac1
KW
1054 }
1055 }
c1dcaaab
KW
1056
1057 return;
16384ac1
KW
1058} # End of is_pod_file()
1059
1060# Start of real code that isn't processing the command line.
1061# Here, @files contains list of files on the command line. If have any of
1062# these, unconditionally test them, and show all the errors, even the known
1063# ones, and, since not testing other pods, don't do cross-pod link tests.
1064# (Could add extra code to do cross-pod tests for the ones in the list.)
1065if ($has_input_files) {
1066 undef %known_problems;
1067 $do_upstream_cpan = 1; # In case one of the inputs is from cpan
1068
1069}
1070else { # No input files -- go find all the possibilities.
1071 if ($regen) {
1072 $copy_fh = open_new($known_issues);
1073 note("Regenerating $known_issues, please be patient...");
1074 print $copy_fh <<END;
1075# This file is the data file for $0.
1076# There are three types of lines.
1077# Comment lines are white-space only or begin with a '#', like this one. Any
1078# changes you make to the comment lines will be lost when the file is
1079# regen'd.
1080# Lines without tab characters are simply NAMES of pods that the program knows
1081# will have links to them and the program does not check if those links are
1082# valid.
1083# All other lines should have three fields, each separated by a tab. The
1084# first field is the name of a pod; the second field is an error message
1085# generated by this program; and the third field is a count of how many
1086# known instances of that message there are in the pod. -1 means that the
1087# program can expect any number of this type of message.
1088END
1089 }
1090
1091 # Move to the directory above us, but have to adjust @INC to account for
1092 # that.
1093 s{^\.\./lib$}{lib} for @INC;
1094 chdir File::Spec->updir;
1095
1096 # And look in this directory and all its subdirectories
1097 find( \&is_pod_file, '.');
1098
1099 # Add ourselves to the test
b3fdb838 1100 push @files, "t/porting/podcheck.t";
16384ac1
KW
1101}
1102
1103# Now we know how many tests there will be.
1104plan (tests => scalar @files) if ! $regen;
1105
1106
1107 # Sort file names so we get consistent results, and to put cpan last,
1108 # preceeded by the ones that we don't generally parse. This is because both
1109 # these classes are generally parsed only if there is a link to the interior
1110 # of them, and we have to parse all others first to guarantee that they don't
1111 # have such a link. 'lib' files come just before these, as some of these are
1112 # duplicates of others. We already have figured this out when gathering the
1113 # data as a special case for all such files, but this, while unnecessary,
1114 # puts the derived file last in the output. 'readme' files come before those,
1115 # as those also could be duplicates of others, which are considered the
1116 # primary ones. These currently aren't figured out when gathering data, so
1117 # are done here.
1118 @files = sort { if ($a =~ /^cpan/) {
1119 return 1 if $b !~ /^cpan/;
1120 return $a cmp $b;
1121 }
1122 elsif ($b =~ /^cpan/) {
1123 return -1;
1124 }
1125 elsif ($a =~ /$only_for_interior_links_re/) {
1126 return 1 if $b !~ /$only_for_interior_links_re/;
1127 return $a cmp $b;
1128 }
1129 elsif ($b =~ /$only_for_interior_links_re/) {
1130 return -1;
1131 }
1132 elsif ($a =~ /^lib/) {
1133 return 1 if $b !~ /^lib/;
1134 return $a cmp $b;
1135 }
1136 elsif ($b =~ /^lib/) {
1137 return -1;
1138 } elsif ($a =~ /\breadme\b/i) {
1139 return 1 if $b !~ /\breadme\b/i;
1140 return $a cmp $b;
1141 }
1142 elsif ($b =~ /\breadme\b/i) {
1143 return -1;
1144 }
1145 else {
1146 return lc $a cmp lc $b;
1147 }
1148 }
1149 @files;
1150
1151# Now go through all the files and parse them
1152foreach my $filename (@files) {
1153 my $parsed = 0;
1154 note("parsing $filename") if DEBUG;
1155
1156 # We may have already figured out some things in the process of generating
1157 # the file list. If so, have a $checker object already. But if not,
1158 # generate one now.
1159 my $checker = $filename_to_checker{$filename};
1160 if (! $checker) {
1161 $checker = My::Pod::Checker->new($filename);
1162 $filename_to_checker{$filename} = $checker;
1163 }
1164
1165 # We have set the name in the checker object if there is a possibility
1166 # that no further parsing is necessary, but otherwise do the parsing now.
1167 if (! $checker->name) {
1168 $parsed = 1;
1169 $checker->parse_from_file($filename, undef);
1170 }
1171
1172 if ($checker->num_errors() < 0) { # Returns negative if not a pod
1173 $checker->set_skip("$filename is not a pod");
1174 }
1175 else {
1176
1177 # Here, is a pod. See if it is one that has already been tested,
1178 # or should be tested under another directory. Use either its NAME
1179 # if it has one, or a checksum if not.
1180 my $name = $checker->name;
1181 my $id;
1182
1183 if ($name) {
1184 $id = $name;
1185 }
1186 else {
1187 my $digest = Digest->new($digest_type);
1188 $digest->add(extract_pod($filename));
1189 $id = $digest->digest;
1190 }
1191
1192 # If there is a match for this pod with something that we've already
1193 # processed, don't process it, and output why.
1194 my $prior_checker;
1195 if (defined ($prior_checker = $id_to_checker{$id})
1196 && $prior_checker != $checker) # Could have defined the checker
1197 # earlier without pursuing it
1198 {
1199
1200 # If the pods are identical, then it's just a copy, and isn't an
1201 # error. First use the checksums we have already computed to see
1202 # if the entire files are identical, which means that the pods are
1203 # identical too.
1204 my $prior_filename = $prior_checker->get_filename;
1205 my $same = (! $name
1206 || ($digests{$prior_filename}
1207 && $digests{$filename}
1208 && $digests{$prior_filename} eq $digests{$filename}));
1209
1210 # If they differ, it could be that the files differ for some
1211 # reason, but the pods they contain are identical. Extract the
1212 # pods and do the comparisons on just those.
1213 if (! $same && $name) {
1214 $same = extract_pod($prior_filename) eq extract_pod($filename);
1215 }
1216
1217 if ($same) {
1218 $checker->set_skip("The pod of $filename is a duplicate of "
1219 . "the pod for $prior_filename");
1220 } elsif ($prior_filename =~ /\breadme\b/i) {
1221 $checker->set_skip("$prior_filename is a README apparently for $filename");
1222 } elsif ($filename =~ /\breadme\b/i) {
1223 $checker->set_skip("$filename is a README apparently for $prior_filename");
1224 } else { # Here have two pods with identical names that differ
1225 $prior_checker->poderror(
1226 { -msg => $duplicate_name,
1227 -line => "???",
1228 parameter => "'$filename' also has NAME '$name'"
1229 });
1230 $checker->poderror(
1231 { -msg => $duplicate_name,
1232 -line => "???",
1233 parameter => "'$prior_filename' also has NAME '$name'"
1234 });
1235
1236 # Changing the names helps later.
1237 $prior_checker->name("$name version arbitrarily numbered 1");
1238 $checker->name("$name version arbitrarily numbered 2");
1239 }
1240
1241 # In any event, don't process this pod that has the same name as
1242 # another.
1243 next;
1244 }
1245
1246 # A unique pod.
1247 $id_to_checker{$id} = $checker;
1248
1249 my $parsed_for_links = ", but parsed for its interior links";
1250 if ((! $do_upstream_cpan && $filename =~ /^cpan/)
1251 || $filename =~ $only_for_interior_links_re)
1252 {
1253 if ($filename =~ /^cpan/) {
1254 $checker->set_skip("CPAN is upstream for $filename");
1255 }
1256 elsif ($filename =~ /perl\d+delta/) {
1257 $checker->set_skip("$filename is a stable perldelta");
1258 }
1259 else {
1260 croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
1261 }
1262
1263 if ($name && $has_referred_to_node{$name}) {
1264 $checker->set_skip($checker->get_skip() . $parsed_for_links);
1265 }
1266 }
1267
1268 # Need a name in order to process it, because not meaningful
1269 # otherwise, and also can't test links to this without a name.
1270 if (!defined $name) {
1271 $checker->poderror( { -msg => $no_name,
1272 -line => '???'
1273 });
1274 next;
1275 }
1276
1277 # For skipped files, just get its NAME
1278 my $skip;
1279 if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
1280 {
1281 $checker->node($name) if $name;
1282 }
1283 else {
1284 $checker->parse_from_file($filename, undef) if ! $parsed;
1285 }
1286
1287 # Go through everything in the file that could be an anchor that
1288 # could be a link target. Count how many there are of the same name.
1289 foreach my $node ($checker->linkable_nodes) {
1290 next if ! $node; # Can be empty is like '=item *'
1291 if (exists $nodes{$name}{$node}) {
1292 $nodes{$name}{$node}++;
1293 }
1294 else {
1295 $nodes{$name}{$node} = 1;
1296 }
1297
1298 # Experiments have shown that cpan search can figure out the
1299 # target of a link even if the exact wording is incorrect, as long
1300 # as the first word is. This happens frequently in perlfunc.pod,
1301 # where the link will be just to the function, but the target
1302 # entry also includes parameters to the function.
1303 my $first_word = $node;
1304 if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
1305 $nodes_first_word{$name}{$first_word} = $node;
1306 }
1307 }
1308 $filename_to_pod{$filename} = $name;
1309 }
1310}
1311
1312# Here, all files have been parsed, and all links and link targets are stored.
1313# Now go through the files again and see which don't have matches.
1314if (! $has_input_files) {
1315 foreach my $filename (@files) {
1316 next if $filename_to_checker{$filename}->get_skip;
1317 my $checker = $filename_to_checker{$filename};
1318 foreach my $link ($checker->hyperlink) {
1319 my $linked_to_page = $link->[1]->page;
1320 next unless $linked_to_page; # intra-file checks are handled by std
1321 # Pod::Checker
1322
1323 # Initialize the potential message.
1324 my %problem = ( -msg => $broken_link,
1325 -line => $link->[0],
1326 parameter => "to \"$linked_to_page\"",
1327 );
1328
1329 # See if we have found the linked-to_file in our parse
1330 if (exists $nodes{$linked_to_page}) {
1331 my $node = $link->[1]->node;
1332
1333 # If link is only to the page-level, already have it
1334 next if ! $node;
1335
1336 # Transform pod language to what we are expecting
1337 $node =~ s,E<sol>,/,g;
1338 $node =~ s/E<verbar>/|/g;
1339
1340 # If link is to a node that exists in the file, is ok
1341 if ($nodes{$linked_to_page}{$node}) {
1342
1343 # But if the page has multiple targets with the same name,
1344 # it's ambiguous which one this should be to.
1345 if ($nodes{$linked_to_page}{$node} > 1) {
1346 $problem{-msg} = $multiple_targets;
1347 $problem{parameter} = "in $linked_to_page that $node could be pointing to";
1348 $checker->poderror(\%problem);
1349 }
1350 } elsif (! $nodes_first_word{$linked_to_page}{$node}) {
1351
1352 # Here the link target was not found, either exactly or to
1353 # the first word. Is an error.
1354 $problem{parameter} =~ s,"$,/$node",;
1355 $checker->poderror(\%problem);
1356 }
1357
1358 } # Linked-to-file not in parse; maybe is in exception list
1359 elsif (! exists $valid_modules{$link->[1]->page}) {
1360
1361 # Here, is a link to a target that we can't find. Check if
1362 # there is an internal link on the page with the target name.
1363 # If so, it could be that they just forgot the initial '/'
1c01047d
KW
1364 # But perldelta is handled specially: only do this if the
1365 # broken link isn't one of the known bad ones (that are
1366 # placemarkers and should be removed for the final)
1367 my $NAME = $filename_to_pod{$filename};
1368 if (! defined $NAME) {
1369 $checker->poderror(\%problem);
1370 }
1371 elsif ($NAME ne "perldelta"
1372 || ! grep { $linked_to_page eq $_ } @perldelta_ignore_links)
16384ac1 1373 {
1c01047d
KW
1374 if ($nodes{$NAME}{$linked_to_page}) {
1375 $problem{-msg} = $broken_internal_link;
1376 }
1377 $checker->poderror(\%problem);
16384ac1 1378 }
16384ac1
KW
1379 }
1380 }
1381 }
1382}
1383
1384# If regenerating the data file, start with the modules for which we don't
1385# check targets
1386if ($regen) {
1387 foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
1388 my_safer_print($copy_fh, $_, "\n");
1389 }
1390}
1391
1392# Now ready to output the messages.
1393foreach my $filename (@files) {
1394 my $test_name = "POD of $filename";
b3fdb838 1395 my $canonical = canonicalize($filename);
16384ac1
KW
1396 SKIP: {
1397 my $skip = $filename_to_checker{$filename}->get_skip // "";
1398
1399 if ($regen) {
1400 foreach my $message ( sort keys %{$problems{$filename}}) {
1401 my $count;
1402
1403 # Preserve a negative setting.
b3fdb838
KW
1404 if ($known_problems{$canonical}{$message}
1405 && $known_problems{$canonical}{$message} < 0)
16384ac1 1406 {
b3fdb838 1407 $count = $known_problems{$canonical}{$message};
16384ac1
KW
1408 }
1409 else {
1410 $count = @{$problems{$filename}{$message}};
1411 }
b3fdb838 1412 my_safer_print($copy_fh, canonicalize($filename) . "\t$message\t$count\n");
16384ac1
KW
1413 }
1414 next;
1415 }
1416
1417 skip($skip, 1) if $skip;
1418 my @diagnostics;
1419 my $indent = ' ';
1420
16384ac1
KW
1421 my $total_known = 0;
1422 foreach my $message ( sort keys %{$problems{$filename}}) {
b3fdb838
KW
1423 $known_problems{$canonical}{$message} = 0
1424 if ! $known_problems{$canonical}{$message};
16384ac1
KW
1425 my $diagnostic = "";
1426 my $problem_count = scalar @{$problems{$filename}{$message}};
1427 $total_known += $problem_count;
b3fdb838
KW
1428 next if $known_problems{$canonical}{$message} < 0;
1429 if ($problem_count > $known_problems{$canonical}{$message}) {
16384ac1
KW
1430
1431 # Here we are about to output all the messages for this type,
1432 # subtract back this number we previously added in.
1433 $total_known -= $problem_count;
1434
1435 $diagnostic .= $indent . $message;
1436 if ($problem_count > 2) {
1437 $diagnostic .= " ($problem_count occurrences)";
1438 }
1439 foreach my $problem (@{$problems{$filename}{$message}}) {
1440 $diagnostic .= " " if $problem_count == 1;
1441 $diagnostic .= "\n$indent$indent";
1442 $diagnostic .= "$problem->{parameter}" if $problem->{parameter};
1443 $diagnostic .= " near line $problem->{-line}";
1444 $diagnostic .= " $problem->{comment}" if $problem->{comment};
1445 }
1446 $diagnostic .= "\n";
1447 $files_with_unknown_issues{$filename} = 1;
b3fdb838
KW
1448 } elsif ($problem_count < $known_problems{$canonical}{$message}) {
1449 $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
16384ac1
KW
1450 }
1451 push @diagnostics, $diagnostic if $diagnostic;
1452 }
1453
09ea063a
KW
1454 # The above loop has output messages where there are current potential
1455 # issues. But it misses where there were some that have been entirely
1456 # fixed. For those, we need to look through the old issues
b3fdb838 1457 foreach my $message ( sort keys %{$known_problems{$canonical}}) {
09ea063a 1458 next if $problems{$filename}{$message};
b3fdb838
KW
1459 next if ! $known_problems{$canonical}{$message};
1460 next if $known_problems{$canonical}{$message} < 0; # Preserve negs
1461 my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
09ea063a
KW
1462 push @diagnostics, $diagnostic if $diagnostic;
1463 }
1464
16384ac1
KW
1465 my $output = "POD of $filename";
1466 $output .= ", excluding $total_known not shown known potential problems"
1467 if $total_known;
1468 ok(@diagnostics == 0, $output);
1469 if (@diagnostics) {
1470 note(join "", @diagnostics,
1471 "See end of this test output for your options");
1472 }
1473 }
1474}
1475
1476my $how_to = <<EOF;
1477 run this test script by hand, using the following formula (on
1478 Un*x-like machines):
1479 cd t
1480 ./perl -I../lib porting/podcheck.t --regen
1481EOF
1482
1483if (%files_with_unknown_issues) {
1484 my $were_count_files = scalar keys %files_with_unknown_issues;
1485 $were_count_files = ($were_count_files == 1)
1486 ? "was $were_count_files file"
1487 : "were $were_count_files files";
1488 my $message = <<EOF;
1489
1490HOW TO GET THIS .t TO PASS
1491
1492There $were_count_files that had new potential problems identified. To get
1493this .t to pass, do the following:
1494
14951) If a problem is about a link to an unknown module that you know exists,
1496 simply edit the file,
1497 $known_issues
1498 and add anywhere a line that contains just the module's name.
1499 (Don't do this for a module that you aren't sure about; instead treat
1500 as another type of issue and follow the instructions below.)
1501
15022) For other issues, decide if each should be fixed now or not. Fix the
1503 ones you decided to, and rerun this test to verify that the fixes
1504 worked.
1505
15063) If there remain potential problems that you don't plan to fix right
1507 now (or aren't really problems),
1508$how_to
1509 That should cause all current potential problems to be accepted by the
1510 program, so that the next time it runs, they won't be flagged.
1511EOF
1512 if (%files_with_fixes) {
1513 $message .= " This step will also take care of the files that have fixes in them\n";
1514 }
1515
1516 $message .= <<EOF;
1517 For a few files, such as perltoc, certain issues will always be
1518 expected, and more of the same will be added over time. For those,
1519 before you do the regen, you can edit
1520 $known_issues
1521 and find the entry for the module's file and specific error message,
1522 and change the count of known potential problems to -1.
1523EOF
1524
1525 note($message);
1526} elsif (%files_with_fixes) {
1527 note(<<EOF
1528To teach this test script that the potential problems have been fixed,
1529$how_to
1530EOF
1531 );
1532}
1533
1534if ($regen) {
1535 chdir $original_dir || die "Can't change directories to $original_dir";
1536 close_and_rename($copy_fh);
1537}