5 use feature 'unicode_strings';
15 require '../regen/regen_lib.pl';
24 podcheck.t - Look for possible problems in the Perl pods
29 ./perl -I../lib porting/podcheck.t [--show_all] [--cpan] [--counts]
31 ./perl -I../lib porting/podcheck.t --regen
35 podcheck.t is an extension of Pod::Checker. It looks for pod errors and
36 potential errors in the files given as arguments, or if none specified, in all
37 pods in the distribution workspace, except those in the cpan directory (unless
38 C<--cpan> is specified). It does additional checking beyond that done by
39 Pod::Checker, and keeps a database of known potential problems, and will
40 fail a pod only if the number of such problems differs from that given in the
41 database. It also suppresses the C<(section) deprecated> message from
42 Pod::Checker, since specifying the man page section number is quite proper to do.
44 The additional checks it makes are:
48 =item Cross-pod link checking
50 Pod::Checker verifies that links to an internal target in a pod are not
51 broken. podcheck.t extends that (when called without FILE arguments) to
52 external links. It does this by gathering up all the possible targets in the
53 workspace, and cross-checking them. The database has a list of known targets
54 outside the workspace, so podcheck.t will not raise a warning for
55 using those. It also checks that a non-broken link points to just one target.
56 (The destination pod could have two targets with the same name.)
58 =item An internal link that isn't so specified
60 If a link is broken, but there is an existing internal target of the same
61 name, it is likely that the internal target was meant, and the C<"/"> is
62 missing from the C<LE<lt>E<gt>> pod command.
64 =item Verbatim paragraphs that wrap in an 80 column window
66 It's annoying to have lines wrap when displaying pod documentation in a
67 terminal window. This checks that all such lines fit, and for those that
68 don't, it tells you how much needs to be cut in order to fit. However,
69 if you're fixing these, keep in mind that some terminal/pager combinations
70 require really a maximum of 79 or 78 columns to display properly.
72 Often, the easiest thing to do to gain space for these is to lower the indent
75 =item Missing or duplicate NAME or missing NAME short description
77 A pod can't be linked to unless it has a unique name.
78 And a NAME should have a dash and short description after it.
80 =item =encoding statement issues
82 This indicates if an C<=encoding> statement should be present, or moved to the
85 =item Items that perhaps should be links
87 There are mentions of apparent files in the pods that perhaps should be links
88 instead, using C<LE<lt>...E<gt>>
90 =item Items that perhaps should be C<FE<lt>...E<gt>>
92 What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
93 C<FE<lt>...E<gt>> mark-up instead.
97 A number of issues raised by podcheck.t and by the base Pod::Checker are not
98 really problems, but merely potential problems. After inspecting them and
99 deciding that they aren't real problems, it is possible to shut up this program
100 about them, unlike base Pod::Checker. To do this, call podcheck.t with the
101 C<--regen> option to regenerate the database. This tells it that all existing
102 issues are to not be mentioned again.
104 This isn't fool-proof. The database merely keeps track of the number of these
105 potential problems of each type for each pod. If a new problem of a given
106 type is introduced into the pod, podcheck.t will spit out all of them. You
107 then have to figure out which is the new one, and should it be changed or not.
108 But doing it this way insulates the database from having to keep track of line
109 numbers of problems, which may change, or the exact wording of each problem
110 which might also change without affecting whether it is a problem or not.
112 Also, if the count of potential problems of a given type for a pod decreases,
113 the database must be regenerated so that it knows the new number. The program
114 gives instructions when this happens.
116 There is currently no check that modules listed as valid in the data base
117 actually are. Thus any errors introduced there will remain there.
125 Regenerate the data base used by podcheck.t to include all the existing
126 potential problems. Future runs of the program will not then flag any of
131 Normally, all pods in the cpan directory are skipped, except to make sure that
132 any blead-upstream links to such pods are valid.
133 This option will cause cpan upstream pods to be checked.
137 Normally, if the number of potential problems of a given type found for a
138 pod matches the expected value in the database, they will not be displayed.
139 This option forces the database to be ignored during the run, so all potential
140 problems are displayed and will fail their respective pod test. Specifying
141 any particular FILES to operate on automatically selects this option.
145 Instead of testing, this just dumps the counts of the occurrences of the
146 various types of potential problems in the data base.
152 The database is stored in F<t/porting/known_pod_issues.dat>
160 #####################################################
161 # HOW IT WORKS (in general)
163 # If not called with specific files to check, the directory structure is
164 # examined for files that have pods in them. Files that might not have to be
165 # fully parsed (e.g. in cpan) are parsed enough at this time to find their
166 # pod's NAME, and to get a checksum.
168 # Those kinds of files are sorted last, but otherwise the pods are parsed with
169 # the package coded here, My::Pod::Checker, which is an extension to
170 # Pod::Checker that adds some tests and suppresses others that aren't
171 # appropriate. The latter module has no provision for capturing diagnostics,
172 # so a package, Tie_Array_to_FH, is used to force them to be placed into an
173 # array instead of printed.
175 # Parsing the files builds up a list of links. The files are gone through
176 # again, doing cross-link checking and outputting all saved-up problems with
179 # Sorting the files last that potentially don't need to be fully parsed allows
180 # us to not parse them unless there is a link to an internal anchor in them
181 # from something that we have already parsed. Keeping checksums allows us to
182 # not parse copies of other pods.
184 #####################################################
186 # 1 => Exclude low priority messages that aren't likely to be problems, and
187 # has many false positives; higher numbers give more messages.
188 my $Warnings_Level = 200;
190 # To see if two pods with the same NAME are actually copies of the same pod,
191 # which is not an error, it uses a checksum to save work.
192 my $digest_type = "SHA-1";
194 my $original_dir = File::Spec->rel2abs(File::Spec->curdir);
195 my $data_dir = File::Spec->catdir($original_dir, 'porting');
196 my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
199 my $MAX_LINE_LENGTH = 80; # 80 columns
200 my $INDENT = 8; # default nroff indent
202 # Our warning messages. Better not have [('"] in them, as those are used as
203 # delimiters for variable parts of the messages by poderror.
204 my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
205 my $broken_link = "Apparent broken link";
206 my $broken_internal_link = "Apparent internal link is missing its forward slash";
207 my $see_not_linked = "? Should you be using L<...> instead of";
208 my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of";
209 my $multiple_targets = "There is more than one target";
210 my $duplicate_name = "Pod NAME already used";
211 my $need_encoding = "Should have =encoding statement because have non-ASCII";
212 my $encoding_first = "=encoding must be first command (if present)";
213 my $no_name = "There is no NAME";
214 my $missing_name_description = "The NAME should have a dash and short description after it";
216 # objects, tests, etc can't be pods, so don't look for them.
217 my $non_pods = qr/\.(?:[achot]|zip|gz|bz2|jar|tar|tgz|PL|so)$/;
220 # Pod::Checker messages to suppress
221 my @suppressed_messages = (
222 "(section) in", # Checker is wrong to flag this
223 "multiple occurrence of link target", # We catch independently the ones
224 # that are real problems.
229 # Returns bool as to if input message is one that is to be suppressed
232 return grep { $message =~ /^\Q$_/i } @suppressed_messages;
235 { # Closure to contain a simple subset of test.pl. This is to get rid of the
236 # unnecessary 'failed at' messages that would otherwise be output pointing
237 # to a particular line in this file.
239 my $current_test = 0;
244 $planned = $plan{tests};
245 print "1..$planned\n";
256 print "not " unless $success;
257 print "ok $current_test - $message\n";
263 my $n = @_ ? shift : 1;
266 print "ok $current_test # skip $why\n";
268 no warnings 'exiting';
277 print $message =~ s/^/# /mgr;
283 if ($planned && $planned != $current_test) {
285 "# Looks like you planned $planned tests but ran $current_test.\n";
290 # This is to get this to work across multiple file systems, including those
291 # that are not case sensitive. The db is stored in lower case, and all file
292 # name comparisons are done that way.
293 sub canonicalize($) {
294 return lc File::Spec->canonpath(shift);
298 # List of known potential problems by pod and type.
301 # Pods given by the keys contain an interior node that is referred to from
303 my %has_referred_to_node;
309 # Assume that are to skip anything in /cpan
310 my $do_upstream_cpan = 0;
312 while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
313 my $arg = shift @ARGV;
315 $arg =~ s/^--/-/; # Treat '--' the same as a single '-'
316 if ($arg eq '-regen') {
319 elsif ($arg eq '-cpan') {
320 $do_upstream_cpan = 1;
322 elsif ($arg eq '-show_all') {
325 elsif ($arg eq '-counts') {
330 Unknown option '$arg'
332 Usage: $0 [ --regen | --cpan | --show_all ] [ FILE ... ]\n"
333 --cpan -> Include files in the cpan subdirectory.
334 --regen -> Regenerate the data file for $0
335 --show_all -> Show all known potential problems
336 --counts -> Don't test, but give summary counts of the currently
344 if (($regen + $show_all + $show_counts + $do_upstream_cpan) > 1) {
345 croak "--regen, --show_all, --cpan, and --counts are mutually exclusive";
348 my $has_input_files = @files;
350 if ($has_input_files && ($regen || $show_counts || $do_upstream_cpan)) {
351 croak "--regen, --counts and --cpan can't be used since using specific files";
354 our %problems; # potential problems found in this run
356 package My::Pod::Checker { # Extend Pod::Checker
357 use parent 'Pod::Checker';
359 # Uses inside out hash to protect from typos
360 # For new fields, remember to add to destructor DESTROY()
361 my %indents; # Stack of indents from =over's in effect for
363 my %current_indent; # Current line's indent
364 my %filename; # The pod is store in this file
365 my %skip; # is SKIP set for this pod
366 my %in_NAME; # true if within NAME section
367 my %in_begin; # true if within =begin section
368 my %linkable_item; # Bool: if the latest =item is linkable. It isn't
369 # for bullet and number lists
370 my %linkable_nodes; # Pod::Checker adds all =items to its node list,
371 # but not all =items are linkable to
372 my %seen_encoding_cmd; # true if have =encoding earlier
373 my %command_count; # Number of commands seen
374 my %seen_pod_cmd; # true if have =pod earlier
375 my %warned_encoding; # true if already have warned about =encoding
379 my $addr = Scalar::Util::refaddr $_[0];
380 delete $command_count{$addr};
381 delete $current_indent{$addr};
382 delete $filename{$addr};
383 delete $in_begin{$addr};
384 delete $indents{$addr};
385 delete $in_NAME{$addr};
386 delete $linkable_item{$addr};
387 delete $linkable_nodes{$addr};
388 delete $seen_encoding_cmd{$addr};
389 delete $seen_pod_cmd{$addr};
391 delete $warned_encoding{$addr};
397 my $filename = shift;
399 my $self = $class->SUPER::new(-quiet => 1,
400 -warnings => $Warnings_Level);
401 my $addr = Scalar::Util::refaddr $self;
402 $command_count{$addr} = 0;
403 $current_indent{$addr} = 0;
404 $filename{$addr} = $filename;
405 $in_begin{$addr} = 0;
407 $linkable_item{$addr} = 0;
408 $seen_encoding_cmd{$addr} = 0;
409 $seen_pod_cmd{$addr} = 0;
410 $warned_encoding{$addr} = 0;
414 # re's for messages that Pod::Checker outputs
415 my $location = qr/ \b (?:in|at|on|near) \s+ /xi;
416 my $optional_location = qr/ (?: $location )? /xi;
417 my $line_reference = qr/ [('"]? $optional_location \b line \s+
418 (?: \d+ | EOF | \Q???\E | - )
421 sub poderror { # Called to register a potential problem
423 # This adds an extra field to the parent hash, 'parameter'. It is
424 # used to extract the variable parts of a message leaving just the
425 # constant skeleton. This in turn allows the message to be
426 # categorized better, so that it shows up as a single type in our
427 # database, with the specifics of each occurrence not being stored with
433 my $addr = Scalar::Util::refaddr $self;
434 return if $skip{$addr};
436 # Input can be a string or hash. If a string, parse it to separate
437 # out the line number and convert to a hash for easier further
440 if (ref $opts ne 'HASH') {
441 $message = join "", $opts, @_;
443 if ($message =~ s/\s*($line_reference)//) {
444 ($line_number = $1) =~ s/\s*$optional_location//;
447 $line_number = '???';
449 $opts = { -msg => $message, -line => $line_number };
451 $message = $opts->{'-msg'};
455 $message =~ s/^\d+\s+//;
456 return if main::suppressed($message);
458 $self->SUPER::poderror($opts, @_);
460 $opts->{parameter} = "" unless $opts->{parameter};
462 # The variable parts of the message tend to be enclosed in '...',
463 # "....", or (...). Extract them and put them in an extra field,
464 # 'parameter'. This is trickier because the matching delimiter to a
465 # '(' is its mirror, and not itself. Text::Balanced could be used
467 while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
470 $delimiter = ')' if $delimiter eq '(';
472 # If there is no ending delimiter, don't consider it to be a
473 # variable part. Most likely it is a contraction like "Don't"
474 last unless $message =~ m/\G .+? \Q$delimiter/xg;
476 my $length = $+[0] - $start;
478 # Get the part up through the closing delimiter
479 my $special = substr($message, $start, $length);
480 $special =~ s/^\s+//; # No leading whitespace
482 # And add that variable part to the parameter, while removing it
483 # from the message. This isn't a foolproof way of finding the
484 # variable part. For example '(s)' can occur in e.g.,
486 if ($special ne '(s)') {
487 substr($message, $start, $length) = "";
488 pos $message = $start;
489 $opts->{-msg} = $message;
490 $opts->{parameter} .= " " if $opts->{parameter};
491 $opts->{parameter} .= $special;
495 # Extract any additional line number given. This is often the
496 # beginning location of something whereas the main line number gives
498 if ($message =~ /( $line_reference )/xi) {
500 while ($message =~ s/\s*\Q$line_ref//) {
501 $opts->{-msg} = $message;
502 $opts->{parameter} .= " " if $opts->{parameter};
503 $opts->{parameter} .= $line_ref;
507 Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
508 push @{$problems{$filename{$addr}}{$message}}, $opts;
509 #push @{$problems{$self->get_filename}{$message}}, $opts;
512 sub check_encoding { # Does it need an =encoding statement?
513 my ($self, $paragraph, $line_num, $pod_para) = @_;
515 # Do nothing if there is an =encoding in the file, or if the line
516 # doesn't require an =encoding, or have already warned.
517 my $addr = Scalar::Util::refaddr $self;
518 return if $seen_encoding_cmd{$addr}
519 || $warned_encoding{$addr}
520 || $paragraph !~ /\P{ASCII}/;
522 $warned_encoding{$addr} = 1;
523 my ($file, $line) = $pod_para->file_line;
524 $self->poderror({ -line => $line, -file => $file,
525 -msg => $need_encoding
531 my ($self, $paragraph, $line_num, $pod_para) = @_;
532 $self->check_encoding($paragraph, $line_num, $pod_para);
534 $self->SUPER::verbatim($paragraph, $line_num, $pod_para);
536 # Pick up the name, since the parent class doesn't in verbatim
537 # NAMEs; so treat as non-verbatim. The parent class only allows one
538 # paragraph in a NAME section, so if there is an extra blank line, it
539 # will trigger a message, but such a blank line is harmless, so skip
541 if ($in_NAME{Scalar::Util::refaddr $self} && $paragraph =~ /\S/) {
542 $self->textblock($paragraph, $line_num, $pod_para);
545 my @lines = split /^/, $paragraph;
546 for my $i (0 .. @lines - 1) {
547 $lines[$i] =~ s/\s+$//;
548 my $indent = $self->get_current_indent;
549 my $exceeds = length(Text::Tabs::expand($lines[$i]))
550 + $indent - $MAX_LINE_LENGTH;
551 next unless $exceeds > 0;
552 my ($file, $line) = $pod_para->file_line;
553 $self->poderror({ -line => $line + $i, -file => $file,
554 -msg => $line_length,
555 parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)",
561 my ($self, $paragraph, $line_num, $pod_para) = @_;
562 $self->check_encoding($paragraph, $line_num, $pod_para);
564 $self->SUPER::textblock($paragraph, $line_num, $pod_para);
566 my ($file, $line) = $pod_para->file_line;
567 my $addr = Scalar::Util::refaddr $self;
568 if ($in_NAME{$addr}) {
570 my $text = $self->interpolate($paragraph, $line_num);
571 if ($text =~ /^\s*(\S+?)\s*$/) {
573 $self->poderror({ -line => $line, -file => $file,
574 -msg => $missing_name_description,
579 $paragraph = join " ", split /^/, $paragraph;
581 # Matches something that looks like a file name, but is enclosed in
583 my $C_path_re = qr{ \b ( C<
584 # exclude regexes and 'OS/2'
585 (?! (?: (?: s | qr | m) / ) | OS/2 > )
586 \w+ (?: / \w+ )+ > (?: \. \w+ )? )
589 # If looks like a reference to other documentation by containing the
590 # word 'See' and then a likely pod directive, warn.
592 while ($paragraph =~ m{ \b See \s+ ( ( [^L] ) <
593 ( [^<]*? ) # The not-< excludes nested C<L<...
598 if ($interior !~ /$non_pods/
599 && $construct !~ /$C_path_re/g) {
600 $self->poderror({ -line => $line, -file => $file,
601 -msg => $see_not_linked,
602 parameter => $construct
606 while ($paragraph =~ m/$C_path_re/g) {
608 $self->poderror({ -line => $line, -file => $file,
609 -msg => $C_with_slash,
610 parameter => $construct
617 my ($self, $cmd, $paragraph, $line_num, $pod_para) = @_;
618 my $addr = Scalar::Util::refaddr $self;
620 $seen_pod_cmd{$addr}++;
622 elsif ($cmd eq "encoding") {
623 my ($file, $line) = $pod_para->file_line;
624 $seen_encoding_cmd{$addr} = 1;
625 if ($command_count{$addr} != 1 && $seen_pod_cmd{$addr}) {
626 $self->poderror({ -line => $line, -file => $file,
627 -msg => $encoding_first
631 $self->check_encoding($paragraph, $line_num, $pod_para);
633 # Pod::Check treats all =items as linkable, but the bullet and
634 # numbered lists really aren't. So keep our own list. This has to be
635 # processed before SUPER is called so that the list is started before
636 # the rest of it gets parsed.
637 if ($cmd eq 'item') { # Not linkable if item begins with * or a digit
638 $linkable_item{$addr} = ($paragraph !~ / ^ \s*
640 | \d+ \.? (?: \$ | \s+ )
646 $self->SUPER::command($cmd, $paragraph, $line_num, $pod_para);
648 $command_count{$addr}++;
650 $in_NAME{$addr} = 0; # Will change to 1 below if necessary
651 $in_begin{$addr} = 0; # ibid
652 if ($cmd eq 'over') {
653 my $text = $self->interpolate($paragraph, $line_num);
654 my $indent = 4; # default
655 $indent = $1 if $text && $text =~ /^\s*(\d+)\s*$/;
656 push @{$indents{$addr}}, $indent;
657 $current_indent{$addr} += $indent;
659 elsif ($cmd eq 'back') {
660 if (@{$indents{$addr}}) {
661 $current_indent{$addr} -= pop @{$indents{$addr}};
664 # =back without corresponding =over, but should have
666 $current_indent{$addr} = 0;
669 elsif ($cmd =~ /^head/) {
670 if (! $in_begin{$addr}) {
672 # If a particular formatter, then this command doesn't really
674 $current_indent{$addr} = 0;
675 undef @{$indents{$addr}};
678 my $text = $self->interpolate($paragraph, $line_num);
679 $in_NAME{$addr} = 1 if $cmd eq 'head1'
680 && $text && $text =~ /^NAME\b/;
682 elsif ($cmd eq 'begin') {
683 $in_begin{$addr} = 1;
692 # If the hyperlink is to an interior node of another page, save it
693 # so that we can see if we need to parse normally skipped files.
694 $has_referred_to_node{$_[0][1]{'-page'}} = 1
695 if $_[0] && $_[0][1]{'-page'} && $_[0][1]{'-node'};
696 return $self->SUPER::hyperlink($_[0]);
703 $text =~ s/\s+$//s; # strip trailing whitespace
704 $text =~ s/\s+/ /gs; # collapse whitespace
705 my $addr = Scalar::Util::refaddr $self;
706 push(@{$linkable_nodes{$addr}}, $text) if
707 ! $current_indent{$addr}
708 || $linkable_item{$addr};
710 return $self->SUPER::node($_[0]);
713 sub get_current_indent {
714 return $INDENT + $current_indent{Scalar::Util::refaddr $_[0]};
718 return $filename{Scalar::Util::refaddr $_[0]};
722 my $linkables = $linkable_nodes{Scalar::Util::refaddr $_[0]};
723 return undef unless $linkables;
728 return $skip{Scalar::Util::refaddr $_[0]} // 0;
733 $skip{Scalar::Util::refaddr $self} = shift;
735 # If skipping, no need to keep the problems for it
736 delete $problems{$self->get_filename};
741 package Tie_Array_to_FH { # So printing actually goes to an array
747 my $array_ref = shift;
749 my $self = bless \do{ my $anonymous_scalar }, $class;
750 $array{Scalar::Util::refaddr $self} = $array_ref;
757 push @{$array{Scalar::Util::refaddr $self}}, @_;
763 my %filename_to_checker; # Map a filename to it's pod checker object
764 my %id_to_checker; # Map a checksum to it's pod checker object
765 my %nodes; # key is filename, values are nodes in that file.
766 my %nodes_first_word; # same, but value is first word of each node
767 my %valid_modules; # List of modules known to exist outside us.
768 my %digests; # checksums of files, whose names are the keys
769 my %filename_to_pod; # Map a filename to its pod NAME
770 my %files_with_unknown_issues;
771 my %files_with_fixes;
774 open($data_fh, $known_issues) || die "Can't open $known_issues";
776 my %counts; # For --counts param, count of each issue type
777 my %suppressed_files; # Files with at least one issue type to suppress
779 while (<$data_fh>) { # Read the data base
781 next if /^\s*(?:#|$)/; # Skip comment and empty lines
785 # Keep track of counts of each issue type for each file
786 my ($filename, $message, $count) = split /\t/;
787 $known_problems{$filename}{$message} = $count;
790 if ($count < 0) { # -1 means to suppress this issue type
791 $suppressed_files{$filename} = $filename;
794 $counts{$message} += $count;
798 else { # Lines without a tab are modules known to be valid
799 $valid_modules{$_} = 1
806 foreach my $message (sort keys %counts) {
807 $total += $counts{$message};
808 note(Text::Tabs::expand("$counts{$message}\t$message"));
810 note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
811 if (%suppressed_files) {
812 note("\nFiles that have all messages of at least one type suppressed:");
813 note(join ",", keys %suppressed_files);
819 my %excluded_files = (
820 "lib/unicore/mktables" => 1,
821 "Porting/perldelta_template.pod" => 1,
827 # It would be nice if we didn't have to skip this,
828 # but the errors in it are too variable.
829 "pod/perltoc.pod" => 1,
832 # Convert to more generic form.
833 foreach my $file (keys %excluded_files) {
834 $excluded_files{canonicalize($excluded_files{$file})}
835 = $excluded_files{$file};
838 # re to match files that are to be parsed only if there is an internal link
839 # to them. It does not include cpan, as whether those are parsed depends
840 # on a switch. Currently, only the stable perldelta.pod's are included.
841 # These all have characters between 'perl' and 'delta'. (Actually the
842 # currently developed one matches as well, but is a duplicate of
843 # perldelta.pod, so can be skipped, so fine for it to match this.
844 my $only_for_interior_links_re = qr/ \b perl \d+ delta \. pod \b/x;
849 sub output_thanks ($$$$) { # Called when an issue has been fixed
850 my $filename = shift;
851 my $original_count = shift;
852 my $current_count = shift;
855 $files_with_fixes{$filename} = 1;
857 my $fixed_count = $original_count - $current_count;
858 my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
859 my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
863 There were $original_count occurrences (now $current_count) in this pod of type
868 There are no longer any problems found in this pod!
875 Thanks for fixing $a_problem!
877 Now you must teach $0 that this was fixed.
882 Thanks for fixing $another_problem.
891 sub my_safer_print { # print, with error checking for outputting to db
892 my ($fh, @lines) = @_;
894 if (! print $fh @lines) {
897 die "Write failure: $save_error";
901 sub extract_pod { # Extracts just the pod from a file
902 my $filename = shift;
906 # Arrange for the output of Pod::Parser to be collected in an array we can
907 # look at instead of being printed
908 tie *ALREADY_FH, 'Tie_Array_to_FH', \@pod;
909 open my $in_fh, '<', $filename
910 or die "Can't open '$filename': $!\n";
912 my $parser = Pod::Parser->new();
913 $parser->parse_from_filehandle($in_fh, *ALREADY_FH);
919 my $digest = Digest->new($digest_type);
923 # Don't look at files in directories that are for tests, nor those
924 # beginning with a dot
925 if ($_ eq 't' || $_ =~ /^\../) {
926 $File::Find::prune = 1;
931 return if $_ =~ /^\./; # No hidden Unix files
932 return if $_ =~ $non_pods;
934 my $filename = $File::Find::name;
936 # Assumes that the path separator is exactly one character.
937 $filename =~ s/^\..//;
939 return if $excluded_files{canonicalize($filename)};
941 open my $candidate, '<', $_
942 or die "Can't open '$File::Find::name': $!\n";
943 my @contents = <$candidate>;
946 # If the file is a .pm or .pod, having any initial '=' on a line is
947 # grounds for testing it. Otherwise, require a head1 NAME line to view it
951 for ($i = 0; $i < @contents; $i++) {
952 next unless $contents[$i] =~ /^=/;
953 if ($filename =~ /\.(?:pm|pod)/) {
954 $found = 'found_some_pod_line';
957 elsif ($contents[$i] =~ /^=head1 +NAME/) {
958 $found = 'found_NAME';
963 # Here, we know that the file is a pod. Add it to the list of files
964 # to check and create a checker object for it.
966 push @files, $filename;
967 my $checker = My::Pod::Checker->new($filename);
968 $filename_to_checker{$filename} = $checker;
970 # In order to detect duplicate pods and only analyze them once, we
971 # compute checksums for the file, so don't have to do an exact
972 # compare. Note that if the pod is just part of the file, the
973 # checksums can differ for the same pod. That special case is handled
974 # later, since if the checksums of the whole file are the same, that
975 # case won't even come up. We don't need the checksums for files that
976 # we parse only if there is a link to its interior, but we do need its
977 # NAME, which is also retrieved in the code below.
978 if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ )
979 | $only_for_interior_links_re
981 $digest->add(@contents);
982 $digests{$filename} = $digest->digest;
984 # lib files aren't analyzed if they are duplicates of files copied
985 # there from some other directory. But to determine this, we need
986 # to know their NAMEs. We might as well find the NAME now while
987 # the file is open. Similarly, cpan files aren't analyzed unless
988 # we're analyzing all of them, or this particular file is linked
989 # to by a file we are analyzing, and thus we will want to verify
990 # that the target exists in it. We need to know at least the NAME
991 # to see if it's worth analyzing, or so we can determine if a lib
992 # file is a copy of a cpan one.
993 if ($filename =~ m{ (?: ^ (?: cpan | lib ) / )
994 | $only_for_interior_links_re
996 if ($found eq 'found_some_pod_line') {
997 for (; $i < @contents; $i++) {
998 next if $contents[$i] !~ /^=head1/;
999 $found = 'found_NAME'
1000 if $contents[$i] =~ /^=head1 +NAME/;
1004 if ($found eq 'found_NAME') {
1005 $i++; # The NAME starts on a later line
1008 while ($contents[$i] !~ /\S/) { $i++ }
1010 # The NAME is the first non-spaces on the line up to a
1011 # comma, dash or end of line. Otherwise, it's invalid and
1012 # this pod doesn't have a legal name that we're smart
1013 # enough to find currently. But the parser will later
1014 # find it if it thinks there is a legal name, and set the
1016 if ($contents[$i] =~ /^ \s* ( \S+?) \s* (?: [,-] | $ )/x) {
1018 $checker->name($name);
1019 $id_to_checker{$name} = $checker
1020 if $filename =~ m{^cpan/};
1023 elsif ($filename =~ m{^cpan/}) {
1024 $id_to_checker{$digests{$filename}} = $checker;
1029 } # End of is_pod_file()
1031 # Start of real code that isn't processing the command line.
1032 # Here, @files contains list of files on the command line. If have any of
1033 # these, unconditionally test them, and show all the errors, even the known
1034 # ones, and, since not testing other pods, don't do cross-pod link tests.
1035 # (Could add extra code to do cross-pod tests for the ones in the list.)
1036 if ($has_input_files) {
1037 undef %known_problems;
1038 $do_upstream_cpan = 1; # In case one of the inputs is from cpan
1041 else { # No input files -- go find all the possibilities.
1043 $copy_fh = open_new($known_issues);
1044 note("Regenerating $known_issues, please be patient...");
1045 print $copy_fh <<END;
1046 # This file is the data file for $0.
1047 # There are three types of lines.
1048 # Comment lines are white-space only or begin with a '#', like this one. Any
1049 # changes you make to the comment lines will be lost when the file is
1051 # Lines without tab characters are simply NAMES of pods that the program knows
1052 # will have links to them and the program does not check if those links are
1054 # All other lines should have three fields, each separated by a tab. The
1055 # first field is the name of a pod; the second field is an error message
1056 # generated by this program; and the third field is a count of how many
1057 # known instances of that message there are in the pod. -1 means that the
1058 # program can expect any number of this type of message.
1062 # Move to the directory above us, but have to adjust @INC to account for
1064 s{^\.\./lib$}{lib} for @INC;
1065 chdir File::Spec->updir;
1067 # And look in this directory and all its subdirectories
1068 find( \&is_pod_file, '.');
1070 # Add ourselves to the test
1071 push @files, "t/porting/podcheck.t";
1074 # Now we know how many tests there will be.
1075 plan (tests => scalar @files) if ! $regen;
1078 # Sort file names so we get consistent results, and to put cpan last,
1079 # preceeded by the ones that we don't generally parse. This is because both
1080 # these classes are generally parsed only if there is a link to the interior
1081 # of them, and we have to parse all others first to guarantee that they don't
1082 # have such a link. 'lib' files come just before these, as some of these are
1083 # duplicates of others. We already have figured this out when gathering the
1084 # data as a special case for all such files, but this, while unnecessary,
1085 # puts the derived file last in the output. 'readme' files come before those,
1086 # as those also could be duplicates of others, which are considered the
1087 # primary ones. These currently aren't figured out when gathering data, so
1089 @files = sort { if ($a =~ /^cpan/) {
1090 return 1 if $b !~ /^cpan/;
1093 elsif ($b =~ /^cpan/) {
1096 elsif ($a =~ /$only_for_interior_links_re/) {
1097 return 1 if $b !~ /$only_for_interior_links_re/;
1100 elsif ($b =~ /$only_for_interior_links_re/) {
1103 elsif ($a =~ /^lib/) {
1104 return 1 if $b !~ /^lib/;
1107 elsif ($b =~ /^lib/) {
1109 } elsif ($a =~ /\breadme\b/i) {
1110 return 1 if $b !~ /\breadme\b/i;
1113 elsif ($b =~ /\breadme\b/i) {
1117 return lc $a cmp lc $b;
1122 # Now go through all the files and parse them
1123 foreach my $filename (@files) {
1125 note("parsing $filename") if DEBUG;
1127 # We may have already figured out some things in the process of generating
1128 # the file list. If so, have a $checker object already. But if not,
1130 my $checker = $filename_to_checker{$filename};
1132 $checker = My::Pod::Checker->new($filename);
1133 $filename_to_checker{$filename} = $checker;
1136 # We have set the name in the checker object if there is a possibility
1137 # that no further parsing is necessary, but otherwise do the parsing now.
1138 if (! $checker->name) {
1140 $checker->parse_from_file($filename, undef);
1143 if ($checker->num_errors() < 0) { # Returns negative if not a pod
1144 $checker->set_skip("$filename is not a pod");
1148 # Here, is a pod. See if it is one that has already been tested,
1149 # or should be tested under another directory. Use either its NAME
1150 # if it has one, or a checksum if not.
1151 my $name = $checker->name;
1158 my $digest = Digest->new($digest_type);
1159 $digest->add(extract_pod($filename));
1160 $id = $digest->digest;
1163 # If there is a match for this pod with something that we've already
1164 # processed, don't process it, and output why.
1166 if (defined ($prior_checker = $id_to_checker{$id})
1167 && $prior_checker != $checker) # Could have defined the checker
1168 # earlier without pursuing it
1171 # If the pods are identical, then it's just a copy, and isn't an
1172 # error. First use the checksums we have already computed to see
1173 # if the entire files are identical, which means that the pods are
1175 my $prior_filename = $prior_checker->get_filename;
1177 || ($digests{$prior_filename}
1178 && $digests{$filename}
1179 && $digests{$prior_filename} eq $digests{$filename}));
1181 # If they differ, it could be that the files differ for some
1182 # reason, but the pods they contain are identical. Extract the
1183 # pods and do the comparisons on just those.
1184 if (! $same && $name) {
1185 $same = extract_pod($prior_filename) eq extract_pod($filename);
1189 $checker->set_skip("The pod of $filename is a duplicate of "
1190 . "the pod for $prior_filename");
1191 } elsif ($prior_filename =~ /\breadme\b/i) {
1192 $checker->set_skip("$prior_filename is a README apparently for $filename");
1193 } elsif ($filename =~ /\breadme\b/i) {
1194 $checker->set_skip("$filename is a README apparently for $prior_filename");
1195 } else { # Here have two pods with identical names that differ
1196 $prior_checker->poderror(
1197 { -msg => $duplicate_name,
1199 parameter => "'$filename' also has NAME '$name'"
1202 { -msg => $duplicate_name,
1204 parameter => "'$prior_filename' also has NAME '$name'"
1207 # Changing the names helps later.
1208 $prior_checker->name("$name version arbitrarily numbered 1");
1209 $checker->name("$name version arbitrarily numbered 2");
1212 # In any event, don't process this pod that has the same name as
1218 $id_to_checker{$id} = $checker;
1220 my $parsed_for_links = ", but parsed for its interior links";
1221 if ((! $do_upstream_cpan && $filename =~ /^cpan/)
1222 || $filename =~ $only_for_interior_links_re)
1224 if ($filename =~ /^cpan/) {
1225 $checker->set_skip("CPAN is upstream for $filename");
1227 elsif ($filename =~ /perl\d+delta/) {
1228 $checker->set_skip("$filename is a stable perldelta");
1231 croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
1234 if ($name && $has_referred_to_node{$name}) {
1235 $checker->set_skip($checker->get_skip() . $parsed_for_links);
1239 # Need a name in order to process it, because not meaningful
1240 # otherwise, and also can't test links to this without a name.
1241 if (!defined $name) {
1242 $checker->poderror( { -msg => $no_name,
1248 # For skipped files, just get its NAME
1250 if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
1252 $checker->node($name) if $name;
1255 $checker->parse_from_file($filename, undef) if ! $parsed;
1258 # Go through everything in the file that could be an anchor that
1259 # could be a link target. Count how many there are of the same name.
1260 foreach my $node ($checker->linkable_nodes) {
1261 next if ! $node; # Can be empty is like '=item *'
1262 if (exists $nodes{$name}{$node}) {
1263 $nodes{$name}{$node}++;
1266 $nodes{$name}{$node} = 1;
1269 # Experiments have shown that cpan search can figure out the
1270 # target of a link even if the exact wording is incorrect, as long
1271 # as the first word is. This happens frequently in perlfunc.pod,
1272 # where the link will be just to the function, but the target
1273 # entry also includes parameters to the function.
1274 my $first_word = $node;
1275 if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
1276 $nodes_first_word{$name}{$first_word} = $node;
1279 $filename_to_pod{$filename} = $name;
1283 # Here, all files have been parsed, and all links and link targets are stored.
1284 # Now go through the files again and see which don't have matches.
1285 if (! $has_input_files) {
1286 foreach my $filename (@files) {
1287 next if $filename_to_checker{$filename}->get_skip;
1288 my $checker = $filename_to_checker{$filename};
1289 foreach my $link ($checker->hyperlink) {
1290 my $linked_to_page = $link->[1]->page;
1291 next unless $linked_to_page; # intra-file checks are handled by std
1294 # Initialize the potential message.
1295 my %problem = ( -msg => $broken_link,
1296 -line => $link->[0],
1297 parameter => "to \"$linked_to_page\"",
1300 # See if we have found the linked-to_file in our parse
1301 if (exists $nodes{$linked_to_page}) {
1302 my $node = $link->[1]->node;
1304 # If link is only to the page-level, already have it
1307 # Transform pod language to what we are expecting
1308 $node =~ s,E<sol>,/,g;
1309 $node =~ s/E<verbar>/|/g;
1311 # If link is to a node that exists in the file, is ok
1312 if ($nodes{$linked_to_page}{$node}) {
1314 # But if the page has multiple targets with the same name,
1315 # it's ambiguous which one this should be to.
1316 if ($nodes{$linked_to_page}{$node} > 1) {
1317 $problem{-msg} = $multiple_targets;
1318 $problem{parameter} = "in $linked_to_page that $node could be pointing to";
1319 $checker->poderror(\%problem);
1321 } elsif (! $nodes_first_word{$linked_to_page}{$node}) {
1323 # Here the link target was not found, either exactly or to
1324 # the first word. Is an error.
1325 $problem{parameter} =~ s,"$,/$node",;
1326 $checker->poderror(\%problem);
1329 } # Linked-to-file not in parse; maybe is in exception list
1330 elsif (! exists $valid_modules{$link->[1]->page}) {
1332 # Here, is a link to a target that we can't find. Check if
1333 # there is an internal link on the page with the target name.
1334 # If so, it could be that they just forgot the initial '/'
1335 if ($filename_to_pod{$filename}
1336 && $nodes{$filename_to_pod{$filename}}{$linked_to_page})
1338 $problem{-msg} = $broken_internal_link;
1340 $checker->poderror(\%problem);
1346 # If regenerating the data file, start with the modules for which we don't
1349 foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
1350 my_safer_print($copy_fh, $_, "\n");
1354 # Now ready to output the messages.
1355 foreach my $filename (@files) {
1356 my $test_name = "POD of $filename";
1357 my $canonical = canonicalize($filename);
1359 my $skip = $filename_to_checker{$filename}->get_skip // "";
1362 foreach my $message ( sort keys %{$problems{$filename}}) {
1365 # Preserve a negative setting.
1366 if ($known_problems{$canonical}{$message}
1367 && $known_problems{$canonical}{$message} < 0)
1369 $count = $known_problems{$canonical}{$message};
1372 $count = @{$problems{$filename}{$message}};
1374 my_safer_print($copy_fh, canonicalize($filename) . "\t$message\t$count\n");
1379 skip($skip, 1) if $skip;
1383 my $total_known = 0;
1384 foreach my $message ( sort keys %{$problems{$filename}}) {
1385 $known_problems{$canonical}{$message} = 0
1386 if ! $known_problems{$canonical}{$message};
1387 my $diagnostic = "";
1388 my $problem_count = scalar @{$problems{$filename}{$message}};
1389 $total_known += $problem_count;
1390 next if $known_problems{$canonical}{$message} < 0;
1391 if ($problem_count > $known_problems{$canonical}{$message}) {
1393 # Here we are about to output all the messages for this type,
1394 # subtract back this number we previously added in.
1395 $total_known -= $problem_count;
1397 $diagnostic .= $indent . $message;
1398 if ($problem_count > 2) {
1399 $diagnostic .= " ($problem_count occurrences)";
1401 foreach my $problem (@{$problems{$filename}{$message}}) {
1402 $diagnostic .= " " if $problem_count == 1;
1403 $diagnostic .= "\n$indent$indent";
1404 $diagnostic .= "$problem->{parameter}" if $problem->{parameter};
1405 $diagnostic .= " near line $problem->{-line}";
1406 $diagnostic .= " $problem->{comment}" if $problem->{comment};
1408 $diagnostic .= "\n";
1409 $files_with_unknown_issues{$filename} = 1;
1410 } elsif ($problem_count < $known_problems{$canonical}{$message}) {
1411 $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
1413 push @diagnostics, $diagnostic if $diagnostic;
1416 # The above loop has output messages where there are current potential
1417 # issues. But it misses where there were some that have been entirely
1418 # fixed. For those, we need to look through the old issues
1419 foreach my $message ( sort keys %{$known_problems{$canonical}}) {
1420 next if $problems{$filename}{$message};
1421 next if ! $known_problems{$canonical}{$message};
1422 next if $known_problems{$canonical}{$message} < 0; # Preserve negs
1423 my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
1424 push @diagnostics, $diagnostic if $diagnostic;
1427 my $output = "POD of $filename";
1428 $output .= ", excluding $total_known not shown known potential problems"
1430 ok(@diagnostics == 0, $output);
1432 note(join "", @diagnostics,
1433 "See end of this test output for your options");
1439 run this test script by hand, using the following formula (on
1440 Un*x-like machines):
1442 ./perl -I../lib porting/podcheck.t --regen
1445 if (%files_with_unknown_issues) {
1446 my $were_count_files = scalar keys %files_with_unknown_issues;
1447 $were_count_files = ($were_count_files == 1)
1448 ? "was $were_count_files file"
1449 : "were $were_count_files files";
1450 my $message = <<EOF;
1452 HOW TO GET THIS .t TO PASS
1454 There $were_count_files that had new potential problems identified. To get
1455 this .t to pass, do the following:
1457 1) If a problem is about a link to an unknown module that you know exists,
1458 simply edit the file,
1460 and add anywhere a line that contains just the module's name.
1461 (Don't do this for a module that you aren't sure about; instead treat
1462 as another type of issue and follow the instructions below.)
1464 2) For other issues, decide if each should be fixed now or not. Fix the
1465 ones you decided to, and rerun this test to verify that the fixes
1468 3) If there remain potential problems that you don't plan to fix right
1469 now (or aren't really problems),
1471 That should cause all current potential problems to be accepted by the
1472 program, so that the next time it runs, they won't be flagged.
1474 if (%files_with_fixes) {
1475 $message .= " This step will also take care of the files that have fixes in them\n";
1479 For a few files, such as perltoc, certain issues will always be
1480 expected, and more of the same will be added over time. For those,
1481 before you do the regen, you can edit
1483 and find the entry for the module's file and specific error message,
1484 and change the count of known potential problems to -1.
1488 } elsif (%files_with_fixes) {
1490 To teach this test script that the potential problems have been fixed,
1497 chdir $original_dir || die "Can't change directories to $original_dir";
1498 close_and_rename($copy_fh);