3 # A tool for analysing the performance of the code snippets found in
4 # t/perf/benchmarks or similar
9 bench.pl - Compare the performance of perl code snippets across multiple
14 # Basic: run the tests in t/perf/benchmarks against two or
17 bench.pl [options] perl1[=label1] perl2[=label2] ...
19 # Run bench.pl's own built-in sanity tests
21 bench.pl --action=selftest
25 By default, F<bench.pl> will run code snippets found in
26 F<t/perf/benchmarks> (or similar) under cachegrind, in order to calculate
27 how many instruction reads, data writes, branches, cache misses, etc. that
28 one execution of the snippet uses. It will run them against two or more
29 perl executables and show how much each test has gotten better or worse.
31 It is modelled on the F<perlbench> tool, but since it measures instruction
32 reads etc., rather than timings, it is much more precise and reproducible.
33 It is also considerably faster, and is capable of running tests in
34 parallel (with C<-j>). Rather than displaying a single relative
35 percentage per test/perl combination, it displays values for 13 different
36 measurements, such as instruction reads, conditional branch misses etc.
38 There are options to write the raw data to a file, and to read it back.
39 This means that you can view the same run data in different views with
40 different selection and sort options.
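For example, a single expensive cachegrind run can be saved once and then
redisplayed in several different ways (the file and perl names here are
purely illustrative):

    # do the expensive run once, saving the raw data
    bench.pl --write=results.json perlA perlB

    # later, redisplay the same data with different views
    bench.pl --read=results.json --fields=Ir
    bench.pl --read=results.json --sort=Dw:perlA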
42 The optional C<=label> after each perl executable is used in the display
53 What action to perform. The default is I<grind>, which runs the benchmarks
54 using I<cachegrind> as the back end. The only other action at the moment is
55 I<selftest>, which runs some basic sanity checks and produces TAP output.
61 Only display the overall average, rather than the results for each
68 The path of the file which contains the benchmarks (F<t/perf/benchmarks>
73 --bisect=I<field,minval,maxval>
75 Run a single test against one perl and exit with a zero status if the
76 named field is in the specified range; exit 1 otherwise. It will complain
77 if more than one test or perl has been specified. It is intended to be
78 called as part of a bisect run, to determine when something changed.
81 bench.pl -j 8 --tests=foo --bisect=Ir,100,105 --perlargs=-Ilib \
84 might be called from bisect to find when the number of instruction reads
85 for test I<foo> falls outside the range 100..105.
91 Enable verbose debugging output.
97 Display only the specified fields; for example,
99 --fields=Ir,Ir_m,Ir_mm
101 If only one field is selected, the output is in more compact form.
107 Optional command-line arguments to pass to cachegrind invocations.
113 Display basic usage information.
120 Run I<N> jobs in parallel (default 1). This determines how many cachegrind
121 processes will be running at a time, and should generally be set to the number
128 Specify which perl column in the output to treat as the 100% norm.
129 It may be a column number (0..N-1) or a perl executable name or label.
130 It defaults to the leftmost column.
136 Optional command-line arguments to pass to each perl that is run as part of
137 a cachegrind session. For example, C<--perlargs=-Ilib>.
143 Display raw data counts rather than percentages in the outputs. This
144 allows you to see the exact number of instruction reads, branch misses etc.
145 for each test/perl combination. It also causes the C<AVERAGE> display
146 per field to be calculated based on the average of each test's count
147 rather than average of each percentage. This means that tests with very
148 high counts will dominate.
154 Order the tests in the output based on the value of I<field> in the
155 column I<perl>. The I<perl> value is as per C<--norm>. For example
157 bench.pl --sort=Dw:perl-5.20.0 \
158 perl-5.16.0 perl-5.18.0 perl-5.20.0
165 Read in saved data from a previous C<--write> run from the specified file.
167 Requires C<JSON::PP> to be available.
173 Specify a subset of tests to run (or in the case of C<--read>, to display).
174 It may be either a comma-separated list of test names, or a regular
175 expression. For example
177 --tests=expr::assign::scalar_lex,expr::assign::2list_lex
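or a regex of the documented C</.../> form matching a whole group of tests,
for example (an illustrative pattern):

    --tests=/^expr::/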
184 Display progress information.
191 Save the raw data to the specified file. It can be read back later with
194 Requires C<JSON::PP> to be available.
205 use Getopt::Long qw(:config no_auto_abbrev);
209 use POSIX ":sys_wait_h";
211 # The version of the file format used to save data. We refuse to process
212 # the file if its integer component is newer than ours.
214 my $FORMAT_VERSION = 1.0;
216 # The fields we know about
218 my %VALID_FIELDS = map { $_ => 1 }
219 qw(Ir Ir_m1 Ir_mm Dr Dr_m1 Dr_mm Dw Dw_m1 Dw_mm COND COND_m IND IND_m);
223 usage: $0 [options] perl[=label] ...
224 --action=foo What action to perform [default: grind].
225 --average Only display average, not individual test results.
226 --benchfile=foo File containing the benchmarks;
227 [default: t/perf/benchmarks].
228 --bisect=f,min,max run a single test against one perl and exit with a
229 zero status if the named field is in the specified
230 range; exit 1 otherwise.
231 --debug Enable verbose debugging output.
232 --fields=a,b,c Display only the specified fields (e.g. Ir,Ir_m,Ir_mm).
233 --grindargs=foo Optional command-line args to pass to cachegrind.
234 --help Display this help.
235 -j|--jobs=N Run N jobs in parallel [default 1].
236 --norm=perl Which perl column to treat as 100%; may be a column
237 number (0..N-1) or a perl executable name or label;
239 --perlargs=foo Optional command-line args to pass to each perl to run.
240 --raw Display raw data counts rather than percentages.
241 --sort=field:perl Sort the tests based on the value of 'field' in the
242 column 'perl'. The perl value is as per --norm.
243 -r|--read=file Read in previously saved data from the specified file.
244 --tests=FOO Select only the specified tests from the benchmarks file;
245 FOO may be either of the form 'foo,bar' or '/regex/';
246 [default: all tests].
247 --verbose Display progress information.
248 -w|--write=file Save the raw data to the specified file.
251 grind run the code under cachegrind
252 selftest perform a selftest; produce TAP output
254 The command line ends with one or more specified perl executables,
255 which will be searched for in the current \$PATH. Each binary name may
256 have an optional =LABEL appended, which will be used rather than the
257 executable name in output. E.g.
259 perl-5.20.1=PRE-BUGFIX perl-5.20.1-new=POST-BUGFIX
266 benchfile => 't/perf/benchmarks',
283 # process command-line args and call top-level action
287 'action=s' => \$OPTS{action},
288 'average' => \$OPTS{average},
289 'benchfile=s' => \$OPTS{benchfile},
290 'bisect=s' => \$OPTS{bisect},
291 'debug' => \$OPTS{debug},
292 'grindargs=s' => \$OPTS{grindargs},
293 'help' => \$OPTS{help},
294 'fields=s' => \$OPTS{fields},
295 'jobs|j=i' => \$OPTS{jobs},
296 'norm=s' => \$OPTS{norm},
297 'perlargs=s' => \$OPTS{perlargs},
298 'raw' => \$OPTS{raw},
299 'read|r=s' => \$OPTS{read},
300 'sort=s' => \$OPTS{sort},
301 'tests=s' => \$OPTS{tests},
302 'verbose' => \$OPTS{verbose},
303 'write|w=s' => \$OPTS{write},
306 usage if $OPTS{help};
309 if (defined $OPTS{read} and defined $OPTS{write}) {
310 die "Error: can't specify both --read and --write options\n";
313 if (defined $OPTS{read} or defined $OPTS{write}) {
314 # fail early if it's not present
318 if (defined $OPTS{fields}) {
319 my @f = split /,/, $OPTS{fields};
321 die "Error: --fields: unknown field '$_'\n"
322 unless $VALID_FIELDS{$_};
324 my %f = map { $_ => 1 } @f;
328 my %valid_actions = qw(grind 1 selftest 1);
329 unless ($valid_actions{$OPTS{action}}) {
330 die "Error: unrecognised action '$OPTS{action}'\n"
331 . "must be one of: " . join(', ', sort keys %valid_actions)."\n";
334 if (defined $OPTS{sort}) {
335 my @s = split /:/, $OPTS{sort};
337 die "Error: --sort argument should be of the form field:perl: "
340 my ($field, $perl) = @s;
341 die "Error: --sort: unknown field '$field'\n"
342 unless $VALID_FIELDS{$field};
343 # the 'perl' value will be validated later, after we have processed
345 $OPTS{'sort-field'} = $field;
346 $OPTS{'sort-perl'} = $perl;
349 if ($OPTS{action} eq 'selftest') {
351 die "Error: no perl executables may be specified with --read\n"
354 elsif (defined $OPTS{bisect}) {
355 die "Error: exactly one perl executable must be specified for bisect\n"
357 die "Error: Can't specify both --bisect and --read\n"
358 if defined $OPTS{read};
359 die "Error: Can't specify both --bisect and --write\n"
360 if defined $OPTS{write};
362 elsif (defined $OPTS{read}) {
364 die "Error: no perl executables may be specified with --read\n"
369 die "Error: at least one perl executable must be specified\n";
373 unless (@ARGV >= 2) {
374 die "Error: at least two perl executables must be specified\n";
378 if ($OPTS{action} eq 'grind') {
381 elsif ($OPTS{action} eq 'selftest') {
388 # Given a hash ref keyed by test names, filter it by deleting unwanted
389 # tests, based on $OPTS{tests}.
394 my $opt = $OPTS{tests};
395 return unless defined $opt;
400 $opt =~ s{^/(.+)/$}{$1}
401 or die "Error: --tests regex must be of the form /.../\n";
403 delete $tests->{$_} unless /$opt/;
408 for (split /,/, $opt) {
409 die "Error: no such test found: '$_'\n" unless exists $tests->{$_};
413 delete $tests->{$_} unless exists $t{$_};
419 # Read in the test file, and filter out any tests excluded by $OPTS{tests}
421 sub read_tests_file {
426 die "Error: can't parse '$file': $@\n" if $@;
427 die "Error: can't read '$file': $!\n";
436 # Process the perl/column argument of options like --norm and --sort.
437 # Return the index of the matching perl.
440 my ($perl, $perls, $who) = @_;
442 if ($perl =~ /^[0-9]+$/) {
443 die "Error: $who value $perl outside range 0.." . $#$perls . "\n"
444 unless $perl < @$perls;
448 my @perl = grep $perls->[$_][0] eq $perl
449 || $perls->[$_][1] eq $perl,
451 die "Error: $who: unrecognised perl '$perl'\n"
453 die "Error: $who: ambiguous perl '$perl'\n"
460 # Validate the list of perl=label on the command line.
461 # Return a list of [ exe, label ] pairs.
466 my ($perl, $label) = split /=/, $p, 2;
468 my $r = qx($perl -e 'print qq(ok\n)' 2>&1);
469 die "Error: unable to execute '$perl': $r" if $r ne "ok\n";
470 push @results, [ $perl, $label ];
477 # Return a string containing perl test code wrapped in a loop
478 # that runs $ARGV[0] times
481 my ($test, $desc, $setup, $code) = @_;
488 for my \$__loop__ (1..\$ARGV[0]) {
495 # Parse the output from cachegrind. Return a hash ref.
496 # See do_selftest() for examples of the output format.
498 sub parse_cachegrind {
499 my ($output, $id, $perl) = @_;
503 my @lines = split /\n/, $output;
505 unless (s/(==\d+==)|(--\d+--) //) {
506 die "Error: while executing $id:\n"
507 . "unexpected code or cachegrind output:\n$_\n";
509 if (/I refs:\s+([\d,]+)/) {
512 elsif (/I1 misses:\s+([\d,]+)/) {
515 elsif (/LLi misses:\s+([\d,]+)/) {
518 elsif (/D refs:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
519 @res{qw(Dr Dw)} = ($1,$2);
521 elsif (/D1 misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
522 @res{qw(Dr_m1 Dw_m1)} = ($1,$2);
524 elsif (/LLd misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
525 @res{qw(Dr_mm Dw_mm)} = ($1,$2);
527 elsif (/Branches:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
528 @res{qw(COND IND)} = ($1,$2);
530 elsif (/Mispredicts:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
531 @res{qw(COND_m IND_m)} = ($1,$2);
535 for my $field (keys %VALID_FIELDS) {
536 die "Error: can't parse '$field' field from cachegrind output:\n$output"
537 unless exists $res{$field};
538 $res{$field} =~ s/,//g;
545 # Handle the 'grind' action
548 my ($perl_args) = @_; # the residue of @ARGV after option processing
550 my ($loop_counts, $perls, $results, $tests);
551 my ($bisect_field, $bisect_min, $bisect_max);
553 if (defined $OPTS{bisect}) {
554 ($bisect_field, $bisect_min, $bisect_max) = split /,/, $OPTS{bisect}, 3;
555 die "Error: --bisect option must be of form 'field,integer,integer'\n"
558 and $bisect_min =~ /^[0-9]+$/
559 and $bisect_max =~ /^[0-9]+$/;
561 die "Error: unrecognised field '$bisect_field' in --bisect option\n"
562 unless $VALID_FIELDS{$bisect_field};
564 die "Error: --bisect min ($bisect_min) must be <= max ($bisect_max)\n"
565 if $bisect_min > $bisect_max;
568 if (defined $OPTS{read}) {
569 open my $in, '<:encoding(UTF-8)', $OPTS{read}
570 or die "Error: can't open $OPTS{read} for reading: $!\n";
571 my $data = do { local $/; <$in> };
574 my $hash = JSON::PP::decode_json($data);
575 if (int($FORMAT_VERSION) < int($hash->{version})) {
576 die "Error: unsupported version $hash->{version} in file"
577 . " '$OPTS{read}' (too new)\n";
579 ($loop_counts, $perls, $results, $tests) =
580 @$hash{qw(loop_counts perls results tests)};
582 filter_tests($results);
583 filter_tests($tests);
586 # How many times to execute the loop for the two trials. The lower
587 # value is intended to do the loop enough times that branch
588 # prediction has taken hold; the higher loop allows us to see the
589 # branch misses after that
590 $loop_counts = [10, 20];
592 $tests = read_tests_file($OPTS{benchfile});
593 die "Error: only a single test may be specified with --bisect\n"
594 if defined $OPTS{bisect} and keys %$tests != 1;
596 $perls = [ process_perls(@$perl_args) ];
599 $results = grind_run($tests, $perls, $loop_counts);
602 # now that we have a list of perls, use it to process the
603 # 'perl' component of the --norm and --sort args
605 $OPTS{norm} = select_a_perl($OPTS{norm}, $perls, "--norm");
606 if (defined $OPTS{'sort-perl'}) {
608 select_a_perl($OPTS{'sort-perl'}, $perls, "--sort");
611 if (defined $OPTS{write}) {
612 my $json = JSON::PP::encode_json({
613 version => $FORMAT_VERSION,
614 loop_counts => $loop_counts,
620 open my $out, '>:encoding(UTF-8)', $OPTS{write}
621 or die "Error: can't open $OPTS{write} for writing: $!\n";
622 print $out $json or die "Error: writing to file '$OPTS{write}': $!\n";
623 close $out or die "Error: closing file '$OPTS{write}': $!\n";
626 my ($processed, $averages) =
627 grind_process($results, $perls, $loop_counts);
629 if (defined $OPTS{bisect}) {
630 my @r = values %$results;
631 die "Panic: expected exactly one test result in bisect\n"
633 @r = values %{$r[0]};
634 die "Panic: expected exactly one perl result in bisect\n"
636 my $c = $r[0]{$bisect_field};
637 die "Panic: no result in bisect for field '$bisect_field'\n"
639 exit 0 if $bisect_min <= $c and $c <= $bisect_max;
643 grind_print($processed, $averages, $perls, $tests);
649 # Run cachegrind for every test/perl combo.
650 # It may run several processes in parallel when -j is specified.
651 # Return a hash ref suitable for input to grind_process()
654 my ($tests, $perls, $counts) = @_;
656 # Build a list of all the jobs to run
660 for my $test (sort keys %$tests) {
662 # Create two test progs: one with an empty loop and one with code.
663 # Note that the empty loop is actually '{1;}' rather than '{}';
664 # this causes the loop to have a single nextstate rather than a
665 # stub op, so more closely matches the active loop; e.g.:
666 # {1;} => nextstate; unstack
667 # {$x=1;} => nextstate; const; gvsv; sassign; unstack
669 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup)}, '1'),
670 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup code)}),
673 for my $p (@$perls) {
674 my ($perl, $label) = @$p;
676 # Run both the empty loop and the active loop
677 # $counts->[0] and $counts->[1] times.
681 my $cmd = "PERL_HASH_SEED=0 "
682 . "valgrind --tool=cachegrind --branch-sim=yes "
683 . "--cachegrind-out-file=/dev/null "
684 . "$OPTS{grindargs} "
685 . "$perl $OPTS{perlargs} - $counts->[$j] 2>&1";
686 # for debugging and error messages
687 my $id = "$test/$perl "
688 . ($i ? "active" : "empty") . "/"
689 . ($j ? "long" : "short") . " loop";
706 # Execute each cachegrind and store the results in %results.
708 local $SIG{PIPE} = 'IGNORE';
710 my $max_jobs = $OPTS{jobs};
711 my $running = 0; # count of executing jobs
712 my %pids; # map pids to jobs
713 my %fds; # map fds to jobs
715 my $select = IO::Select->new();
717 while (@jobs or $running) {
720 printf "Main loop: pending=%d running=%d\n",
721 scalar(@jobs), $running;
726 while (@jobs && $running < $max_jobs) {
727 my $job = shift @jobs;
728 my ($id, $cmd) = @$job{qw(id cmd)};
730 my ($in, $out, $pid);
731 warn "Starting $id\n" if $OPTS{verbose};
732 eval { $pid = IPC::Open2::open2($out, $in, $cmd); 1; }
733 or die "Error: while starting cachegrind subprocess"
738 $job->{out_fd} = $out;
746 print "Started pid $pid for $id\n";
750 # In principle we should write to $in in the main select loop,
751 # since it may block. In reality,
752 # a) the code we write to the perl process's stdin is likely
753 # to be less than the OS's pipe buffer size;
754 # b) by the time the perl process has read in all its stdin,
755 # the only output it should have generated is a few lines
756 # of cachegrind output preamble.
757 # If these assumptions change, then perform the following print
758 # in the select loop instead.
760 print $in $job->{prog};
764 # Get output of running jobs
767 printf "Select: waiting on (%s)\n",
768 join ', ', sort { $a <=> $b } map $fds{$_}{pid},
772 my @ready = $select->can_read;
775 printf "Select: pids (%s) ready\n",
776 join ', ', sort { $a <=> $b } map $fds{$_}{pid}, @ready;
780 die "Panic: select returned no file handles\n";
783 for my $fd (@ready) {
785 my $r = sysread $fd, $j->{output}, 8192, length($j->{output});
786 unless (defined $r) {
787 die "Panic: Read from process running $j->{id} gave:\n$!";
794 print "Got eof for pid $fds{$fd}{pid} ($j->{id})\n";
797 $select->remove($j->{out_fd});
799 or die "Panic: closing output fh on $j->{id} gave:\n$!\n";
801 delete $fds{"$j->{out_fd}"};
802 my $output = $j->{output};
810 print "\n$j->{id}/\nCommand: $j->{cmd}\n"
815 $results{$j->{test}}{$j->{perl}}[$j->{active}][$j->{loopix}]
816 = parse_cachegrind($output, $j->{id}, $j->{perl});
822 my $kid = waitpid(-1, WNOHANG);
826 unless (exists $pids{$kid}) {
827 die "Panic: reaped unexpected child $kid";
831 die sprintf("Error: $j->{id} gave return status 0x%04x\n", $ret)
832 . "with the following output:\n$j->{output}\n";
844 # grind_process(): process the data that has been extracted from
845 # cachegrind's output.
847 # $res is of the form ->{benchmark_name}{perl_name}[active][count]{field_name},
848 # where active is 0 or 1 indicating an empty or active loop,
849 # count is 0 or 1 indicating a short or long loop. E.g.
851 # $res->{'expr::assign::scalar_lex'}{perl-5.21.1}[0][10]{Dw_mm}
853 # The $res data structure is modified in-place by this sub.
855 # $perls is [ [ perl-exe, perl-label], .... ].
857 # $counts is [ N, M ] indicating the counts for the short and long loops.
860 # return \%output, \%averages, where
862 # $output{benchmark_name}{perl_name}{field_name} = N
863 # $averages{perl_name}{field_name} = M
865 # where N is the raw count ($OPTS{raw}), or count_perl0/count_perlI otherwise;
866 # M is the average raw count over all tests ($OPTS{raw}), or
867 # 1/(sum(count_perlI/count_perl0)/num_tests) otherwise.
870 my ($res, $perls, $counts) = @_;
872 # Process the four results for each test/perf combo:
874 # $res->{benchmark_name}{perl_name}[active][count]{field_name} = n
876 # $res->{benchmark_name}{perl_name}{field_name} = averaged_n
878 # $r[0][1] - $r[0][0] is the time to do ($counts->[1]-$counts->[0])
879 # empty loops, eliminating startup time
880 # $r[1][1] - $r[1][0] is the time to do ($counts->[1]-$counts->[0])
881 # active loops, eliminating startup time
882 # (the two startup times may be different because different code
883 # is being compiled); the difference of the two results above
884 # divided by the count difference is the time to execute the
885 # active code once, eliminating both startup and loop overhead.
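    # A worked example with hypothetical counts: suppose $counts = [10, 20],
    # and for the Ir field the empty loop measured 1000 at count 10 and
    # 1500 at count 20, while the active loop measured 1100 and 1900. Then:
    #
    #   empty:  $r[0][1] - $r[0][0] = 1500 - 1000 = 500 (50 Ir/iter loop overhead)
    #   active: $r[1][1] - $r[1][0] = 1900 - 1100 = 800 (80 Ir/iter incl. overhead)
    #
    #   n = (800 - 500) / (20 - 10) = 30 Ir per execution of the snippet itself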
887 for my $tests (values %$res) {
888 for my $r (values %$tests) {
890 for (keys %{$r->[0][0]}) {
891 my $n = ( ($r->[1][1]{$_} - $r->[1][0]{$_})
892 - ($r->[0][1]{$_} - $r->[0][0]{$_})
893 ) / ($counts->[1] - $counts->[0]);
904 my $perl_norm = $perls->[$OPTS{norm}][0]; # the name of the reference perl
906 for my $test_name (keys %$res) {
907 my $res1 = $res->{$test_name};
908 my $res2_norm = $res1->{$perl_norm};
909 for my $perl (keys %$res1) {
910 my $res2 = $res1->{$perl};
911 for my $field (keys %$res2) {
912 my ($p, $q) = ($res2_norm->{$field}, $res2->{$field});
915 # Avoid annoying '-0.0' displays. Ideally this number
916 # should never be negative, but fluctuations in
917 # startup etc can theoretically make this happen
918 $q = 0 if ($q <= 0 && $q > -0.1);
919 $totals{$perl}{$field} += $q;
920 $counts{$perl}{$field}++;
921 $data{$test_name}{$perl}{$field} = $q;
925 # $p and $q are notionally integer counts, but
926 # due to variations in startup etc, it's possible for a
927 # count which is supposedly zero to be calculated as a
928 # small positive or negative value.
929 # In this case, set it to zero. Further below we
930 # special-case zeros to avoid division by zero errors etc.
932 $p = 0.0 if $p < 0.01;
933 $q = 0.0 if $q < 0.01;
935 if ($p == 0.0 && $q == 0.0) {
936 # Both perls gave a count of zero, so no change:
938 $totals{$perl}{$field} += 1;
939 $counts{$perl}{$field}++;
940 $data{$test_name}{$perl}{$field} = 1;
942 elsif ($p == 0.0 || $q == 0.0) {
943 # If either count is zero, there were too few events
944 # to give a meaningful ratio (and we will end up with
945 # division by zero if we try). Mark the result undef,
946 # indicating that it shouldn't be displayed; and skip
947 # adding to the average
948 $data{$test_name}{$perl}{$field} = undef;
951 # For averages, we record q/p rather than p/q.
952 # Consider a test where perl_norm took 1000 cycles
953 # and perlN took 800 cycles. For the individual
954 # results we display p/q, or 1.25; i.e. a quarter
955 # quicker. For the averages, we instead sum all
956 # the 0.8's, which gives the total cycles required to
957 # execute all tests, with all tests given equal
958 # weight. Later we reciprocate the final result,
959 # i.e. 1/(sum(qi/pi)/n)
961 $totals{$perl}{$field} += $q/$p;
962 $counts{$perl}{$field}++;
963 $data{$test_name}{$perl}{$field} = $p/$q;
969 # Calculate averages based on %totals and %counts accumulated earlier.
972 for my $perl (keys %totals) {
973 my $t = $totals{$perl};
974 for my $field (keys %$t) {
975 $averages{$perl}{$field} = $OPTS{raw}
976 ? $t->{$field} / $counts{$perl}{$field}
977 # reciprocal - see comments above
978 : $counts{$perl}{$field} / $t->{$field};
982 return \%data, \%averages;
986 # grind_print(): display the tabulated results of all the cachegrinds.
988 # Arguments are of the form:
989 # $results->{benchmark_name}{perl_name}{field_name} = N
990 # $averages->{perl_name}{field_name} = M
991 # $perls = [ [ perl-exe, perl-label ], ... ]
992 # $tests->{test_name}{desc => ..., ...}
995 my ($results, $averages, $perls, $tests) = @_;
997 my @perl_names = map $_->[0], @$perls;
999 $perl_labels{$_->[0]} = $_->[1] for @$perls;
1001 my $field_label_width = 6;
1002 # Calculate the width to display for each column.
1003 my $min_width = $OPTS{raw} ? 8 : 6;
1004 my @widths = map { length($_) < $min_width ? $min_width : length($_) }
1005 @perl_labels{@perl_names};
1014 COND conditional branches
1015 IND indirect branches
1016 _m branch predict miss
1017 _m1 level 1 cache miss
1018 _mm last cache (e.g. L3) miss
1019 - indeterminate percentage (e.g. 1/0)
1024 print "The numbers represent raw counts per loop iteration.\n";
1028 The numbers represent relative counts per loop iteration, compared to
1029 $perl_labels{$perl_names[$OPTS{norm}]} at 100.0%.
1030 Higher is better: for example, using half as many instructions gives 200%,
1031 while using twice as many gives 50%.
1035 # Populate @test_names with the tests in sorted order.
1038 unless ($OPTS{average}) {
1039 if (defined $OPTS{'sort-field'}) {
1040 my ($field, $perlix) = @OPTS{'sort-field', 'sort-perl'};
1041 my $perl = $perls->[$perlix][0];
1044 $results->{$a}{$perl}{$field}
1045 <=> $results->{$b}{$perl}{$field}
1050 @test_names = sort(keys %$results);
1054 # No point in displaying average for only one test.
1055 push @test_names, 'AVERAGE' unless @test_names == 1;
1057 # If only a single field is to be displayed, use a more compact
1058 # format with only a single line of output per test.
1060 my $one_field = defined $OPTS{fields} && keys(%{$OPTS{fields}}) == 1;
1063 print "Results for field " . (keys(%{$OPTS{fields}}))[0] . ".\n";
1065 # The first column will now contain test names rather than
1066 # field names; Calculate the max width.
1068 $field_label_width = 0;
1070 $field_label_width = length if length > $field_label_width;
1073 # Print the perl executables header.
1077 print " " x $field_label_width;
1079 printf " %*s", $widths[$_],
1080 $i ? ('-' x$widths[$_]) : $perl_labels{$perl_names[$_]};
1086 # Dump the results for each test.
1088 for my $test_name (@test_names) {
1089 my $doing_ave = ($test_name eq 'AVERAGE');
1090 my $res1 = $doing_ave ? $averages : $results->{$test_name};
1092 unless ($one_field) {
1093 print "\n$test_name";
1094 print "\n$tests->{$test_name}{desc}" unless $doing_ave;
1097 # Print the perl executables header.
1099 print " " x $field_label_width;
1101 printf " %*s", $widths[$_],
1102 $i ? ('-' x$widths[$_]) : $perl_labels{$perl_names[$_]};
1108 for my $field (qw(Ir Dr Dw COND IND
1117 next if $OPTS{fields} and ! exists $OPTS{fields}{$field};
1119 if ($field eq 'N') {
1125 printf "%-*s", $field_label_width, $test_name;
1128 printf "%*s", $field_label_width, $field;
1131 for my $i (0..$#widths) {
1132 my $res2 = $res1->{$perl_names[$i]};
1133 my $p = $res2->{$field};
1135 printf " %*s", $widths[$i], '-';
1137 elsif ($OPTS{raw}) {
1138 printf " %*.1f", $widths[$i], $p;
1141 printf " %*.2f", $widths[$i], $p * 100;
1150 # do_selftest(): check that we can parse known cachegrind()
1151 # output formats. If the output of cachegrind changes, add a *new*
1152 # test here; keep the old tests to make sure we continue to parse
1160 ==32350== Cachegrind, a cache and branch-prediction profiler
1161 ==32350== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
1162 ==32350== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
1163 ==32350== Command: perl5211o /tmp/uiS2gjdqe5 1
1165 --32350-- warning: L3 cache found, using its data for the LL simulation.
1167 ==32350== I refs: 1,124,055
1168 ==32350== I1 misses: 5,573
1169 ==32350== LLi misses: 3,338
1170 ==32350== I1 miss rate: 0.49%
1171 ==32350== LLi miss rate: 0.29%
1173 ==32350== D refs: 404,275 (259,191 rd + 145,084 wr)
1174 ==32350== D1 misses: 9,608 ( 6,098 rd + 3,510 wr)
1175 ==32350== LLd misses: 5,794 ( 2,781 rd + 3,013 wr)
1176 ==32350== D1 miss rate: 2.3% ( 2.3% + 2.4% )
1177 ==32350== LLd miss rate: 1.4% ( 1.0% + 2.0% )
1179 ==32350== LL refs: 15,181 ( 11,671 rd + 3,510 wr)
1180 ==32350== LL misses: 9,132 ( 6,119 rd + 3,013 wr)
1181 ==32350== LL miss rate: 0.5% ( 0.4% + 2.0% )
1183 ==32350== Branches: 202,372 (197,050 cond + 5,322 ind)
1184 ==32350== Mispredicts: 19,153 ( 17,742 cond + 1,411 ind)
1185 ==32350== Mispred rate: 9.4% ( 9.0% + 26.5% )
1205 last if require "$_/test.pl";
1207 plan(@tests / 3 * keys %VALID_FIELDS);
1210 my $desc = shift @tests;
1211 my $output = shift @tests;
1212 my $expected = shift @tests;
1213 my $p = parse_cachegrind($output);
1214 for (sort keys %VALID_FIELDS) {
1215 is($p->{$_}, $expected->{$_}, "$desc, $_");