3 # A tool for analysing the performance of the code snippets found in
4 # t/perf/benchmarks or similar
9 bench.pl - Compare the performance of perl code snippets across multiple
14 # Basic: run the tests in t/perf/benchmarks against two or
17 bench.pl [options] -- perlA[=labelA] perlB[=labelB] ...
19 # run the tests against same perlA 2x, with and without extra
22 bench.pl [options] -- perlA=fast PerlA=slow -Mstrict -Dpsltoc
24 # Run bench.pl's own built-in sanity tests
26 bench.pl --action=selftest
28 # Run bench on blead, which is then modified and timed again
30 bench.pl [options] --write=blead.time -- ./perl=blead
32 bench.pl --read=blead.time -- ./perl=hacked
34 # You can also combine --read with --write
35 bench.pl --read=blead.time --write=last.time -- ./perl=hacked
39 By default, F<bench.pl> will run code snippets found in
40 F<t/perf/benchmarks> (or similar) under cachegrind, in order to calculate
41 how many instruction reads, data writes, branches, cache misses, etc. that
42 one execution of the snippet uses. It will run them against two or more
43 perl executables and show how much each test has gotten better or worse.
45 It is modelled on the F<perlbench> tool, but since it measures instruction
46 reads etc., rather than timings, it is much more precise and reproducible.
47 It is also considerably faster, and is capable of running tests in
48 parallel (with C<-j>). Rather than displaying a single relative
49 percentage per test/perl combination, it displays values for 13 different
50 measurements, such as instruction reads, conditional branch misses etc.
52 There are options to write the raw data to a file, and to read it back.
53 This means that you can view the same run data in different views with
54 different selection and sort options. You can also use this mechanism
55 to save the results of timing one perl, and then read it back while timing
56 a modification, so that you dont have rerun the same tests on the same
57 perl over and over, or have two perls built at the same time.
59 The optional C<=label> after each perl executable is used in the display
60 output. If you are doing a two step benchmark then you should provide
61 a label for at least the "base" perl.
71 What action to perform. The default is I<grind>, which runs the benchmarks
72 using I<cachegrind> as the back end. The only other action at the moment is
73 I<selftest>, which runs some basic sanity checks and produces TAP output.
79 Only display the overall average, rather than the results for each
86 The path of the file which contains the benchmarks (F<t/perf/benchmarks>
91 --bisect=I<field,minval,maxval>
93 Run a single test against one perl and exit with a zero status if the
94 named field is in the specified range; exit 1 otherwise. It will complain
95 if more than one test or perl has been specified. It is intended to be
96 called as part of a bisect run, to determine when something changed.
99 bench.pl -j 8 --tests=foo --bisect=Ir,100,105 --perlargs=-Ilib \
102 might be called from bisect to find when the number of instruction reads
103 for test I<foo> falls outside the range 100..105.
109 Display the results for a single perl executable in a compact form.
110 Which perl to display is specified in the same manner as C<--norm>.
116 Enable verbose debugging output.
122 Display only the specified fields; for example,
124 --fields=Ir,Ir_m,Ir_mm
126 If only one field is selected, the output is in more compact form.
132 Optional command-line arguments to pass to all cachegrind invocations.
134 This option is appended to those which bench.pl uses for its own
135 purposes; so it can be used to override them (see --debug output
136 below), and can also be 'abused' to add redirects into the valgrind
139 For example, this writes PERL_MEM_LOG activity to foobar.$$, because
140 3>foobar.$$ redirects fd 3, then perl under PERL_MEM_LOG writes to fd 3.
142 $ perl Porting/bench.pl --jobs=2 --verbose --debug \
143 --tests=call::sub::amp_empty \
145 --grindargs='--cachegrind-out-file=junk.$$ 3>foobar.$$' \
147 perl5.24.0 perl5.24.0:+memlog:PERL_MEM_LOG=3mst
149 for the +memlog tests, this executes as: (shown via --debug, then prettyfied)
151 Command: PERL_HASH_SEED=0 PERL_MEM_LOG=3mst
152 valgrind --tool=cachegrind --branch-sim=yes
153 --cachegrind-out-file=/dev/null --cachegrind-out-file=junk.$$
154 3>foobar.$$ perl5.24.0 - 10 2>&1
156 The result is that a set of junk.$$ files containing raw cachegrind
157 output are written, and foobar.$$ contains the expected memlog output.
161 Theres no obvious utility for those junk.$$ and foobar.$$ files, but
162 you can have them anyway.
164 The 3 in PERL_MEM_LOG=3mst is needed because the output would
165 otherwize go to STDERR, and cause parse_cachegrind() to reject the
168 The --grindargs redirect is needed to capture the memlog output;
169 without it, the memlog output is written to fd3, around
170 parse_cachegrind and effectively into /dev/null
172 PERL_MEM_LOG is expensive when used.
175 &foo function call with no args or body
177 perl5.24.0 perl5.24.0+memlog
178 ---------- -----------------
201 Display basic usage information.
208 Run I<N> jobs in parallel (default 1). This determines how many cachegrind
209 process will running at a time, and should generally be set to the number
216 Specify which perl column in the output to treat as the 100% norm.
217 It may be a column number (0..N-1) or a perl executable name or label.
218 It defaults to the leftmost column.
224 Optional command-line arguments to pass to each perl-under-test
225 (perlA, perlB in synopsis) For example, C<--perlargs=-Ilib>.
231 Display raw data counts rather than percentages in the outputs. This
232 allows you to see the exact number of intruction reads, branch misses etc.
233 for each test/perl combination. It also causes the C<AVERAGE> display
234 per field to be calculated based on the average of each tests's count
235 rather than average of each percentage. This means that tests with very
236 high counts will dominate.
242 Order the tests in the output based on the value of I<field> in the
243 column I<perl>. The I<perl> value is as per C<--norm>. For example
245 bench.pl --sort=Dw:perl-5.20.0 \
246 perl-5.16.0 perl-5.18.0 perl-5.20.0
253 Read in saved data from a previous C<--write> run from the specified file.
255 Requires C<JSON::PP> to be available.
261 Specify a subset of tests to run (or in the case of C<--read>, to display).
262 It may be either a comma-separated list of test names, or a regular
263 expression. For example
265 --tests=expr::assign::scalar_lex,expr::assign::2list_lex
272 Display progress information.
279 Save the raw data to the specified file. It can be read back later with
280 C<--read>. If combined with C<--read> then the output file will be
281 the merge of the file read and any additional perls added on the command
284 Requires C<JSON::PP> to be available.
295 use Getopt::Long qw(:config no_auto_abbrev require_order);
299 use POSIX ":sys_wait_h";
301 # The version of the file format used to save data. We refuse to process
302 # the file if the integer component differs.
304 my $FORMAT_VERSION = 1.0;
306 # The fields we know about
308 my %VALID_FIELDS = map { $_ => 1 }
309 qw(Ir Ir_m1 Ir_mm Dr Dr_m1 Dr_mm Dw Dw_m1 Dw_mm COND COND_m IND IND_m);
313 usage: $0 [options] -- perl[=label] ...
314 --action=foo What action to perform [default: grind].
315 --average Only display average, not individual test results.
316 --benchfile=foo File containing the benchmarks;
317 [default: t/perf/benchmarks].
318 --bisect=f,min,max run a single test against one perl and exit with a
319 zero status if the named field is in the specified
320 range; exit 1 otherwise.
321 --compact=perl Display the results of a single perl in compact form.
322 Which perl specified like --norm
323 --debug Enable verbose debugging output.
324 --fields=a,b,c Display only the specified fields (e.g. Ir,Ir_m,Ir_mm).
325 --grindargs=foo Optional command-line args to pass to cachegrind.
326 --help Display this help.
327 -j|--jobs=N Run N jobs in parallel [default 1].
328 --norm=perl Which perl column to treat as 100%; may be a column
329 number (0..N-1) or a perl executable name or label;
331 --perlargs=foo Optional command-line args to pass to each perl to run.
332 --raw Display raw data counts rather than percentages.
333 --show Show results even though we are going to write results.
334 --sort=field:perl Sort the tests based on the value of 'field' in the
335 column 'perl'. The perl value is as per --norm.
336 -r|--read=file Read in previously saved data from the specified file.
337 --tests=FOO Select only the specified tests from the benchmarks file;
338 FOO may be either of the form 'foo,bar' or '/regex/';
339 [default: all tests].
340 --verbose Display progress information.
341 -w|--write=file Save the raw data to the specified file.
344 grind run the code under cachegrind
345 selftest perform a selftest; produce TAP output
347 The command line ends with one or more specified perl executables,
348 which will be searched for in the current \$PATH. Each binary name may
349 have an optional =LABEL appended, which will be used rather than the
350 executable name in output. E.g.
352 perl-5.20.1=PRE-BUGFIX perl-5.20.1-new=POST-BUGFIX
359 benchfile => 't/perf/benchmarks',
378 # process command-line args and call top-level action
382 'action=s' => \$OPTS{action},
383 'average' => \$OPTS{average},
384 'benchfile=s' => \$OPTS{benchfile},
385 'bisect=s' => \$OPTS{bisect},
386 'compact=s' => \$OPTS{compact},
387 'debug' => \$OPTS{debug},
388 'grindargs=s' => \$OPTS{grindargs},
389 'help' => \$OPTS{help},
390 'fields=s' => \$OPTS{fields},
391 'jobs|j=i' => \$OPTS{jobs},
392 'norm=s' => \$OPTS{norm},
393 'perlargs=s' => \$OPTS{perlargs},
394 'raw' => \$OPTS{raw},
395 'read|r=s' => \$OPTS{read},
396 'show!' => \$OPTS{show},
397 'sort=s' => \$OPTS{sort},
398 'tests=s' => \$OPTS{tests},
399 'verbose' => \$OPTS{verbose},
400 'write|w=s' => \$OPTS{write},
403 usage if $OPTS{help};
406 if (defined $OPTS{read} or defined $OPTS{write}) {
407 # fail early if it's not present
411 if (defined $OPTS{fields}) {
412 my @f = split /,/, $OPTS{fields};
414 die "Error: --fields: unknown field '$_'\n"
415 unless $VALID_FIELDS{$_};
417 my %f = map { $_ => 1 } @f;
421 my %valid_actions = qw(grind 1 selftest 1);
422 unless ($valid_actions{$OPTS{action}}) {
423 die "Error: unrecognised action '$OPTS{action}'\n"
424 . "must be one of: " . join(', ', sort keys %valid_actions)."\n";
427 if (defined $OPTS{sort}) {
428 my @s = split /:/, $OPTS{sort};
430 die "Error: --sort argument should be of the form field:perl: "
433 my ($field, $perl) = @s;
434 die "Error: --sort: unknown field '$field\n"
435 unless $VALID_FIELDS{$field};
436 # the 'perl' value will be validated later, after we have processed
438 $OPTS{'sort-field'} = $field;
439 $OPTS{'sort-perl'} = $perl;
442 if ($OPTS{action} eq 'selftest') {
444 die "Error: no perl executables may be specified with --read\n"
447 elsif (defined $OPTS{bisect}) {
448 die "Error: exactly one perl executable must be specified for bisect\n"
450 die "Error: Can't specify both --bisect and --read\n"
451 if defined $OPTS{read};
452 die "Error: Can't specify both --bisect and --write\n"
453 if defined $OPTS{write};
456 if ($OPTS{action} eq 'grind') {
459 elsif ($OPTS{action} eq 'selftest') {
466 # Given a hash ref keyed by test names, filter it by deleting unwanted
467 # tests, based on $OPTS{tests}.
472 my $opt = $OPTS{tests};
473 return unless defined $opt;
478 $opt =~ s{^/(.+)/$}{$1}
479 or die "Error: --tests regex must be of the form /.../\n";
481 delete $tests->{$_} unless /$opt/;
486 for (split /,/, $opt) {
487 die "Error: no such test found: '$_'\n"
488 . ($OPTS{verbose} ? " have: @{[ sort keys %$tests ]}\n" : "")
489 unless exists $tests->{$_};
493 delete $tests->{$_} unless exists $t{$_};
496 die "Error: no tests to run\n" unless %$tests;
500 # Read in the test file, and filter out any tests excluded by $OPTS{tests}
501 # return a hash ref { testname => { test }, ... }
502 # and an array ref of the original test names order,
504 sub read_tests_file {
513 die "Error: can't parse '$file': $@\n" if $@;
514 die "Error: can't read '$file': $!\n";
518 for (my $i=0; $i < @$ta; $i += 2) {
519 push @orig_order, $ta->[$i];
524 return $t, \@orig_order;
528 # Process the perl/column argument of options like --norm and --sort.
529 # Return the index of the matching perl.
532 my ($perl, $perls, $who) = @_;
534 if ($perl =~ /^[0-9]$/) {
535 die "Error: $who value $perl outside range 0.." . $#$perls . "\n"
536 unless $perl < @$perls;
540 my @perl = grep $perls->[$_][0] eq $perl
541 || $perls->[$_][1] eq $perl,
543 die "Error: $who: unrecognised perl '$perl'\n"
545 die "Error: $who: ambiguous perl '$perl'\n"
552 # Validate the list of perl=label (+ cmdline options) on the command line.
553 # Return a list of [ exe, label, cmdline-options ] tuples, i.e.
554 # 'perl-under-test's (PUTs)
557 my $read_perls= shift;
558 my @res_puts; # returned, each item is [ perlexe, label, @putargs ]
559 my %seen= map { $_->[1] => 1 } @$read_perls;
560 my @putargs; # collect not-perls into args per PUT
562 for my $p (reverse @_) {
563 push @putargs, $p and next if $p =~ /^-/; # not-perl, dont send to qx//
565 my ($perl, $label, $env) = split /[=:,]/, $p, 3;
567 $label = $perl.$label if $label =~ /^\+/;
568 die "$label cannot be used on 2 different perls under test\n" if $seen{$label}++;
572 %env = split /[=,]/, $env;
574 my $r = qx($perl -e 'print qq(ok\n)' 2>&1);
576 push @res_puts, [ $perl, $label, \%env, reverse @putargs ];
578 warn "Added Perl-Under-Test: [ @{[@{$res_puts[-1]}]} ]\n"
581 warn "perl-under-test args: @putargs + a not-perl: $p $r\n"
583 push @putargs, $p; # not-perl
586 return reverse @res_puts;
591 # Return a string containing perl test code wrapped in a loop
592 # that runs $ARGV[0] times
595 my ($test, $desc, $setup, $code) = @_;
602 for my \$__loop__ (1..\$ARGV[0]) {
609 # Parse the output from cachegrind. Return a hash ref.
610 # See do_selftest() for examples of the output format.
612 sub parse_cachegrind {
613 my ($output, $id, $perl) = @_;
617 my @lines = split /\n/, $output;
619 unless (s/(==\d+==)|(--\d+--) //) {
620 die "Error: while executing $id:\n"
621 . "unexpected code or cachegrind output:\n$_\n";
623 if (/I refs:\s+([\d,]+)/) {
626 elsif (/I1 misses:\s+([\d,]+)/) {
629 elsif (/LLi misses:\s+([\d,]+)/) {
632 elsif (/D refs:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
633 @res{qw(Dr Dw)} = ($1,$2);
635 elsif (/D1 misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
636 @res{qw(Dr_m1 Dw_m1)} = ($1,$2);
638 elsif (/LLd misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
639 @res{qw(Dr_mm Dw_mm)} = ($1,$2);
641 elsif (/Branches:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
642 @res{qw(COND IND)} = ($1,$2);
644 elsif (/Mispredicts:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
645 @res{qw(COND_m IND_m)} = ($1,$2);
649 for my $field (keys %VALID_FIELDS) {
650 die "Error: can't parse '$field' field from cachegrind output:\n$output"
651 unless exists $res{$field};
652 $res{$field} =~ s/,//g;
659 # Handle the 'grind' action
662 my ($perl_args) = @_; # the residue of @ARGV after option processing
664 my ($loop_counts, $perls, $results, $tests, $order);
665 my ($bisect_field, $bisect_min, $bisect_max);
667 if (defined $OPTS{bisect}) {
668 ($bisect_field, $bisect_min, $bisect_max) = split /,/, $OPTS{bisect}, 3;
669 die "Error: --bisect option must be of form 'field,integer,integer'\n"
672 and $bisect_min =~ /^[0-9]+$/
673 and $bisect_max =~ /^[0-9]+$/;
675 die "Error: unrecognised field '$bisect_field' in --bisect option\n"
676 unless $VALID_FIELDS{$bisect_field};
678 die "Error: --bisect min ($bisect_min) must be <= max ($bisect_max)\n"
679 if $bisect_min > $bisect_max;
683 open my $in, '<:encoding(UTF-8)', $OPTS{read}
684 or die " Error: can't open '$OPTS{read}' for reading: $!\n";
685 my $data = do { local $/; <$in> };
688 my $hash = JSON::PP::decode_json($data);
689 if (int($FORMAT_VERSION) < int($hash->{version})) {
690 die "Error: unsupported version $hash->{version} in file"
691 . "'$OPTS{read}' (too new)\n";
693 ($loop_counts, $perls, $results, $tests, $order) =
694 @$hash{qw(loop_counts perls results tests order)};
696 filter_tests($results);
697 filter_tests($tests);
700 $order = [ sort keys %$tests ];
705 unless ($loop_counts) {
706 # How many times to execute the loop for the two trials. The lower
707 # value is intended to do the loop enough times that branch
708 # prediction has taken hold; the higher loop allows us to see the
709 # branch misses after that
710 $loop_counts = [10, 20];
712 ($tests, $order) = read_tests_file($OPTS{benchfile});
713 die "Error: only a single test may be specified with --bisect\n"
714 if defined $OPTS{bisect} and keys %$tests != 1;
717 my @run_perls= process_puts($perls, @$perl_args);
718 push @$perls, @run_perls;
719 die "Error: Not enough perls to run a report, and --write not specified.\n"
720 if @$perls < 2 and !$OPTS{write};
721 $results = grind_run($tests, $order, \@run_perls, $loop_counts, $results);
724 if (!$perls or !@$perls) {
725 die "Error: nothing to do: no perls to run, no data to read.\n";
727 # now that we have a list of perls, use it to process the
728 # 'perl' component of the --norm and --sort args
730 $OPTS{norm} = select_a_perl($OPTS{norm}, $perls, "--norm");
731 if (defined $OPTS{'sort-perl'}) {
733 select_a_perl($OPTS{'sort-perl'}, $perls, "--sort");
736 if (defined $OPTS{'compact'}) {
738 select_a_perl($OPTS{'compact'}, $perls, "--compact");
740 if (defined $OPTS{write}) {
741 my $json = JSON::PP::encode_json({
742 version => $FORMAT_VERSION,
743 loop_counts => $loop_counts,
750 open my $out, '>:encoding(UTF-8)', $OPTS{write}
751 or die "Error: can't open '$OPTS{write}' for writing: $!\n";
752 print $out $json or die "Error: writing to file '$OPTS{write}': $!\n";
753 close $out or die "Error: closing file '$OPTS{write}': $!\n";
755 if (!$OPTS{write} or $OPTS{show}) {
757 die "Error: need more than one perl to do a report.\n";
759 my ($processed, $averages) =
760 grind_process($results, $perls, $loop_counts);
762 if (defined $OPTS{bisect}) {
763 my @r = values %$results;
764 die "Panic: expected exactly one test result in bisect\n"
766 @r = values %{$r[0]};
767 die "Panic: expected exactly one perl result in bisect\n"
769 my $c = $r[0]{$bisect_field};
770 die "Panic: no result in bisect for field '$bisect_field'\n"
772 exit 0 if $bisect_min <= $c and $c <= $bisect_max;
775 elsif (defined $OPTS{compact}) {
776 grind_print_compact($processed, $averages, $OPTS{compact},
777 $perls, $tests, $order);
780 grind_print($processed, $averages, $perls, $tests, $order);
786 # Run cachegrind for every test/perl combo.
787 # It may run several processes in parallel when -j is specified.
788 # Return a hash ref suitable for input to grind_process()
791 my ($tests, $order, $perls, $counts, $results) = @_;
793 # Build a list of all the jobs to run
797 for my $test (grep $tests->{$_}, @$order) {
799 # Create two test progs: one with an empty loop and one with code.
800 # Note that the empty loop is actually '{1;}' rather than '{}';
801 # this causes the loop to have a single nextstate rather than a
802 # stub op, so more closely matches the active loop; e.g.:
803 # {1;} => nextstate; unstack
804 # {$x=1;} => nextstate; const; gvsv; sassign; unstack
806 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup)}, '1'),
807 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup code)}),
810 for my $p (@$perls) {
811 my ($perl, $label, $env, @putargs) = @$p;
813 # Run both the empty loop and the active loop
814 # $counts->[0] and $counts->[1] times.
820 $envstr .= "$_=$env->{$_} " for sort keys %$env;
822 my $cmd = "PERL_HASH_SEED=0 $envstr"
823 . "valgrind --tool=cachegrind --branch-sim=yes "
824 . "--cachegrind-out-file=/dev/null "
825 . "$OPTS{grindargs} "
826 . "$perl $OPTS{perlargs} @putargs - $counts->[$j] 2>&1";
827 # for debugging and error messages
828 my $id = "$test/$label "
829 . ($i ? "active" : "empty") . "/"
830 . ($j ? "long" : "short") . " loop";
847 # Execute each cachegrind and store the results in %results.
849 local $SIG{PIPE} = 'IGNORE';
851 my $max_jobs = $OPTS{jobs};
852 my $running = 0; # count of executing jobs
853 my %pids; # map pids to jobs
854 my %fds; # map fds to jobs
855 my $select = IO::Select->new();
857 while (@jobs or $running) {
860 printf "Main loop: pending=%d running=%d\n",
861 scalar(@jobs), $running;
866 while (@jobs && $running < $max_jobs) {
867 my $job = shift @jobs;
868 my ($id, $cmd) =@$job{qw(id cmd)};
870 my ($in, $out, $pid);
871 warn "Starting $id\n" if $OPTS{verbose};
872 eval { $pid = IPC::Open2::open2($out, $in, $cmd); 1; }
873 or die "Error: while starting cachegrind subprocess"
878 $job->{out_fd} = $out;
886 print "Started pid $pid for $id\n";
890 # In principle we should write to $in in the main select loop,
891 # since it may block. In reality,
892 # a) the code we write to the perl process's stdin is likely
893 # to be less than the OS's pipe buffer size;
894 # b) by the time the perl process has read in all its stdin,
895 # the only output it should have generated is a few lines
896 # of cachegrind output preamble.
897 # If these assumptions change, then perform the following print
898 # in the select loop instead.
900 print $in $job->{prog};
904 # Get output of running jobs
907 printf "Select: waiting on (%s)\n",
908 join ', ', sort { $a <=> $b } map $fds{$_}{pid},
912 my @ready = $select->can_read;
915 printf "Select: pids (%s) ready\n",
916 join ', ', sort { $a <=> $b } map $fds{$_}{pid}, @ready;
920 die "Panic: select returned no file handles\n";
923 for my $fd (@ready) {
925 my $r = sysread $fd, $j->{output}, 8192, length($j->{output});
926 unless (defined $r) {
927 die "Panic: Read from process running $j->{id} gave:\n$!";
934 print "Got eof for pid $fds{$fd}{pid} ($j->{id})\n";
937 $select->remove($j->{out_fd});
939 or die "Panic: closing output fh on $j->{id} gave:\n$!\n";
941 delete $fds{"$j->{out_fd}"};
942 my $output = $j->{output};
950 print "\n$j->{id}/\nCommand: $j->{cmd}\n"
955 $results->{$j->{test}}{$j->{plabel}}[$j->{active}][$j->{loopix}]
956 = parse_cachegrind($output, $j->{id}, $j->{perl});
962 my $kid = waitpid(-1, WNOHANG);
966 unless (exists $pids{$kid}) {
967 die "Panic: reaped unexpected child $kid";
971 die sprintf("Error: $j->{id} gave return status 0x%04x\n", $ret)
972 . "with the following output\n:$j->{output}\n";
984 # grind_process(): process the data that has been extracted from
985 # cachgegrind's output.
987 # $res is of the form ->{benchmark_name}{perl_name}[active][count]{field_name},
988 # where active is 0 or 1 indicating an empty or active loop,
989 # count is 0 or 1 indicating a short or long loop. E.g.
991 # $res->{'expr::assign::scalar_lex'}{perl-5.21.1}[0][10]{Dw_mm}
993 # The $res data structure is modified in-place by this sub.
995 # $perls is [ [ perl-exe, perl-label], .... ].
997 # $counts is [ N, M ] indicating the counts for the short and long loops.
1000 # return \%output, \%averages, where
1002 # $output{benchmark_name}{perl_name}{field_name} = N
1003 # $averages{perl_name}{field_name} = M
1005 # where N is the raw count ($OPTS{raw}), or count_perl0/count_perlI otherwise;
1006 # M is the average raw count over all tests ($OPTS{raw}), or
1007 # 1/(sum(count_perlI/count_perl0)/num_tests) otherwise.
1010 my ($res, $perls, $counts) = @_;
1012 # Process the four results for each test/perf combo:
1014 # $res->{benchmark_name}{perl_name}[active][count]{field_name} = n
1016 # $res->{benchmark_name}{perl_name}{field_name} = averaged_n
1018 # $r[0][1] - $r[0][0] is the time to do ($counts->[1]-$counts->[0])
1019 # empty loops, eliminating startup time
1020 # $r[1][1] - $r[1][0] is the time to do ($counts->[1]-$counts->[0])
1021 # active loops, eliminating startup time
1022 # (the two startup times may be different because different code
1023 # is being compiled); the difference of the two results above
1024 # divided by the count difference is the time to execute the
1025 # active code once, eliminating both startup and loop overhead.
1027 for my $tests (values %$res) {
1028 for my $r (values %$tests) {
1030 for (keys %{$r->[0][0]}) {
1031 my $n = ( ($r->[1][1]{$_} - $r->[1][0]{$_})
1032 - ($r->[0][1]{$_} - $r->[0][0]{$_})
1033 ) / ($counts->[1] - $counts->[0]);
1044 my $perl_norm = $perls->[$OPTS{norm}][1]; # the label of the reference perl
1046 for my $test_name (keys %$res) {
1047 my $res1 = $res->{$test_name};
1048 my $res2_norm = $res1->{$perl_norm};
1049 for my $perl (keys %$res1) {
1050 my $res2 = $res1->{$perl};
1051 for my $field (keys %$res2) {
1052 my ($p, $q) = ($res2_norm->{$field}, $res2->{$field});
1055 # Avoid annoying '-0.0' displays. Ideally this number
1056 # should never be negative, but fluctuations in
1057 # startup etc can theoretically make this happen
1058 $q = 0 if ($q <= 0 && $q > -0.1);
1059 $totals{$perl}{$field} += $q;
1060 $counts{$perl}{$field}++;
1061 $data{$test_name}{$perl}{$field} = $q;
1065 # $p and $q are notionally integer counts, but
1066 # due to variations in startup etc, it's possible for a
1067 # count which is supposedly zero to be calculated as a
1068 # small positive or negative value.
1069 # In this case, set it to zero. Further below we
1070 # special-case zeros to avoid division by zero errors etc.
1072 $p = 0.0 if $p < 0.01;
1073 $q = 0.0 if $q < 0.01;
1075 if ($p == 0.0 && $q == 0.0) {
1076 # Both perls gave a count of zero, so no change:
1078 $totals{$perl}{$field} += 1;
1079 $counts{$perl}{$field}++;
1080 $data{$test_name}{$perl}{$field} = 1;
1082 elsif ($p == 0.0 || $q == 0.0) {
1083 # If either count is zero, there were too few events
1084 # to give a meaningful ratio (and we will end up with
1085 # division by zero if we try). Mark the result undef,
1086 # indicating that it shouldn't be displayed; and skip
1087 # adding to the average
1088 $data{$test_name}{$perl}{$field} = undef;
1091 # For averages, we record q/p rather than p/q.
1092 # Consider a test where perl_norm took 1000 cycles
1093 # and perlN took 800 cycles. For the individual
1094 # results we display p/q, or 1.25; i.e. a quarter
1095 # quicker. For the averages, we instead sum all
1096 # the 0.8's, which gives the total cycles required to
1097 # execute all tests, with all tests given equal
1098 # weight. Later we reciprocate the final result,
1099 # i.e. 1/(sum(qi/pi)/n)
1101 $totals{$perl}{$field} += $q/$p;
1102 $counts{$perl}{$field}++;
1103 $data{$test_name}{$perl}{$field} = $p/$q;
1109 # Calculate averages based on %totals and %counts accumulated earlier.
1112 for my $perl (keys %totals) {
1113 my $t = $totals{$perl};
1114 for my $field (keys %$t) {
1115 $averages{$perl}{$field} = $OPTS{raw}
1116 ? $t->{$field} / $counts{$perl}{$field}
1117 # reciprocal - see comments above
1118 : $counts{$perl}{$field} / $t->{$field};
1122 return \%data, \%averages;
1127 # print a standard blurb at the start of the grind display
1137 COND conditional branches
1138 IND indirect branches
1139 _m branch predict miss
1140 _m1 level 1 cache miss
1141 _mm last cache (e.g. L3) miss
1142 - indeterminate percentage (e.g. 1/0)
1147 print "The numbers represent raw counts per loop iteration.\n";
1151 The numbers represent relative counts per loop iteration, compared to
1152 $perls->[$OPTS{norm}][1] at 100.0%.
1153 Higher is better: for example, using half as many instructions gives 200%,
1154 while using twice as many gives 50%.
1160 # return a sorted list of the test names, plus 'AVERAGE'
1162 sub sorted_test_names {
1163 my ($results, $order, $perls) = @_;
1166 unless ($OPTS{average}) {
1167 if (defined $OPTS{'sort-field'}) {
1168 my ($field, $perlix) = @OPTS{'sort-field', 'sort-perl'};
1169 my $perl = $perls->[$perlix][1];
1172 $results->{$a}{$perl}{$field}
1173 <=> $results->{$b}{$perl}{$field}
1178 @names = grep $results->{$_}, @$order;
1182 # No point in displaying average for only one test.
1183 push @names, 'AVERAGE' unless @names == 1;
1188 # grind_print(): display the tabulated results of all the cachegrinds.
1190 # Arguments are of the form:
1191 # $results->{benchmark_name}{perl_name}{field_name} = N
1192 # $averages->{perl_name}{field_name} = M
1193 # $perls = [ [ perl-exe, perl-label ], ... ]
1194 # $tests->{test_name}{desc => ..., ...}
1197 my ($results, $averages, $perls, $tests, $order) = @_;
1199 my @perl_names = map $_->[0], @$perls;
1200 my @perl_labels = map $_->[1], @$perls;
1202 $perl_labels{$_->[0]} = $_->[1] for @$perls;
1204 my $field_label_width = 6;
1205 # Calculate the width to display for each column.
1206 my $min_width = $OPTS{raw} ? 8 : 6;
1207 my @widths = map { length($_) < $min_width ? $min_width : length($_) }
1210 # Print standard header.
1211 grind_blurb($perls);
1213 my @test_names = sorted_test_names($results, $order, $perls);
1215 # If only a single field is to be displayed, use a more compact
1216 # format with only a single line of output per test.
1218 my $one_field = defined $OPTS{fields} && keys(%{$OPTS{fields}}) == 1;
1221 print "Results for field " . (keys(%{$OPTS{fields}}))[0] . ".\n";
1223 # The first column will now contain test names rather than
1224 # field names; Calculate the max width.
1226 $field_label_width = 0;
1228 $field_label_width = length if length > $field_label_width;
1231 # Print the perl executables header.
1235 print " " x $field_label_width;
1237 printf " %*s", $widths[$_],
1238 $i ? ('-' x$widths[$_]) : $perl_labels[$_];
1244 # Dump the results for each test.
1246 for my $test_name (@test_names) {
1247 my $doing_ave = ($test_name eq 'AVERAGE');
1248 my $res1 = $doing_ave ? $averages : $results->{$test_name};
1250 unless ($one_field) {
1251 print "\n$test_name";
1252 print "\n$tests->{$test_name}{desc}" unless $doing_ave;
1255 # Print the perl executables header.
1257 print " " x $field_label_width;
1259 printf " %*s", $widths[$_],
1260 $i ? ('-' x$widths[$_]) : $perl_labels[$_];
1266 for my $field (qw(Ir Dr Dw COND IND
1275 next if $OPTS{fields} and ! exists $OPTS{fields}{$field};
1277 if ($field eq 'N') {
1283 printf "%-*s", $field_label_width, $test_name;
1286 printf "%*s", $field_label_width, $field;
1289 for my $i (0..$#widths) {
1290 my $res2 = $res1->{$perl_labels[$i]};
1291 my $p = $res2->{$field};
1293 printf " %*s", $widths[$i], '-';
1295 elsif ($OPTS{raw}) {
1296 printf " %*.1f", $widths[$i], $p;
1299 printf " %*.2f", $widths[$i], $p * 100;
1309 # grind_print_compact(): like grind_print(), but display a single perl
1310 # in a compact form. Has an additional arg, $which_perl, which specifies
1311 # which perl to display.
1313 # Arguments are of the form:
1314 # $results->{benchmark_name}{perl_name}{field_name} = N
1315 # $averages->{perl_name}{field_name} = M
1316 # $perls = [ [ perl-exe, perl-label ], ... ]
1317 # $tests->{test_name}{desc => ..., ...}
1319 sub grind_print_compact {
1320 my ($results, $averages, $which_perl, $perls, $tests, $order) = @_;
1323 # the width to display for each column.
1324 my $width = $OPTS{raw} ? 7 : 6;
1326 # Print standard header.
1327 grind_blurb($perls);
1329 print "\nResults for $perls->[$which_perl][1]\n\n";
1331 my @test_names = sorted_test_names($results, $order, $perls);
1333 # Dump the results for each test.
1335 my @fields = qw( Ir Dr Dw
1341 if ($OPTS{fields}) {
1342 @fields = grep exists $OPTS{fields}{$_}, @fields;
1345 printf " %*s", $width, $_ for @fields;
1347 printf " %*s", $width, '------' for @fields;
1350 for my $test_name (@test_names) {
1351 my $doing_ave = ($test_name eq 'AVERAGE');
1352 my $res = $doing_ave ? $averages : $results->{$test_name};
1353 $res = $res->{$perls->[$which_perl][1]};
1355 for my $field (@fields) {
1356 my $p = $res->{$field};
1358 printf " %*s", $width, '-';
1360 elsif ($OPTS{raw}) {
1361 printf " %*.1f", $width, $p;
1364 printf " %*.2f", $width, $p * 100;
1369 print " $test_name\n";
1374 # do_selftest(): check that we can parse known cachegrind()
1375 # output formats. If the output of cachegrind changes, add a *new*
1376 # test here; keep the old tests to make sure we continue to parse
1384 ==32350== Cachegrind, a cache and branch-prediction profiler
1385 ==32350== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
1386 ==32350== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
1387 ==32350== Command: perl5211o /tmp/uiS2gjdqe5 1
1389 --32350-- warning: L3 cache found, using its data for the LL simulation.
1391 ==32350== I refs: 1,124,055
1392 ==32350== I1 misses: 5,573
1393 ==32350== LLi misses: 3,338
1394 ==32350== I1 miss rate: 0.49%
1395 ==32350== LLi miss rate: 0.29%
1397 ==32350== D refs: 404,275 (259,191 rd + 145,084 wr)
1398 ==32350== D1 misses: 9,608 ( 6,098 rd + 3,510 wr)
1399 ==32350== LLd misses: 5,794 ( 2,781 rd + 3,013 wr)
1400 ==32350== D1 miss rate: 2.3% ( 2.3% + 2.4% )
1401 ==32350== LLd miss rate: 1.4% ( 1.0% + 2.0% )
1403 ==32350== LL refs: 15,181 ( 11,671 rd + 3,510 wr)
1404 ==32350== LL misses: 9,132 ( 6,119 rd + 3,013 wr)
1405 ==32350== LL miss rate: 0.5% ( 0.4% + 2.0% )
1407 ==32350== Branches: 202,372 (197,050 cond + 5,322 ind)
1408 ==32350== Mispredicts: 19,153 ( 17,742 cond + 1,411 ind)
1409 ==32350== Mispred rate: 9.4% ( 9.0% + 26.5% )
1429 my $t = "$_/test.pl";
1433 plan(@tests / 3 * keys %VALID_FIELDS);
1436 my $desc = shift @tests;
1437 my $output = shift @tests;
1438 my $expected = shift @tests;
1439 my $p = parse_cachegrind($output);
1440 for (sort keys %VALID_FIELDS) {
1441 is($p->{$_}, $expected->{$_}, "$desc, $_");