3 # A tool for analysing the performance of the code snippets found in
4 # t/perf/benchmarks or similar
9 bench.pl - Compare the performance of perl code snippets across multiple
14 # Basic: run the tests in t/perf/benchmarks against two or
17 bench.pl [options] -- perlA[=labelA] perlB[=labelB] ...
# run the tests against the same perlA twice, with and without extra options
bench.pl [options] -- perlA=fast perlA=slow -Mstrict -Dpsltoc
24 # Run bench.pl's own built-in sanity tests
26 bench.pl --action=selftest
# Run bench on blead, saving results to file; then modify blead and time it again
30 bench.pl [options] --write=blead.time -- ./perl=blead
32 bench.pl --read=blead.time -- ./perl=hacked
34 # You can also combine --read with --write
35 bench.pl --read=blead.time --write=last.time -- ./perl=hacked
39 By default, F<bench.pl> will run code snippets found in
40 F<t/perf/benchmarks> (or similar) under cachegrind, in order to calculate
how many instruction reads, data writes, branches, cache misses, etc.
one execution of the snippet uses. It will run them against two or more
perl executables and show how much each test has improved or regressed.
45 It is modelled on the F<perlbench> tool, but since it measures instruction
46 reads etc., rather than timings, it is much more precise and reproducible.
47 It is also considerably faster, and is capable of running tests in
48 parallel (with C<-j>). Rather than displaying a single relative
49 percentage per test/perl combination, it displays values for 13 different
50 measurements, such as instruction reads, conditional branch misses etc.
52 There are options to write the raw data to a file, and to read it back.
This means that you can redisplay the same run data with different
selection and sort options. You can also use this mechanism
to save the results of timing one perl, and then read it back while timing
a modification, so that you don't have to rerun the same tests on the same
perl over and over, or have two perls built at the same time.
59 The optional C<=label> after each perl executable is used in the display
output. If you are doing a two-step benchmark then you should provide
61 a label for at least the "base" perl.
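For example, labelling the base perl in the two-step benchmark from the
synopsis above:

    bench.pl [options] --write=blead.time -- ./perl=blead
    bench.pl --read=blead.time -- ./perl=hacked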
71 What action to perform. The default is I<grind>, which runs the benchmarks
72 using I<cachegrind> as the back end. The only other action at the moment is
73 I<selftest>, which runs some basic sanity checks and produces TAP output.
79 Only display the overall average, rather than the results for each
86 The path of the file which contains the benchmarks (F<t/perf/benchmarks>
91 --bisect=I<field,minval,maxval>
93 Run a single test against one perl and exit with a zero status if the
94 named field is in the specified range; exit 1 otherwise. It will complain
95 if more than one test or perl has been specified. It is intended to be
96 called as part of a bisect run, to determine when something changed.
99 bench.pl -j 8 --tests=foo --bisect=Ir,100,105 --perlargs=-Ilib \
102 might be called from bisect to find when the number of instruction reads
103 for test I<foo> falls outside the range 100..105.
109 Display the results for a single perl executable in a compact form.
110 Which perl to display is specified in the same manner as C<--norm>.
116 Enable verbose debugging output.
122 Display only the specified fields; for example,
124 --fields=Ir,Ir_m,Ir_mm
If only one field is selected, the output is in a more compact form.
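For example, this displays a single line of results per test:

    --fields=Ir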
132 Optional command-line arguments to pass to all cachegrind invocations.
134 This option is appended to those which bench.pl uses for its own
purposes, so it can be used to override them (see --debug output
136 below), and can also be 'abused' to add redirects into the valgrind
139 For example, this writes PERL_MEM_LOG activity to foobar.$$, because
140 3>foobar.$$ redirects fd 3, then perl under PERL_MEM_LOG writes to fd 3.
142 $ perl Porting/bench.pl --jobs=2 --verbose --debug \
143 --tests=call::sub::amp_empty \
145 --grindargs='--cachegrind-out-file=junk.$$ 3>foobar.$$' \
147 perl5.24.0 perl5.24.0:+memlog:PERL_MEM_LOG=3mst
For the +memlog tests, this executes as (shown via --debug, then prettified):
151 Command: PERL_HASH_SEED=0 PERL_MEM_LOG=3mst
152 valgrind --tool=cachegrind --branch-sim=yes
153 --cachegrind-out-file=/dev/null --cachegrind-out-file=junk.$$
154 3>foobar.$$ perl5.24.0 - 10 2>&1
156 The result is that a set of junk.$$ files containing raw cachegrind
157 output are written, and foobar.$$ contains the expected memlog output.
There's no obvious utility for those junk.$$ and foobar.$$ files, but
162 you can have them anyway.
The 3 in PERL_MEM_LOG=3mst is needed because the output would
otherwise go to STDERR, and cause parse_cachegrind() to reject the
output.
168 The --grindargs redirect is needed to capture the memlog output;
without it, the memlog output is written to fd 3, bypassing
parse_cachegrind() and effectively vanishing into /dev/null.
172 PERL_MEM_LOG is expensive when used.
175 &foo function call with no args or body
177 perl5.24.0 perl5.24.0+memlog
178 ---------- -----------------
201 Display basic usage information.
208 Run I<N> jobs in parallel (default 1). This determines how many cachegrind
processes will be running at a time, and should generally be set to the number
216 Specify which perl column in the output to treat as the 100% norm.
217 It may be a column number (0..N-1) or a perl executable name or label.
218 It defaults to the leftmost column.
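For example, either of the following selects the second perl column
(assuming that perl was labelled C<blead> on the command line):

    --norm=1
    --norm=blead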
224 Optional command-line arguments to pass to each perl-under-test
(perlA, perlB in the synopsis). For example, C<--perlargs=-Ilib>.
Display raw data counts rather than percentages in the output. This
allows you to see the exact number of instruction reads, branch misses,
etc. for each test/perl combination. It also causes the C<AVERAGE> display
per field to be calculated based on the average of each test's count
rather than the average of each percentage. This means that tests with very
236 high counts will dominate.
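For example, averaging a test with an I<Ir> count of 1,000,000 with one
whose count is 1,000 gives about 500,500: a figure almost entirely
determined by the first test.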
242 Order the tests in the output based on the value of I<field> in the
243 column I<perl>. The I<perl> value is as per C<--norm>. For example
245 bench.pl --sort=Dw:perl-5.20.0 \
246 perl-5.16.0 perl-5.18.0 perl-5.20.0
Read in the data saved by a previous C<--write> run from the specified file.
255 Requires C<JSON::PP> to be available.
261 Specify a subset of tests to run (or in the case of C<--read>, to display).
262 It may be either a comma-separated list of test names, or a regular
263 expression. For example
265 --tests=expr::assign::scalar_lex,expr::assign::2list_lex
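or, as a regex (which must be written in the form C</.../>):

    --tests=/^expr::/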
272 Display progress information.
279 Save the raw data to the specified file. It can be read back later with
280 C<--read>. If combined with C<--read> then the output file will be
281 the merge of the file read and any additional perls added on the command
284 Requires C<JSON::PP> to be available.
295 use Getopt::Long qw(:config no_auto_abbrev require_order);
299 use POSIX ":sys_wait_h";
301 # The version of the file format used to save data. We refuse to process
302 # the file if the integer component differs.
304 my $FORMAT_VERSION = 1.0;
306 # The fields we know about
308 my %VALID_FIELDS = map { $_ => 1 }
309 qw(Ir Ir_m1 Ir_mm Dr Dr_m1 Dr_mm Dw Dw_m1 Dw_mm COND COND_m IND IND_m);
313 usage: $0 [options] -- perl[=label] ...
314 --action=foo What action to perform [default: grind].
315 --average Only display average, not individual test results.
316 --benchfile=foo File containing the benchmarks;
317 [default: t/perf/benchmarks].
318 --bisect=f,min,max run a single test against one perl and exit with a
319 zero status if the named field is in the specified
320 range; exit 1 otherwise.
321 --compact=perl Display the results of a single perl in compact form.
Which perl is specified as per --norm.
323 --debug Enable verbose debugging output.
324 --fields=a,b,c Display only the specified fields (e.g. Ir,Ir_m,Ir_mm).
325 --grindargs=foo Optional command-line args to pass to cachegrind.
326 --help Display this help.
327 -j|--jobs=N Run N jobs in parallel [default 1].
328 --norm=perl Which perl column to treat as 100%; may be a column
329 number (0..N-1) or a perl executable name or label;
331 --perlargs=foo Optional command-line args to pass to each perl to run.
332 --raw Display raw data counts rather than percentages.
333 --show Show results even though we are going to write results.
334 --sort=field:perl Sort the tests based on the value of 'field' in the
335 column 'perl'. The perl value is as per --norm.
336 -r|--read=file Read in previously saved data from the specified file.
337 --tests=FOO Select only the specified tests from the benchmarks file;
338 FOO may be either of the form 'foo,bar' or '/regex/';
339 [default: all tests].
340 --verbose Display progress information.
341 -w|--write=file Save the raw data to the specified file.
344 grind run the code under cachegrind
345 selftest perform a selftest; produce TAP output
347 The command line ends with one or more specified perl executables,
348 which will be searched for in the current \$PATH. Each binary name may
349 have an optional =LABEL appended, which will be used rather than the
350 executable name in output. E.g.
352 perl-5.20.1=PRE-BUGFIX perl-5.20.1-new=POST-BUGFIX
359 benchfile => 't/perf/benchmarks',
378 # process command-line args and call top-level action
382 'action=s' => \$OPTS{action},
383 'average' => \$OPTS{average},
384 'benchfile=s' => \$OPTS{benchfile},
385 'bisect=s' => \$OPTS{bisect},
386 'compact=s' => \$OPTS{compact},
387 'debug' => \$OPTS{debug},
388 'grindargs=s' => \$OPTS{grindargs},
389 'help' => \$OPTS{help},
390 'fields=s' => \$OPTS{fields},
391 'jobs|j=i' => \$OPTS{jobs},
392 'norm=s' => \$OPTS{norm},
393 'perlargs=s' => \$OPTS{perlargs},
394 'raw' => \$OPTS{raw},
395 'read|r=s' => \$OPTS{read},
396 'show!' => \$OPTS{show},
397 'sort=s' => \$OPTS{sort},
398 'tests=s' => \$OPTS{tests},
399 'verbose' => \$OPTS{verbose},
400 'write|w=s' => \$OPTS{write},
403 usage if $OPTS{help};
406 if (defined $OPTS{read} or defined $OPTS{write}) {
# fail early if JSON::PP isn't present
411 if (defined $OPTS{fields}) {
412 my @f = split /,/, $OPTS{fields};
414 die "Error: --fields: unknown field '$_'\n"
415 unless $VALID_FIELDS{$_};
417 my %f = map { $_ => 1 } @f;
421 my %valid_actions = qw(grind 1 selftest 1);
422 unless ($valid_actions{$OPTS{action}}) {
423 die "Error: unrecognised action '$OPTS{action}'\n"
424 . "must be one of: " . join(', ', sort keys %valid_actions)."\n";
427 if (defined $OPTS{sort}) {
428 my @s = split /:/, $OPTS{sort};
430 die "Error: --sort argument should be of the form field:perl: "
433 my ($field, $perl) = @s;
434 die "Error: --sort: unknown field '$field\n"
435 unless $VALID_FIELDS{$field};
436 # the 'perl' value will be validated later, after we have processed
438 $OPTS{'sort-field'} = $field;
439 $OPTS{'sort-perl'} = $perl;
442 if ($OPTS{action} eq 'selftest') {
444 die "Error: no perl executables may be specified with --read\n"
447 elsif (defined $OPTS{bisect}) {
448 die "Error: exactly one perl executable must be specified for bisect\n"
450 die "Error: Can't specify both --bisect and --read\n"
451 if defined $OPTS{read};
452 die "Error: Can't specify both --bisect and --write\n"
453 if defined $OPTS{write};
456 if ($OPTS{action} eq 'grind') {
459 elsif ($OPTS{action} eq 'selftest') {
466 # Given a hash ref keyed by test names, filter it by deleting unwanted
467 # tests, based on $OPTS{tests}.
472 my $opt = $OPTS{tests};
473 return unless defined $opt;
478 $opt =~ s{^/(.+)/$}{$1}
479 or die "Error: --tests regex must be of the form /.../\n";
481 delete $tests->{$_} unless /$opt/;
486 for (split /,/, $opt) {
487 die "Error: no such test found: '$_'\n"
488 . ($OPTS{verbose} ? " have: @{[ sort keys %$tests ]}\n" : "")
489 unless exists $tests->{$_};
493 delete $tests->{$_} unless exists $t{$_};
496 die "Error: no tests to run\n" unless %$tests;
# Read in the test file, and filter out any tests excluded by $OPTS{tests}.
# Return a hash ref { testname => { test }, ... }
# and an array ref of the test names in their original order.
504 sub read_tests_file {
509 die "Error: can't parse '$file': $@\n" if $@;
510 die "Error: can't read '$file': $!\n";
514 for (my $i=0; $i < @$ta; $i += 2) {
515 push @orig_order, $ta->[$i];
520 return $t, \@orig_order;
524 # Process the perl/column argument of options like --norm and --sort.
525 # Return the index of the matching perl.
528 my ($perl, $perls, $who) = @_;
if ($perl =~ /^[0-9]+$/) {
531 die "Error: $who value $perl outside range 0.." . $#$perls . "\n"
532 unless $perl < @$perls;
536 my @perl = grep $perls->[$_][0] eq $perl
537 || $perls->[$_][1] eq $perl,
539 die "Error: $who: unrecognised perl '$perl'\n"
541 die "Error: $who: ambiguous perl '$perl'\n"
548 # Validate the list of perl=label (+ cmdline options) on the command line.
549 # Return a list of [ exe, label, cmdline-options ] tuples, i.e.
550 # 'perl-under-test's (PUTs)
my $read_perls = shift;
my @res_puts; # returned, each item is [ perlexe, label, @putargs ]
my %seen = map { $_->[1] => 1 } @$read_perls;
556 my @putargs; # collect not-perls into args per PUT
558 for my $p (reverse @_) {
push @putargs, $p and next if $p =~ /^-/; # not-perl, don't send to qx//
561 my ($perl, $label, $env) = split /[=:,]/, $p, 3;
563 $label = $perl.$label if $label =~ /^\+/;
564 die "$label cannot be used on 2 different perls under test\n" if $seen{$label}++;
568 %env = split /[=,]/, $env;
570 my $r = qx($perl -e 'print qq(ok\n)' 2>&1);
572 push @res_puts, [ $perl, $label, \%env, reverse @putargs ];
574 warn "Added Perl-Under-Test: [ @{[@{$res_puts[-1]}]} ]\n"
577 warn "perl-under-test args: @putargs + a not-perl: $p $r\n"
579 push @putargs, $p; # not-perl
582 return reverse @res_puts;
587 # Return a string containing perl test code wrapped in a loop
588 # that runs $ARGV[0] times
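# For example (a sketch; the setup and code lines below are hypothetical),
# the generated program looks like:
#
#     # lexical assignment
#     my $x;
#     for my $__loop__ (1..$ARGV[0]) {
#         $x = 1;
#     }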
591 my ($test, $desc, $setup, $code) = @_;
598 for my \$__loop__ (1..\$ARGV[0]) {
605 # Parse the output from cachegrind. Return a hash ref.
606 # See do_selftest() for examples of the output format.
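# E.g. for the first sample output in do_selftest() below, the returned
# hash ref would include (with commas stripped from the counts):
#
#     { Ir => 1124055, Ir_m1 => 5573, Ir_mm => 3338, ... }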
608 sub parse_cachegrind {
609 my ($output, $id, $perl) = @_;
613 my @lines = split /\n/, $output;
unless (s/(==\d+==|--\d+--) //) {
616 die "Error: while executing $id:\n"
617 . "unexpected code or cachegrind output:\n$_\n";
619 if (/I refs:\s+([\d,]+)/) {
622 elsif (/I1 misses:\s+([\d,]+)/) {
625 elsif (/LLi misses:\s+([\d,]+)/) {
628 elsif (/D refs:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
629 @res{qw(Dr Dw)} = ($1,$2);
631 elsif (/D1 misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
632 @res{qw(Dr_m1 Dw_m1)} = ($1,$2);
634 elsif (/LLd misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
635 @res{qw(Dr_mm Dw_mm)} = ($1,$2);
637 elsif (/Branches:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
638 @res{qw(COND IND)} = ($1,$2);
640 elsif (/Mispredicts:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
641 @res{qw(COND_m IND_m)} = ($1,$2);
645 for my $field (keys %VALID_FIELDS) {
646 die "Error: can't parse '$field' field from cachegrind output:\n$output"
647 unless exists $res{$field};
648 $res{$field} =~ s/,//g;
655 # Handle the 'grind' action
658 my ($perl_args) = @_; # the residue of @ARGV after option processing
660 my ($loop_counts, $perls, $results, $tests, $order);
661 my ($bisect_field, $bisect_min, $bisect_max);
663 if (defined $OPTS{bisect}) {
664 ($bisect_field, $bisect_min, $bisect_max) = split /,/, $OPTS{bisect}, 3;
665 die "Error: --bisect option must be of form 'field,integer,integer'\n"
668 and $bisect_min =~ /^[0-9]+$/
669 and $bisect_max =~ /^[0-9]+$/;
671 die "Error: unrecognised field '$bisect_field' in --bisect option\n"
672 unless $VALID_FIELDS{$bisect_field};
674 die "Error: --bisect min ($bisect_min) must be <= max ($bisect_max)\n"
675 if $bisect_min > $bisect_max;
679 open my $in, '<:encoding(UTF-8)', $OPTS{read}
680 or die " Error: can't open '$OPTS{read}' for reading: $!\n";
681 my $data = do { local $/; <$in> };
684 my $hash = JSON::PP::decode_json($data);
685 if (int($FORMAT_VERSION) < int($hash->{version})) {
686 die "Error: unsupported version $hash->{version} in file"
687 . "'$OPTS{read}' (too new)\n";
689 ($loop_counts, $perls, $results, $tests, $order) =
690 @$hash{qw(loop_counts perls results tests order)};
692 filter_tests($results);
693 filter_tests($tests);
696 $order = [ sort keys %$tests ];
701 unless ($loop_counts) {
702 # How many times to execute the loop for the two trials. The lower
703 # value is intended to do the loop enough times that branch
704 # prediction has taken hold; the higher loop allows us to see the
705 # branch misses after that
706 $loop_counts = [10, 20];
708 ($tests, $order) = read_tests_file($OPTS{benchfile});
709 die "Error: only a single test may be specified with --bisect\n"
710 if defined $OPTS{bisect} and keys %$tests != 1;
713 my @run_perls= process_puts($perls, @$perl_args);
714 push @$perls, @run_perls;
715 die "Error: Not enough perls to run a report, and --write not specified.\n"
716 if @$perls < 2 and !$OPTS{write};
717 $results = grind_run($tests, $order, \@run_perls, $loop_counts, $results);
720 if (!$perls or !@$perls) {
721 die "Error: nothing to do: no perls to run, no data to read.\n";
723 # now that we have a list of perls, use it to process the
724 # 'perl' component of the --norm and --sort args
726 $OPTS{norm} = select_a_perl($OPTS{norm}, $perls, "--norm");
727 if (defined $OPTS{'sort-perl'}) {
729 select_a_perl($OPTS{'sort-perl'}, $perls, "--sort");
732 if (defined $OPTS{'compact'}) {
734 select_a_perl($OPTS{'compact'}, $perls, "--compact");
736 if (defined $OPTS{write}) {
737 my $json = JSON::PP::encode_json({
738 version => $FORMAT_VERSION,
739 loop_counts => $loop_counts,
746 open my $out, '>:encoding(UTF-8)', $OPTS{write}
747 or die "Error: can't open '$OPTS{write}' for writing: $!\n";
748 print $out $json or die "Error: writing to file '$OPTS{write}': $!\n";
749 close $out or die "Error: closing file '$OPTS{write}': $!\n";
751 if (!$OPTS{write} or $OPTS{show}) {
753 die "Error: need more than one perl to do a report.\n";
755 my ($processed, $averages) =
756 grind_process($results, $perls, $loop_counts);
758 if (defined $OPTS{bisect}) {
759 my @r = values %$results;
760 die "Panic: expected exactly one test result in bisect\n"
762 @r = values %{$r[0]};
763 die "Panic: expected exactly one perl result in bisect\n"
765 my $c = $r[0]{$bisect_field};
766 die "Panic: no result in bisect for field '$bisect_field'\n"
768 exit 0 if $bisect_min <= $c and $c <= $bisect_max;
771 elsif (defined $OPTS{compact}) {
772 grind_print_compact($processed, $averages, $OPTS{compact},
773 $perls, $tests, $order);
776 grind_print($processed, $averages, $perls, $tests, $order);
782 # Run cachegrind for every test/perl combo.
783 # It may run several processes in parallel when -j is specified.
784 # Return a hash ref suitable for input to grind_process()
787 my ($tests, $order, $perls, $counts, $results) = @_;
789 # Build a list of all the jobs to run
793 for my $test (grep $tests->{$_}, @$order) {
795 # Create two test progs: one with an empty loop and one with code.
796 # Note that the empty loop is actually '{1;}' rather than '{}';
797 # this causes the loop to have a single nextstate rather than a
798 # stub op, so more closely matches the active loop; e.g.:
799 # {1;} => nextstate; unstack
800 # {$x=1;} => nextstate; const; gvsv; sassign; unstack
802 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup)}, '1'),
803 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup code)}),
806 for my $p (@$perls) {
807 my ($perl, $label, $env, @putargs) = @$p;
809 # Run both the empty loop and the active loop
810 # $counts->[0] and $counts->[1] times.
816 $envstr .= "$_=$env->{$_} " for sort keys %$env;
818 my $cmd = "PERL_HASH_SEED=0 $envstr"
819 . "valgrind --tool=cachegrind --branch-sim=yes "
820 . "--cachegrind-out-file=/dev/null "
821 . "$OPTS{grindargs} "
822 . "$perl $OPTS{perlargs} @putargs - $counts->[$j] 2>&1";
823 # for debugging and error messages
824 my $id = "$test/$label "
825 . ($i ? "active" : "empty") . "/"
826 . ($j ? "long" : "short") . " loop";
843 # Execute each cachegrind and store the results in %results.
845 local $SIG{PIPE} = 'IGNORE';
847 my $max_jobs = $OPTS{jobs};
848 my $running = 0; # count of executing jobs
849 my %pids; # map pids to jobs
850 my %fds; # map fds to jobs
851 my $select = IO::Select->new();
853 while (@jobs or $running) {
856 printf "Main loop: pending=%d running=%d\n",
857 scalar(@jobs), $running;
862 while (@jobs && $running < $max_jobs) {
863 my $job = shift @jobs;
my ($id, $cmd) = @$job{qw(id cmd)};
866 my ($in, $out, $pid);
867 warn "Starting $id\n" if $OPTS{verbose};
868 eval { $pid = IPC::Open2::open2($out, $in, $cmd); 1; }
869 or die "Error: while starting cachegrind subprocess"
874 $job->{out_fd} = $out;
882 print "Started pid $pid for $id\n";
886 # In principle we should write to $in in the main select loop,
887 # since it may block. In reality,
888 # a) the code we write to the perl process's stdin is likely
889 # to be less than the OS's pipe buffer size;
890 # b) by the time the perl process has read in all its stdin,
891 # the only output it should have generated is a few lines
892 # of cachegrind output preamble.
893 # If these assumptions change, then perform the following print
894 # in the select loop instead.
896 print $in $job->{prog};
900 # Get output of running jobs
903 printf "Select: waiting on (%s)\n",
904 join ', ', sort { $a <=> $b } map $fds{$_}{pid},
908 my @ready = $select->can_read;
911 printf "Select: pids (%s) ready\n",
912 join ', ', sort { $a <=> $b } map $fds{$_}{pid}, @ready;
916 die "Panic: select returned no file handles\n";
919 for my $fd (@ready) {
921 my $r = sysread $fd, $j->{output}, 8192, length($j->{output});
922 unless (defined $r) {
923 die "Panic: Read from process running $j->{id} gave:\n$!";
930 print "Got eof for pid $fds{$fd}{pid} ($j->{id})\n";
933 $select->remove($j->{out_fd});
935 or die "Panic: closing output fh on $j->{id} gave:\n$!\n";
937 delete $fds{"$j->{out_fd}"};
938 my $output = $j->{output};
946 print "\n$j->{id}/\nCommand: $j->{cmd}\n"
951 $results->{$j->{test}}{$j->{plabel}}[$j->{active}][$j->{loopix}]
952 = parse_cachegrind($output, $j->{id}, $j->{perl});
958 my $kid = waitpid(-1, WNOHANG);
962 unless (exists $pids{$kid}) {
963 die "Panic: reaped unexpected child $kid";
967 die sprintf("Error: $j->{id} gave return status 0x%04x\n", $ret)
968 . "with the following output\n:$j->{output}\n";
980 # grind_process(): process the data that has been extracted from
# cachegrind's output.
983 # $res is of the form ->{benchmark_name}{perl_name}[active][count]{field_name},
984 # where active is 0 or 1 indicating an empty or active loop,
985 # count is 0 or 1 indicating a short or long loop. E.g.
987 # $res->{'expr::assign::scalar_lex'}{perl-5.21.1}[0][10]{Dw_mm}
989 # The $res data structure is modified in-place by this sub.
991 # $perls is [ [ perl-exe, perl-label], .... ].
993 # $counts is [ N, M ] indicating the counts for the short and long loops.
996 # return \%output, \%averages, where
998 # $output{benchmark_name}{perl_name}{field_name} = N
999 # $averages{perl_name}{field_name} = M
1001 # where N is the raw count ($OPTS{raw}), or count_perl0/count_perlI otherwise;
1002 # M is the average raw count over all tests ($OPTS{raw}), or
1003 # 1/(sum(count_perlI/count_perl0)/num_tests) otherwise.
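# E.g. in the default (non-raw) mode, if the norm perl uses 1000 Ir on a
# test and perlI uses 800, then N for perlI is 1000/800 = 1.25 (displayed
# as 125.00%), while the term added to the average sum is 800/1000 = 0.8.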
1006 my ($res, $perls, $counts) = @_;
# Process the four results for each test/perl combo:
1010 # $res->{benchmark_name}{perl_name}[active][count]{field_name} = n
1012 # $res->{benchmark_name}{perl_name}{field_name} = averaged_n
1014 # $r[0][1] - $r[0][0] is the time to do ($counts->[1]-$counts->[0])
1015 # empty loops, eliminating startup time
1016 # $r[1][1] - $r[1][0] is the time to do ($counts->[1]-$counts->[0])
1017 # active loops, eliminating startup time
1018 # (the two startup times may be different because different code
1019 # is being compiled); the difference of the two results above
1020 # divided by the count difference is the time to execute the
1021 # active code once, eliminating both startup and loop overhead.
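# E.g. with the default loop counts of [10, 20], for each field:
#     n = ((active[20] - active[10]) - (empty[20] - empty[10])) / 10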
1023 for my $tests (values %$res) {
1024 for my $r (values %$tests) {
1026 for (keys %{$r->[0][0]}) {
1027 my $n = ( ($r->[1][1]{$_} - $r->[1][0]{$_})
1028 - ($r->[0][1]{$_} - $r->[0][0]{$_})
1029 ) / ($counts->[1] - $counts->[0]);
1040 my $perl_norm = $perls->[$OPTS{norm}][1]; # the label of the reference perl
1042 for my $test_name (keys %$res) {
1043 my $res1 = $res->{$test_name};
1044 my $res2_norm = $res1->{$perl_norm};
1045 for my $perl (keys %$res1) {
1046 my $res2 = $res1->{$perl};
1047 for my $field (keys %$res2) {
1048 my ($p, $q) = ($res2_norm->{$field}, $res2->{$field});
1051 # Avoid annoying '-0.0' displays. Ideally this number
1052 # should never be negative, but fluctuations in
1053 # startup etc can theoretically make this happen
1054 $q = 0 if ($q <= 0 && $q > -0.1);
1055 $totals{$perl}{$field} += $q;
1056 $counts{$perl}{$field}++;
1057 $data{$test_name}{$perl}{$field} = $q;
1061 # $p and $q are notionally integer counts, but
1062 # due to variations in startup etc, it's possible for a
1063 # count which is supposedly zero to be calculated as a
1064 # small positive or negative value.
1065 # In this case, set it to zero. Further below we
1066 # special-case zeros to avoid division by zero errors etc.
1068 $p = 0.0 if $p < 0.01;
1069 $q = 0.0 if $q < 0.01;
1071 if ($p == 0.0 && $q == 0.0) {
1072 # Both perls gave a count of zero, so no change:
1074 $totals{$perl}{$field} += 1;
1075 $counts{$perl}{$field}++;
1076 $data{$test_name}{$perl}{$field} = 1;
1078 elsif ($p == 0.0 || $q == 0.0) {
1079 # If either count is zero, there were too few events
1080 # to give a meaningful ratio (and we will end up with
1081 # division by zero if we try). Mark the result undef,
1082 # indicating that it shouldn't be displayed; and skip
1083 # adding to the average
1084 $data{$test_name}{$perl}{$field} = undef;
1087 # For averages, we record q/p rather than p/q.
1088 # Consider a test where perl_norm took 1000 cycles
1089 # and perlN took 800 cycles. For the individual
1090 # results we display p/q, or 1.25; i.e. a quarter
1091 # quicker. For the averages, we instead sum all
1092 # the 0.8's, which gives the total cycles required to
1093 # execute all tests, with all tests given equal
1094 # weight. Later we reciprocate the final result,
1095 # i.e. 1/(sum(qi/pi)/n)
1097 $totals{$perl}{$field} += $q/$p;
1098 $counts{$perl}{$field}++;
1099 $data{$test_name}{$perl}{$field} = $p/$q;
1105 # Calculate averages based on %totals and %counts accumulated earlier.
1108 for my $perl (keys %totals) {
1109 my $t = $totals{$perl};
1110 for my $field (keys %$t) {
1111 $averages{$perl}{$field} = $OPTS{raw}
1112 ? $t->{$field} / $counts{$perl}{$field}
1113 # reciprocal - see comments above
1114 : $counts{$perl}{$field} / $t->{$field};
1118 return \%data, \%averages;
1123 # print a standard blurb at the start of the grind display
1133 COND conditional branches
1134 IND indirect branches
1135 _m branch predict miss
1136 _m1 level 1 cache miss
1137 _mm last cache (e.g. L3) miss
1138 - indeterminate percentage (e.g. 1/0)
1143 print "The numbers represent raw counts per loop iteration.\n";
1147 The numbers represent relative counts per loop iteration, compared to
1148 $perls->[$OPTS{norm}][1] at 100.0%.
1149 Higher is better: for example, using half as many instructions gives 200%,
1150 while using twice as many gives 50%.
1156 # return a sorted list of the test names, plus 'AVERAGE'
1158 sub sorted_test_names {
1159 my ($results, $order, $perls) = @_;
1162 unless ($OPTS{average}) {
1163 if (defined $OPTS{'sort-field'}) {
1164 my ($field, $perlix) = @OPTS{'sort-field', 'sort-perl'};
1165 my $perl = $perls->[$perlix][1];
1168 $results->{$a}{$perl}{$field}
1169 <=> $results->{$b}{$perl}{$field}
1174 @names = grep $results->{$_}, @$order;
1178 # No point in displaying average for only one test.
1179 push @names, 'AVERAGE' unless @names == 1;
1184 # grind_print(): display the tabulated results of all the cachegrinds.
1186 # Arguments are of the form:
1187 # $results->{benchmark_name}{perl_name}{field_name} = N
1188 # $averages->{perl_name}{field_name} = M
1189 # $perls = [ [ perl-exe, perl-label ], ... ]
1190 # $tests->{test_name}{desc => ..., ...}
1193 my ($results, $averages, $perls, $tests, $order) = @_;
1195 my @perl_names = map $_->[0], @$perls;
1196 my @perl_labels = map $_->[1], @$perls;
1198 $perl_labels{$_->[0]} = $_->[1] for @$perls;
1200 my $field_label_width = 6;
1201 # Calculate the width to display for each column.
1202 my $min_width = $OPTS{raw} ? 8 : 6;
1203 my @widths = map { length($_) < $min_width ? $min_width : length($_) }
1206 # Print standard header.
1207 grind_blurb($perls);
1209 my @test_names = sorted_test_names($results, $order, $perls);
1211 # If only a single field is to be displayed, use a more compact
1212 # format with only a single line of output per test.
1214 my $one_field = defined $OPTS{fields} && keys(%{$OPTS{fields}}) == 1;
1217 print "Results for field " . (keys(%{$OPTS{fields}}))[0] . ".\n";
1219 # The first column will now contain test names rather than
# field names; calculate the max width.
1222 $field_label_width = 0;
1224 $field_label_width = length if length > $field_label_width;
1227 # Print the perl executables header.
1231 print " " x $field_label_width;
1233 printf " %*s", $widths[$_],
$i ? ('-' x $widths[$_]) : $perl_labels[$_];
1240 # Dump the results for each test.
1242 for my $test_name (@test_names) {
1243 my $doing_ave = ($test_name eq 'AVERAGE');
1244 my $res1 = $doing_ave ? $averages : $results->{$test_name};
1246 unless ($one_field) {
1247 print "\n$test_name";
1248 print "\n$tests->{$test_name}{desc}" unless $doing_ave;
1251 # Print the perl executables header.
1253 print " " x $field_label_width;
1255 printf " %*s", $widths[$_],
$i ? ('-' x $widths[$_]) : $perl_labels[$_];
1262 for my $field (qw(Ir Dr Dw COND IND
1271 next if $OPTS{fields} and ! exists $OPTS{fields}{$field};
1273 if ($field eq 'N') {
1279 printf "%-*s", $field_label_width, $test_name;
1282 printf "%*s", $field_label_width, $field;
1285 for my $i (0..$#widths) {
1286 my $res2 = $res1->{$perl_labels[$i]};
1287 my $p = $res2->{$field};
1289 printf " %*s", $widths[$i], '-';
1291 elsif ($OPTS{raw}) {
1292 printf " %*.1f", $widths[$i], $p;
1295 printf " %*.2f", $widths[$i], $p * 100;
1305 # grind_print_compact(): like grind_print(), but display a single perl
1306 # in a compact form. Has an additional arg, $which_perl, which specifies
1307 # which perl to display.
1309 # Arguments are of the form:
1310 # $results->{benchmark_name}{perl_name}{field_name} = N
1311 # $averages->{perl_name}{field_name} = M
1312 # $perls = [ [ perl-exe, perl-label ], ... ]
1313 # $tests->{test_name}{desc => ..., ...}
1315 sub grind_print_compact {
1316 my ($results, $averages, $which_perl, $perls, $tests, $order) = @_;
1319 # the width to display for each column.
1320 my $width = $OPTS{raw} ? 7 : 6;
1322 # Print standard header.
1323 grind_blurb($perls);
1325 print "\nResults for $perls->[$which_perl][1]\n\n";
1327 my @test_names = sorted_test_names($results, $order, $perls);
1329 # Dump the results for each test.
1331 my @fields = qw( Ir Dr Dw
1337 if ($OPTS{fields}) {
1338 @fields = grep exists $OPTS{fields}{$_}, @fields;
1341 printf " %*s", $width, $_ for @fields;
1343 printf " %*s", $width, '------' for @fields;
1346 for my $test_name (@test_names) {
1347 my $doing_ave = ($test_name eq 'AVERAGE');
1348 my $res = $doing_ave ? $averages : $results->{$test_name};
1349 $res = $res->{$perls->[$which_perl][1]};
1351 for my $field (@fields) {
1352 my $p = $res->{$field};
1354 printf " %*s", $width, '-';
1356 elsif ($OPTS{raw}) {
1357 printf " %*.1f", $width, $p;
1360 printf " %*.2f", $width, $p * 100;
1365 print " $test_name\n";
# do_selftest(): check that we can parse known cachegrind
1371 # output formats. If the output of cachegrind changes, add a *new*
1372 # test here; keep the old tests to make sure we continue to parse
1380 ==32350== Cachegrind, a cache and branch-prediction profiler
1381 ==32350== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
1382 ==32350== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
1383 ==32350== Command: perl5211o /tmp/uiS2gjdqe5 1
1385 --32350-- warning: L3 cache found, using its data for the LL simulation.
1387 ==32350== I refs: 1,124,055
1388 ==32350== I1 misses: 5,573
1389 ==32350== LLi misses: 3,338
1390 ==32350== I1 miss rate: 0.49%
1391 ==32350== LLi miss rate: 0.29%
1393 ==32350== D refs: 404,275 (259,191 rd + 145,084 wr)
1394 ==32350== D1 misses: 9,608 ( 6,098 rd + 3,510 wr)
1395 ==32350== LLd misses: 5,794 ( 2,781 rd + 3,013 wr)
1396 ==32350== D1 miss rate: 2.3% ( 2.3% + 2.4% )
1397 ==32350== LLd miss rate: 1.4% ( 1.0% + 2.0% )
1399 ==32350== LL refs: 15,181 ( 11,671 rd + 3,510 wr)
1400 ==32350== LL misses: 9,132 ( 6,119 rd + 3,013 wr)
1401 ==32350== LL miss rate: 0.5% ( 0.4% + 2.0% )
1403 ==32350== Branches: 202,372 (197,050 cond + 5,322 ind)
1404 ==32350== Mispredicts: 19,153 ( 17,742 cond + 1,411 ind)
1405 ==32350== Mispred rate: 9.4% ( 9.0% + 26.5% )
1425 my $t = "$_/test.pl";
1429 plan(@tests / 3 * keys %VALID_FIELDS);
1432 my $desc = shift @tests;
1433 my $output = shift @tests;
1434 my $expected = shift @tests;
1435 my $p = parse_cachegrind($output);
1436 for (sort keys %VALID_FIELDS) {
1437 is($p->{$_}, $expected->{$_}, "$desc, $_");