3 # A tool for analysing the performance of the code snippets found in
4 # t/perf/benchmarks or similar
9 bench.pl - Compare the performance of perl code snippets across multiple
14 # Basic: run the tests in t/perf/benchmarks against two or
17 bench.pl [options] -- perlA[=labelA] perlB[=labelB] ...
19 # run the tests against the same perl twice, with varying options
21 bench.pl [options] -- perlA=bigint -Mbigint perlA=int
23 # Run bench on blead, saving results to file; then modify the blead
24 # binary, and benchmark again, comparing against the saved results
26 bench.pl [options] --write=blead.time -- ./perl=blead
27 # ... hack hack hack, updating ./perl ...
28 bench.pl --read=blead.time -- ./perl=hacked
30 # You can also combine --read with --write and new benchmark runs
32 bench.pl --read=blead.time --write=last.time -- ./perl=hacked
36 By default, F<bench.pl> will run code snippets found in
37 F<t/perf/benchmarks> (or similar) under cachegrind, in order to calculate
38 how many instruction reads, data writes, branches, cache misses, etc.
39 one execution of the snippet uses. Usually it will run them against two or
40 more perl executables and show how much each test has gotten better or
43 It is modelled on the F<perlbench> tool, but since it measures instruction
44 reads etc., rather than timings, it is much more precise and reproducible.
45 It is also considerably faster, and is capable of running tests in
46 parallel (with C<-j>). Rather than displaying a single relative
47 percentage per test/perl combination, it displays values for 13 different
48 measurements, such as instruction reads, conditional branch misses etc.
50 There are options to write the raw data to a file, and to read it back.
51 This means that you can view the same run data in different views with
52 different selection and sort options. You can also use this mechanism
53 to save the results of timing one perl, and then read it back while timing
54 a modification, so that you don't have to rerun the same tests on the same
55 perl over and over, or have two perl executables built at the same time.
57 The optional C<=label> after each perl executable is used in the display
58 output. If you are doing a two-step benchmark then you should provide
59 a label for at least the "base" perl.
63 =head2 General options
71 What action to perform. The default is I<grind>, which runs the benchmarks
72 using I<cachegrind> as the back end. The only other action at the moment is
73 I<selftest>, which runs some basic sanity checks and produces TAP output.
79 Enable verbose debugging output.
85 Display basic usage information.
91 Display progress information.
95 =head2 Test selection options
103 Specify a subset of tests to run (or in the case of C<--read>, to read).
104 It may be either a comma-separated list of test names, or a regular
105 expression. For example
107 --tests=expr::assign::scalar_lex,expr::assign::2list_lex
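or, as a regex (an illustrative pattern selecting a whole group of tests):

    --tests=/^expr::/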
123 Read in saved data from a previous C<--write> run from the specified file.
124 If C<--tests> is present too, then only tests matching those conditions
125 are read from the file.
127 C<--read> may be specified multiple times, in which case the results
128 across all files are aggregated. The list of test names from each file
129 (after filtering by C<--tests>) must be identical across all files.
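For example, this aggregates two earlier runs and displays the combined
results (the file names are illustrative):

    bench.pl --read=runA.time --read=runB.time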
131 This list of tests is used instead of that obtained from the normal
132 benchmark file (or C<--benchfile>) for any benchmarks that are run.
134 Requires C<JSON::PP> to be available.
138 =head2 Benchmarking options
140 Benchmarks will be run for all perls specified on the command line.
141 These options can be used to modify the benchmarking behavior:
149 The path of the file which contains the benchmarks (F<t/perf/benchmarks>
156 Optional command-line arguments to pass to all cachegrind invocations.
158 This option is appended to those which bench.pl uses for its own
159 purposes; so it can be used to override them (see --debug output
160 below), and can also be 'abused' to add redirects into the valgrind
163 For example, this writes PERL_MEM_LOG activity to foobar.$$, because
164 3>foobar.$$ redirects fd 3, then perl under PERL_MEM_LOG writes to fd 3.
166 $ perl Porting/bench.pl --jobs=2 --verbose --debug \
167 --tests=call::sub::amp_empty \
169 --grindargs='--cachegrind-out-file=junk.$$ 3>foobar.$$' \
171 perl5.24.0 perl5.24.0:+memlog:PERL_MEM_LOG=3mst
173 for the +memlog tests, this executes as (shown via --debug, then prettified):
175 Command: PERL_HASH_SEED=0 PERL_MEM_LOG=3mst
176 valgrind --tool=cachegrind --branch-sim=yes
177 --cachegrind-out-file=/dev/null --cachegrind-out-file=junk.$$
178 3>foobar.$$ perl5.24.0 - 10 2>&1
180 The result is that a set of junk.$$ files containing raw cachegrind
181 output are written, and foobar.$$ contains the expected memlog output.
185 There's no obvious utility for those junk.$$ and foobar.$$ files, but
186 you can have them anyway.
188 The 3 in PERL_MEM_LOG=3mst is needed because the output would
189 otherwise go to STDERR, and cause parse_cachegrind() to reject the
192 The --grindargs redirect is needed to capture the memlog output;
193 without it, the memlog output is written to fd 3, around
194 parse_cachegrind and effectively into /dev/null.
196 PERL_MEM_LOG is expensive when used.
199 &foo function call with no args or body
201 perl5.24.0 perl5.24.0+memlog
202 ---------- -----------------
225 Run I<N> jobs in parallel (default 1). This determines how many cachegrind
226 processes will be running at a time, and should generally be set to the number
233 Optional command-line arguments to pass to each perl-under-test
234 (perlA, perlB in the synopsis). For example, C<--perlargs=-Ilib>.
238 =head2 Output options
240 Any results accumulated via --read or running benchmarks can be output
241 in any or all of these three ways:
250 Save the raw data to the specified file. It can be read back later with
251 C<--read>. If combined with C<--read> then the output file will be
252 the merge of the file read and any additional perls added on the command
255 Requires C<JSON::PP> to be available.
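The file is plain JSON, so it can also be inspected directly. A minimal
sketch (the top-level C<tests> key follows the format this script writes;
the file name is illustrative) that lists the test names stored in a
saved file:

    $ perl -MJSON::PP -0777 -e \
        'my $h = decode_json(<>); print "$_\n" for sort keys %{$h->{tests}};' \
        blead.time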
259 --bisect=I<field,minval,maxval>
261 Run a single test against one perl and exit with a zero status if the
262 named field is in the specified range; exit 1 otherwise. It will complain
263 if more than one test or perl has been specified. It is intended to be
264 called as part of a bisect run, to determine when something changed.
267 bench.pl -j 8 --tests=foo --bisect=Ir,100,105 --perlargs=-Ilib \
270 might be called from bisect to find when the number of instruction reads
271 for test I<foo> falls outside the range 100..105.
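A sketch of driving this from C<git bisect> (the build command is an
assumption about your tree; C<git bisect run> treats exit status 0 as
"good" and 1 as "bad"):

    $ git bisect run sh -c 'make -j8 && Porting/bench.pl \
        --tests=foo --bisect=Ir,100,105 -- ./perl'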
277 Display the results to stdout in human-readable form. This is enabled by
278 default, except with --write and --bisect. The following sub-options alter
287 Only display the overall average, rather than the results for each
294 Display the results for a single perl executable in a compact form.
295 Which perl to display is specified in the same manner as C<--norm>.
301 Display only the specified fields; for example,
303 --fields=Ir,Ir_m,Ir_mm
305 If only one field is selected, the output is in a more compact form.
311 Specify which perl column in the output to treat as the 100% norm.
312 It may be a column number (0..N-1) or a perl executable name or label.
313 It defaults to the leftmost column.
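For example, either of these picks the first column's perl as the norm
(assuming it was given the label C<blead>):

    --norm=0
    --norm=blead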
319 Display raw data counts rather than percentages in the outputs. This
320 allows you to see the exact number of instruction reads, branch misses etc.
321 for each test/perl combination. It also causes the C<AVERAGE> display
322 per field to be calculated based on the average of each test's count
323 rather than average of each percentage. This means that tests with very
324 high counts will dominate.
330 Order the tests in the output based on the value of I<field> in the
331 column I<perl>. The I<perl> value is as per C<--norm>. For example
333 bench.pl --sort=Dw:perl-5.20.0 \
334 perl-5.16.0 perl-5.18.0 perl-5.20.0
347 use Getopt::Long qw(:config no_auto_abbrev require_order);
351 use POSIX ":sys_wait_h";
353 # The version of the file format used to save data. We refuse to process
354 # the file if the integer component differs.
356 my $FORMAT_VERSION = 1.0;
358 # The fields we know about
360 my %VALID_FIELDS = map { $_ => 1 }
361 qw(Ir Ir_m1 Ir_mm Dr Dr_m1 Dr_mm Dw Dw_m1 Dw_mm COND COND_m IND IND_m);
365 Usage: $0 [options] -- perl[=label] ...
369 --action=foo What action to perform [default: grind]:
370 grind run the code under cachegrind
371 selftest perform a selftest; produce TAP output
372 --debug Enable verbose debugging output.
373 --help Display this help.
374 --verbose Display progress information.
379 --tests=FOO Select only the specified tests for reading, benchmarking
380 and display. FOO may be either a list of tests or
381 a pattern: 'foo,bar,baz' or '/regex/';
382 [default: all tests].
386 -r|--read=file Read in previously saved data from the specified file.
387 May be repeated, and be used together with new
388 benchmarking to create combined results.
391 Benchmarks will be run for any perl specified on the command line.
392 These options can be used to modify the benchmarking behavior:
394 --benchfile=foo File containing the benchmarks.
395 [default: t/perf/benchmarks].
396 --grindargs=foo Optional command-line args to pass to cachegrind.
397 -j|--jobs=N Run N jobs in parallel [default 1].
398 --perlargs=foo Optional command-line args to pass to each perl to run.
401 Any results accumulated via --read or running benchmarks can be output
402 in any or all of these three ways:
404 -w|--write=file Save the raw data to the specified file (may be read
405 back later with --read).
407 --bisect=f,min,max Exit with a zero status if the named field f is in
408 the specified min..max range; exit 1 otherwise.
409 Produces no other output. Only legal if a single
410 benchmark test has been specified.
412 --show Display the results to stdout in human-readable form.
413 This is enabled by default, except with --write and
414 --bisect. The following sub-options alter how
417 --average Only display average, not individual test results.
418 --compact=perl Display the results of a single perl in compact form.
420 Which perl is specified as per --norm.
420 --fields=a,b,c Display only the specified fields (e.g. Ir,Ir_m,Ir_mm).
421 --norm=perl Which perl column to treat as 100%; may be a column
422 number (0..N-1) or a perl executable name or label;
424 --raw Display raw data counts rather than percentages.
425 --sort=field:perl Sort the tests based on the value of 'field' in the
426 column 'perl'. The perl value is as per --norm.
429 The command line ends with one or more specified perl executables,
430 which will be searched for in the current \$PATH. Each binary name may
431 have an optional =LABEL appended, which will be used rather than the
432 executable name in output. E.g.
434 perl-5.20.1=PRE-BUGFIX perl-5.20.1-new=POST-BUGFIX
460 # process command-line args and call top-level action
464 'action=s' => \$OPTS{action},
465 'average' => \$OPTS{average},
466 'benchfile=s' => \$OPTS{benchfile},
467 'bisect=s' => \$OPTS{bisect},
468 'compact=s' => \$OPTS{compact},
469 'debug' => \$OPTS{debug},
470 'grindargs=s' => \$OPTS{grindargs},
471 'help|h' => \$OPTS{help},
472 'fields=s' => \$OPTS{fields},
473 'jobs|j=i' => \$OPTS{jobs},
474 'norm=s' => \$OPTS{norm},
475 'perlargs=s' => \$OPTS{perlargs},
476 'raw' => \$OPTS{raw},
477 'read|r=s@' => \$OPTS{read},
478 'show' => \$OPTS{show},
479 'sort=s' => \$OPTS{sort},
480 'tests=s' => \$OPTS{tests},
481 'verbose' => \$OPTS{verbose},
482 'write|w=s' => \$OPTS{write},
483 ) or die "Use the -h option for usage information.\n";
485 usage if $OPTS{help};
488 if (defined $OPTS{read} or defined $OPTS{write}) {
489 # fail early if it's not present
493 if (defined $OPTS{fields}) {
494 my @f = split /,/, $OPTS{fields};
496 die "Error: --fields: unknown field '$_'\n"
497 unless $VALID_FIELDS{$_};
499 my %f = map { $_ => 1 } @f;
503 my %valid_actions = qw(grind 1 selftest 1);
504 unless ($valid_actions{$OPTS{action}}) {
505 die "Error: unrecognised action '$OPTS{action}'\n"
506 . "must be one of: " . join(', ', sort keys %valid_actions)."\n";
509 if (defined $OPTS{sort}) {
510 my @s = split /:/, $OPTS{sort};
512 die "Error: --sort argument should be of the form field:perl: "
515 my ($field, $perl) = @s;
516 die "Error: --sort: unknown field '$field'\n"
517 unless $VALID_FIELDS{$field};
518 # the 'perl' value will be validated later, after we have processed
520 $OPTS{'sort-field'} = $field;
521 $OPTS{'sort-perl'} = $perl;
524 # show is the default output action
525 $OPTS{show} = 1 unless $OPTS{write} || $OPTS{bisect};
527 if ($OPTS{action} eq 'grind') {
530 elsif ($OPTS{action} eq 'selftest') {
532 die "Error: no perl executables may be specified with selftest\n"
540 # Given a hash ref keyed by test names, filter it by deleting unwanted
541 # tests, based on $OPTS{tests}.
546 my $opt = $OPTS{tests};
547 return unless defined $opt;
552 $opt =~ s{^/(.+)/$}{$1}
553 or die "Error: --tests regex must be of the form /.../\n";
555 delete $tests->{$_} unless /$opt/;
560 for (split /,/, $opt) {
561 die "Error: no such test found: '$_'\n"
562 . ($OPTS{verbose} ? " have: @{[ sort keys %$tests ]}\n" : "")
563 unless exists $tests->{$_};
567 delete $tests->{$_} unless exists $t{$_};
570 die "Error: no tests to run\n" unless %$tests;
574 # Read in the test file, and filter out any tests excluded by $OPTS{tests}
575 # return a hash ref { testname => { test }, ... }
576 # and an array ref of the original test names order,
578 sub read_tests_file {
587 die "Error: can't load '$file': code didn't return a true value\n"
589 die "Error: can't parse '$file':\n$@\n" if $@;
590 die "Error: can't read '$file': $!\n";
594 for (my $i=0; $i < @$ta; $i += 2) {
595 push @orig_order, $ta->[$i];
600 return $t, \@orig_order;
604 # Process the perl name/label/column argument of options like --norm and
605 # --sort. Return the index of the matching perl.
608 my ($perl, $perls, $who) = @_;
610 if ($perl =~ /^[0-9]+$/) {
611 die "Error: $who value $perl outside range 0.." . $#$perls . "\n"
612 unless $perl < @$perls;
616 my @perl = grep $perls->[$_][0] eq $perl
617 || $perls->[$_][1] eq $perl,
619 die "Error: $who: unrecognised perl '$perl'\n"
621 die "Error: $who: ambiguous perl '$perl'\n"
628 # Validate the list of perl=label (+ cmdline options) on the command line.
629 # Return a list of [ exe, label, cmdline-options ] tuples, i.e.
630 # 'perl-under-test's (PUTs)
633 my $read_perls = shift;
634 my @res_puts; # returned, each item is [ perlexe, label, @putargs ]
635 my %seen = map { $_->[1] => 1 } @$read_perls;
636 my @putargs; # collect not-perls into args per PUT
638 for my $p (reverse @_) {
639 push @putargs, $p and next if $p =~ /^-/; # not-perl, don't send to qx//
641 my ($perl, $label, $env) = split /[=:,]/, $p, 3;
643 $label = $perl.$label if $label =~ /^\+/;
644 die "$label cannot be used on 2 different perls under test\n" if $seen{$label}++;
648 %env = split /[=,]/, $env;
650 my $r = qx($perl -e 'print qq(ok\n)' 2>&1);
652 push @res_puts, [ $perl, $label, \%env, reverse @putargs ];
654 warn "Added Perl-Under-Test: [ @{[@{$res_puts[-1]}]} ]\n"
657 warn "perl-under-test args: @putargs + a not-perl: $p $r\n"
659 push @putargs, $p; # not-perl
662 return reverse @res_puts;
667 # Return a string containing perl test code wrapped in a loop
668 # that runs $ARGV[0] times
671 my ($test, $desc, $setup, $code) = @_;
678 for my \$__loop__ (1..\$ARGV[0]) {
685 # Parse the output from cachegrind. Return a hash ref.
686 # See do_selftest() for examples of the output format.
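# For example (a line from the selftest data further down):
#     ==32350== I   refs:      1,124,055
# is stripped of its "==pid==" prefix and yields Ir => 1124055
# (the commas are removed at the end of parsing).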
688 sub parse_cachegrind {
689 my ($output, $id, $perl) = @_;
693 my @lines = split /\n/, $output;
695 unless (s/(==\d+==)|(--\d+--) //) {
696 die "Error: while executing $id:\n"
697 . "unexpected code or cachegrind output:\n$_\n";
699 if (/I refs:\s+([\d,]+)/) {
702 elsif (/I1 misses:\s+([\d,]+)/) {
705 elsif (/LLi misses:\s+([\d,]+)/) {
708 elsif (/D refs:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
709 @res{qw(Dr Dw)} = ($1,$2);
711 elsif (/D1 misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
712 @res{qw(Dr_m1 Dw_m1)} = ($1,$2);
714 elsif (/LLd misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
715 @res{qw(Dr_mm Dw_mm)} = ($1,$2);
717 elsif (/Branches:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
718 @res{qw(COND IND)} = ($1,$2);
720 elsif (/Mispredicts:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
721 @res{qw(COND_m IND_m)} = ($1,$2);
725 for my $field (keys %VALID_FIELDS) {
726 die "Error: can't parse '$field' field from cachegrind output:\n$output"
727 unless exists $res{$field};
728 $res{$field} =~ s/,//g;
735 # Handle the 'grind' action
738 my ($perl_args) = @_; # the residue of @ARGV after option processing
740 my ($loop_counts, $perls, $results, $tests, $order, @run_perls);
741 my ($bisect_field, $bisect_min, $bisect_max);
742 my ($done_read, $processed, $averages);
744 if (defined $OPTS{bisect}) {
745 ($bisect_field, $bisect_min, $bisect_max) = split /,/, $OPTS{bisect}, 3;
746 die "Error: --bisect option must be of form 'field,integer,integer'\n"
749 and $bisect_min =~ /^[0-9]+$/
750 and $bisect_max =~ /^[0-9]+$/;
752 die "Error: unrecognised field '$bisect_field' in --bisect option\n"
753 unless $VALID_FIELDS{$bisect_field};
755 die "Error: --bisect min ($bisect_min) must be <= max ($bisect_max)\n"
756 if $bisect_min > $bisect_max;
759 # Read in previous benchmark results
761 foreach my $file (@{$OPTS{read}}) {
762 open my $in, '<:encoding(UTF-8)', $file
763 or die "Error: can't open '$file' for reading: $!\n";
764 my $data = do { local $/; <$in> };
767 my $hash = JSON::PP::decode_json($data);
768 if (int($FORMAT_VERSION) < int($hash->{version})) {
769 die "Error: unsupported version $hash->{version} in file"
770 . " '$file' (too new)\n";
772 my ($read_loop_counts, $read_perls, $read_results, $read_tests, $read_order) =
773 @$hash{qw(loop_counts perls results tests order)};
775 # check file contents for consistency
776 my $k_o = join ';', sort @$read_order;
777 my $k_r = join ';', sort keys %$read_results;
778 my $k_t = join ';', sort keys %$read_tests;
779 die "File '$file' contains no results\n" unless length $k_r;
780 die "File '$file' contains differing test and results names\n"
782 die "File '$file' contains differing test and sort order names\n"
785 # delete tests not matching --tests= criteria, if any
786 filter_tests($read_results);
787 filter_tests($read_tests);
790 ($loop_counts, $perls, $results, $tests, $order) =
791 ($read_loop_counts, $read_perls, $read_results, $read_tests, $read_order);
795 # merge results across multiple files
797 if ( join(';', sort keys %$tests)
798 ne join(';', sort keys %$read_tests))
800 my $err = "Can't merge multiple read files: "
801 . "they contain differing test sets.\n";
802 if ($OPTS{verbose}) {
803 $err .= "Previous tests:\n";
804 $err .= " $_\n" for sort keys %$tests;
805 $err .= "tests from '$file':\n";
806 $err .= " $_\n" for sort keys %$read_tests;
809 $err .= "Re-run with --verbose to see the differences.\n";
814 if ("@$read_loop_counts" ne "@$loop_counts") {
815 die "Can't merge multiple read files: differing loop counts:\n"
816 . " (previous=(@$loop_counts), "
817 . "'$file'=(@$read_loop_counts))\n";
820 push @$perls, @{$hash->{perls}};
821 foreach my $test (keys %{$hash->{results}}) {
822 foreach my $perl (keys %{$hash->{results}{$test}}) {
823 $results->{$test}{$perl}= $hash->{results}{$test}{$perl};
828 die "Error: --benchfile cannot be used when --read is present\n"
829 if $done_read && defined $OPTS{benchfile};
831 # Gather list of perls to benchmark:
834 unless ($done_read) {
835 # How many times to execute the loop for the two trials. The lower
836 # value is intended to do the loop enough times that branch
837 # prediction has taken hold; the higher loop allows us to see the
838 # branch misses after that
839 $loop_counts = [10, 20];
842 read_tests_file($OPTS{benchfile} // 't/perf/benchmarks');
845 @run_perls = process_puts($perls, @$perl_args);
846 push @$perls, @run_perls;
849 # strip @$order to just the actual tests present
850 $order = [ grep exists $tests->{$_}, @$order ];
852 # Now we know what perls and tests we have, do extra option processing
853 # and checking (done before grinding, so time isn't wasted if we die).
855 if (!$perls or !@$perls) {
856 die "Error: nothing to do: no perls to run, no data to read.\n";
858 if (@$perls < 2 and $OPTS{show} and !$OPTS{raw}) {
859 die "Error: need at least 2 perls for comparison.\n"
863 die "Error: exactly one perl executable must be specified for bisect\n"
865 die "Error: only a single test may be specified with --bisect\n"
866 unless keys %$tests == 1;
869 $OPTS{norm} = select_a_perl($OPTS{norm}, $perls, "--norm");
871 if (defined $OPTS{'sort-perl'}) {
873 select_a_perl($OPTS{'sort-perl'}, $perls, "--sort");
876 if (defined $OPTS{'compact'}) {
878 select_a_perl($OPTS{'compact'}, $perls, "--compact");
882 # Run the benchmarks; accumulate with any previously read results.
885 $results = grind_run($tests, $order, \@run_perls, $loop_counts, $results);
889 # Handle the 3 forms of output
891 if (defined $OPTS{write}) {
892 my $json = JSON::PP::encode_json({
893 version => $FORMAT_VERSION,
894 loop_counts => $loop_counts,
901 open my $out, '>:encoding(UTF-8)', $OPTS{write}
902 or die "Error: can't open '$OPTS{write}' for writing: $!\n";
903 print $out $json or die "Error: writing to file '$OPTS{write}': $!\n";
904 close $out or die "Error: closing file '$OPTS{write}': $!\n";
907 if ($OPTS{show} or $OPTS{bisect}) {
908 # numerically process the raw data
909 ($processed, $averages) =
910 grind_process($results, $perls, $loop_counts);
914 if (defined $OPTS{compact}) {
915 grind_print_compact($processed, $averages, $OPTS{compact},
916 $perls, $tests, $order);
919 grind_print($processed, $averages, $perls, $tests, $order);
924 # these panics shouldn't happen if the bisect checks above are sound
925 my @r = values %$results;
926 die "Panic: expected exactly one test result in bisect\n"
928 @r = values %{$r[0]};
929 die "Panic: expected exactly one perl result in bisect\n"
931 my $c = $r[0]{$bisect_field};
932 die "Panic: no result in bisect for field '$bisect_field'\n"
935 exit 0 if $bisect_min <= $c and $c <= $bisect_max;
941 # Run cachegrind for every test/perl combo.
942 # It may run several processes in parallel when -j is specified.
943 # Return a hash ref suitable for input to grind_process()
946 my ($tests, $order, $perls, $counts, $results) = @_;
948 # Build a list of all the jobs to run
952 for my $test (grep $tests->{$_}, @$order) {
954 # Create two test progs: one with an empty loop and one with code.
955 # Note that the empty loop is actually '{1;}' rather than '{}';
956 # this causes the loop to have a single nextstate rather than a
957 # stub op, so more closely matches the active loop; e.g.:
958 # {1;} => nextstate; unstack
959 # {$x=1;} => nextstate; const; gvsv; sassign; unstack
961 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup)}, '1'),
962 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup code)}),
965 for my $p (@$perls) {
966 my ($perl, $label, $env, @putargs) = @$p;
968 # Run both the empty loop and the active loop
969 # $counts->[0] and $counts->[1] times.
975 $envstr .= "$_=$env->{$_} " for sort keys %$env;
977 my $cmd = "PERL_HASH_SEED=0 $envstr"
978 . "valgrind --tool=cachegrind --branch-sim=yes "
979 . "--cachegrind-out-file=/dev/null "
980 . "$OPTS{grindargs} "
981 . "$perl $OPTS{perlargs} @putargs - $counts->[$j] 2>&1";
982 # for debugging and error messages
983 my $id = "$test/$label "
984 . ($i ? "active" : "empty") . "/"
985 . ($j ? "long" : "short") . " loop";
1002 # Execute each cachegrind and store the results in %results.
1004 local $SIG{PIPE} = 'IGNORE';
1006 my $max_jobs = $OPTS{jobs};
1007 my $running = 0; # count of executing jobs
1008 my %pids; # map pids to jobs
1009 my %fds; # map fds to jobs
1010 my $select = IO::Select->new();
1012 while (@jobs or $running) {
1015 printf "Main loop: pending=%d running=%d\n",
1016 scalar(@jobs), $running;
1021 while (@jobs && $running < $max_jobs) {
1022 my $job = shift @jobs;
1023 my ($id, $cmd) = @$job{qw(id cmd)};
1025 my ($in, $out, $pid);
1026 warn "Starting $id\n" if $OPTS{verbose};
1027 eval { $pid = IPC::Open2::open2($out, $in, $cmd); 1; }
1028 or die "Error: while starting cachegrind subprocess"
1032 $fds{"$out"} = $job;
1033 $job->{out_fd} = $out;
1034 $job->{output} = '';
1041 print "Started pid $pid for $id\n";
1045 # In principle we should write to $in in the main select loop,
1046 # since it may block. In reality,
1047 # a) the code we write to the perl process's stdin is likely
1048 # to be less than the OS's pipe buffer size;
1049 # b) by the time the perl process has read in all its stdin,
1050 # the only output it should have generated is a few lines
1051 # of cachegrind output preamble.
1052 # If these assumptions change, then perform the following print
1053 # in the select loop instead.
1055 print $in $job->{prog};
1059 # Get output of running jobs
1062 printf "Select: waiting on (%s)\n",
1063 join ', ', sort { $a <=> $b } map $fds{$_}{pid},
1067 my @ready = $select->can_read;
1070 printf "Select: pids (%s) ready\n",
1071 join ', ', sort { $a <=> $b } map $fds{$_}{pid}, @ready;
1075 die "Panic: select returned no file handles\n";
1078 for my $fd (@ready) {
1079 my $j = $fds{"$fd"};
1080 my $r = sysread $fd, $j->{output}, 8192, length($j->{output});
1081 unless (defined $r) {
1082 die "Panic: Read from process running $j->{id} gave:\n$!";
1089 print "Got eof for pid $fds{$fd}{pid} ($j->{id})\n";
1092 $select->remove($j->{out_fd});
1094 or die "Panic: closing output fh on $j->{id} gave:\n$!\n";
1096 delete $fds{"$j->{out_fd}"};
1097 my $output = $j->{output};
1105 print "\n$j->{id}/\nCommand: $j->{cmd}\n"
1110 $results->{$j->{test}}{$j->{plabel}}[$j->{active}][$j->{loopix}]
1111 = parse_cachegrind($output, $j->{id}, $j->{perl});
1114 # Reap finished jobs
1117 my $kid = waitpid(-1, WNOHANG);
1121 unless (exists $pids{$kid}) {
1122 die "Panic: reaped unexpected child $kid";
1124 my $j = $pids{$kid};
1126 die sprintf("Error: $j->{id} gave return status 0x%04x\n", $ret)
1127 . "with the following output\n:$j->{output}\n";
1139 # grind_process(): process the data that has been extracted from
1140 # cachegrind's output.
1142 # $res is of the form ->{benchmark_name}{perl_name}[active][count]{field_name},
1143 # where active is 0 or 1 indicating an empty or active loop,
1144 # count is 0 or 1 indicating a short or long loop. E.g.
1146 # $res->{'expr::assign::scalar_lex'}{perl-5.21.1}[0][10]{Dw_mm}
1148 # The $res data structure is modified in-place by this sub.
1150 # $perls is [ [ perl-exe, perl-label], .... ].
1152 # $counts is [ N, M ] indicating the counts for the short and long loops.
1155 # return \%output, \%averages, where
1157 # $output{benchmark_name}{perl_name}{field_name} = N
1158 # $averages{perl_name}{field_name} = M
1160 # where N is the raw count ($OPTS{raw}), or count_perl0/count_perlI otherwise;
1161 # M is the average raw count over all tests ($OPTS{raw}), or
1162 # 1/(sum(count_perlI/count_perl0)/num_tests) otherwise.
1165 my ($res, $perls, $counts) = @_;
1167 # Process the four results for each test/perl combo:
1169 # $res->{benchmark_name}{perl_name}[active][count]{field_name} = n
1171 # $res->{benchmark_name}{perl_name}{field_name} = averaged_n
1173 # $r[0][1] - $r[0][0] is the time to do ($counts->[1]-$counts->[0])
1174 # empty loops, eliminating startup time
1175 # $r[1][1] - $r[1][0] is the time to do ($counts->[1]-$counts->[0])
1176 # active loops, eliminating startup time
1177 # (the two startup times may be different because different code
1178 # is being compiled); the difference of the two results above
1179 # divided by the count difference is the time to execute the
1180 # active code once, eliminating both startup and loop overhead.
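# An illustrative example (invented numbers, counts = [10, 20]):
# suppose the empty-loop runs cost 1,500 and 2,500 Ir (startup 500
# plus 100 per loop), and the active runs cost 1,800 and 3,400 Ir
# (startup 600 plus 160 per loop). Then
#   ((3400 - 1800) - (2500 - 1500)) / (20 - 10) = (1600 - 1000) / 10 = 60
# i.e. 60 instructions per execution of the active code, with both
# startup (500/600) and the per-loop overhead (100) cancelled out.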
1182 for my $tests (values %$res) {
1183 for my $r (values %$tests) {
1185 for (keys %{$r->[0][0]}) {
1186 my $n = ( ($r->[1][1]{$_} - $r->[1][0]{$_})
1187 - ($r->[0][1]{$_} - $r->[0][0]{$_})
1188 ) / ($counts->[1] - $counts->[0]);
1199 my $perl_norm = $perls->[$OPTS{norm}][1]; # the label of the reference perl
1201 for my $test_name (keys %$res) {
1202 my $res1 = $res->{$test_name};
1203 my $res2_norm = $res1->{$perl_norm};
1204 for my $perl (keys %$res1) {
1205 my $res2 = $res1->{$perl};
1206 for my $field (keys %$res2) {
1207 my ($p, $q) = ($res2_norm->{$field}, $res2->{$field});
1210 # Avoid annoying '-0.0' displays. Ideally this number
1211 # should never be negative, but fluctuations in
1212 # startup etc can theoretically make this happen
1213 $q = 0 if ($q <= 0 && $q > -0.1);
1214 $totals{$perl}{$field} += $q;
1215 $counts{$perl}{$field}++;
1216 $data{$test_name}{$perl}{$field} = $q;
1220 # $p and $q are notionally integer counts, but
1221 # due to variations in startup etc, it's possible for a
1222 # count which is supposedly zero to be calculated as a
1223 # small positive or negative value.
1224 # In this case, set it to zero. Further below we
1225 # special-case zeros to avoid division by zero errors etc.
1227 $p = 0.0 if $p < 0.01;
1228 $q = 0.0 if $q < 0.01;
1230 if ($p == 0.0 && $q == 0.0) {
1231 # Both perls gave a count of zero, so no change:
1233 $totals{$perl}{$field} += 1;
1234 $counts{$perl}{$field}++;
1235 $data{$test_name}{$perl}{$field} = 1;
1237 elsif ($p == 0.0 || $q == 0.0) {
1238 # If either count is zero, there were too few events
1239 # to give a meaningful ratio (and we will end up with
1240 # division by zero if we try). Mark the result undef,
1241 # indicating that it shouldn't be displayed; and skip
1242 # adding to the average
1243 $data{$test_name}{$perl}{$field} = undef;
1246 # For averages, we record q/p rather than p/q.
1247 # Consider a test where perl_norm took 1000 cycles
1248 # and perlN took 800 cycles. For the individual
1249 # results we display p/q, or 1.25; i.e. a quarter
1250 # quicker. For the averages, we instead sum all
1251 # the 0.8's, which gives the total cycles required to
1252 # execute all tests, with all tests given equal
1253 # weight. Later we reciprocate the final result,
1254 # i.e. 1/(sum(qi/pi)/n)
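# An illustrative example (invented numbers): two tests where perlN
# takes 0.8 and 0.5 of the norm's cycles give sum(qi/pi) = 1.3, so
# the average later displayed is 1/(1.3/2) = 1.538..., i.e. 153.8%.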
1256 $totals{$perl}{$field} += $q/$p;
1257 $counts{$perl}{$field}++;
1258 $data{$test_name}{$perl}{$field} = $p/$q;
1264 # Calculate averages based on %totals and %counts accumulated earlier.
1267 for my $perl (keys %totals) {
1268 my $t = $totals{$perl};
1269 for my $field (keys %$t) {
1270 $averages{$perl}{$field} = $OPTS{raw}
1271 ? $t->{$field} / $counts{$perl}{$field}
1272 # reciprocal - see comments above
1273 : $counts{$perl}{$field} / $t->{$field};
1277 return \%data, \%averages;
1282 # print a standard blurb at the start of the grind display
1292 COND conditional branches
1293 IND indirect branches
1294 _m branch predict miss
1295 _m1 level 1 cache miss
1296 _mm last cache (e.g. L3) miss
1297 - indeterminate percentage (e.g. 1/0)
1302 print "The numbers represent raw counts per loop iteration.\n";
1306 The numbers represent relative counts per loop iteration, compared to
1307 $perls->[$OPTS{norm}][1] at 100.0%.
1308 Higher is better: for example, using half as many instructions gives 200%,
1309 while using twice as many gives 50%.
1315 # return a sorted list of the test names, plus 'AVERAGE'
1317 sub sorted_test_names {
1318 my ($results, $order, $perls) = @_;
1321 unless ($OPTS{average}) {
1322 if (defined $OPTS{'sort-field'}) {
1323 my ($field, $perlix) = @OPTS{'sort-field', 'sort-perl'};
1324 my $perl = $perls->[$perlix][1];
1327 $results->{$a}{$perl}{$field}
1328 <=> $results->{$b}{$perl}{$field}
1333 @names = grep $results->{$_}, @$order;
1337 # No point in displaying average for only one test.
1338 push @names, 'AVERAGE' unless @names == 1;
1343 # grind_print(): display the tabulated results of all the cachegrinds.
1345 # Arguments are of the form:
1346 # $results->{benchmark_name}{perl_name}{field_name} = N
1347 # $averages->{perl_name}{field_name} = M
1348 # $perls = [ [ perl-exe, perl-label ], ... ]
1349 # $tests->{test_name}{desc => ..., ...}
1352 my ($results, $averages, $perls, $tests, $order) = @_;
1354 my @perl_names = map $_->[0], @$perls;
1355 my @perl_labels = map $_->[1], @$perls;
1357 $perl_labels{$_->[0]} = $_->[1] for @$perls;
1359 my $field_label_width = 6;
1360 # Calculate the width to display for each column.
1361 my $min_width = $OPTS{raw} ? 8 : 6;
1362 my @widths = map { length($_) < $min_width ? $min_width : length($_) }
1365 # Print standard header.
1366 grind_blurb($perls);
1368 my @test_names = sorted_test_names($results, $order, $perls);
1370 # If only a single field is to be displayed, use a more compact
1371 # format with only a single line of output per test.
1373 my $one_field = defined $OPTS{fields} && keys(%{$OPTS{fields}}) == 1;
1376 print "Results for field " . (keys(%{$OPTS{fields}}))[0] . ".\n";
1378 # The first column will now contain test names rather than
1379 # field names; calculate the max width.
1381 $field_label_width = 0;
1383 $field_label_width = length if length > $field_label_width;
1386 # Print the perl executables header.
1390 print " " x $field_label_width;
1392 printf " %*s", $widths[$_],
1393 $i ? ('-' x $widths[$_]) : $perl_labels[$_];
1399 # Dump the results for each test.
1401 for my $test_name (@test_names) {
1402 my $doing_ave = ($test_name eq 'AVERAGE');
1403 my $res1 = $doing_ave ? $averages : $results->{$test_name};
1405 unless ($one_field) {
1406 print "\n$test_name";
1407 print "\n$tests->{$test_name}{desc}" unless $doing_ave;
1410 # Print the perl executables header.
1412 print " " x $field_label_width;
1414 printf " %*s", $widths[$_],
1415 $i ? ('-' x $widths[$_]) : $perl_labels[$_];
1421 for my $field (qw(Ir Dr Dw COND IND
1430 next if $OPTS{fields} and ! exists $OPTS{fields}{$field};
1432 if ($field eq 'N') {
1438 printf "%-*s", $field_label_width, $test_name;
1441 printf "%*s", $field_label_width, $field;
1444 for my $i (0..$#widths) {
1445 my $res2 = $res1->{$perl_labels[$i]};
1446 my $p = $res2->{$field};
1448 printf " %*s", $widths[$i], '-';
1450 elsif ($OPTS{raw}) {
1451 printf " %*.1f", $widths[$i], $p;
1454 printf " %*.2f", $widths[$i], $p * 100;
1464 # grind_print_compact(): like grind_print(), but display a single perl
1465 # in a compact form. Has an additional arg, $which_perl, which specifies
1466 # which perl to display.
1468 # Arguments are of the form:
1469 # $results->{benchmark_name}{perl_name}{field_name} = N
1470 # $averages->{perl_name}{field_name} = M
1471 # $perls = [ [ perl-exe, perl-label ], ... ]
1472 # $tests->{test_name}{desc => ..., ...}
1474 sub grind_print_compact {
1475 my ($results, $averages, $which_perl, $perls, $tests, $order) = @_;
1478 # the width to display for each column.
1479 my $width = $OPTS{raw} ? 7 : 6;
1481 # Print standard header.
1482 grind_blurb($perls);
1484 print "\nResults for $perls->[$which_perl][1]\n\n";
1486 my @test_names = sorted_test_names($results, $order, $perls);
1488 # Dump the results for each test.
1490 my @fields = qw( Ir Dr Dw
1496 if ($OPTS{fields}) {
1497 @fields = grep exists $OPTS{fields}{$_}, @fields;
1500 printf " %*s", $width, $_ for @fields;
1502 printf " %*s", $width, '------' for @fields;
1505 for my $test_name (@test_names) {
1506 my $doing_ave = ($test_name eq 'AVERAGE');
1507 my $res = $doing_ave ? $averages : $results->{$test_name};
1508 $res = $res->{$perls->[$which_perl][1]};
1510 for my $field (@fields) {
1511 my $p = $res->{$field};
1513 printf " %*s", $width, '-';
1515 elsif ($OPTS{raw}) {
1516 printf " %*.1f", $width, $p;
1519 printf " %*.2f", $width, $p * 100;
1524 print " $test_name\n";
1529 # do_selftest(): check that we can parse known cachegrind()
1530 # output formats. If the output of cachegrind changes, add a *new*
1531 # test here; keep the old tests to make sure we continue to parse
1539 ==32350== Cachegrind, a cache and branch-prediction profiler
1540 ==32350== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
1541 ==32350== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
1542 ==32350== Command: perl5211o /tmp/uiS2gjdqe5 1
1544 --32350-- warning: L3 cache found, using its data for the LL simulation.
1546 ==32350== I refs: 1,124,055
1547 ==32350== I1 misses: 5,573
1548 ==32350== LLi misses: 3,338
1549 ==32350== I1 miss rate: 0.49%
1550 ==32350== LLi miss rate: 0.29%
1552 ==32350== D refs: 404,275 (259,191 rd + 145,084 wr)
1553 ==32350== D1 misses: 9,608 ( 6,098 rd + 3,510 wr)
1554 ==32350== LLd misses: 5,794 ( 2,781 rd + 3,013 wr)
1555 ==32350== D1 miss rate: 2.3% ( 2.3% + 2.4% )
1556 ==32350== LLd miss rate: 1.4% ( 1.0% + 2.0% )
1558 ==32350== LL refs: 15,181 ( 11,671 rd + 3,510 wr)
1559 ==32350== LL misses: 9,132 ( 6,119 rd + 3,013 wr)
1560 ==32350== LL miss rate: 0.5% ( 0.4% + 2.0% )
1562 ==32350== Branches: 202,372 (197,050 cond + 5,322 ind)
1563 ==32350== Mispredicts: 19,153 ( 17,742 cond + 1,411 ind)
1564 ==32350== Mispred rate: 9.4% ( 9.0% + 26.5% )
1584 my $t = "$_/test.pl";
1588 plan(@tests / 3 * keys %VALID_FIELDS);
1591 my $desc = shift @tests;
1592 my $output = shift @tests;
1593 my $expected = shift @tests;
1594 my $p = parse_cachegrind($output);
1595 for (sort keys %VALID_FIELDS) {
1596 is($p->{$_}, $expected->{$_}, "$desc, $_");