This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
store data using unique label, not perl-exe, which may be used 2x
[perl5.git] / Porting / bench.pl
CommitLineData
9e7973fa
DM
1#!/usr/bin/perl
2#
3# A tool for analysing the performance of the code snippets found in
4# t/perf/benchmarks or similar
5
6
7=head1 NAME
8
9bench.pl - Compare the performance of perl code snippets across multiple
10perls.
11
12=head1 SYNOPSIS
13
4a1358c2
FC
14 # Basic: run the tests in t/perf/benchmarks against two or
15 # more perls
9e7973fa 16
300b1525 17 bench.pl [options] perl1[=label1] perl2[=label2] ...
9e7973fa
DM
18
19 # Run bench.pl's own built-in sanity tests
20
21 bench.pl --action=selftest
22
23=head1 DESCRIPTION
24
25By default, F<bench.pl> will run code snippets found in
26F<t/perf/benchmarks> (or similar) under cachegrind, in order to calculate
27how many instruction reads, data writes, branches, cache misses, etc. that
28one execution of the snippet uses. It will run them against two or more
29perl executables and show how much each test has gotten better or worse.
30
31It is modelled on the F<perlbench> tool, but since it measures instruction
32reads etc., rather than timings, it is much more precise and reproducible.
e34630bf 33It is also considerably faster, and is capable of running tests in
9e7973fa
DM
34parallel (with C<-j>). Rather than displaying a single relative
35percentage per test/perl combination, it displays values for 13 different
36measurements, such as instruction reads, conditional branch misses etc.
37
38There are options to write the raw data to a file, and to read it back.
39This means that you can view the same run data in different views with
40different selection and sort options.
41
42The optional C<=label> after each perl executable is used in the display
43output.
44
45=head1 OPTIONS
46
47=over 4
48
49=item *
50
51--action=I<foo>
52
53What action to perform. The default is I<grind>, which runs the benchmarks
54using I<cachegrind> as the back end. The only other action at the moment is
55I<selftest>, which runs some basic sanity checks and produces TAP output.
56
57=item *
58
59--average
60
61Only display the overall average, rather than the results for each
62individual test.
63
64=item *
65
66--benchfile=I<foo>
67
68The path of the file which contains the benchmarks (F<t/perf/benchmarks>
69by default).
70
71=item *
72
73--bisect=I<field,minval,maxval>
74
75Run a single test against one perl and exit with a zero status if the
76named field is in the specified range; exit 1 otherwise. It will complain
77if more than one test or perl has been specified. It is intended to be
78called as part of a bisect run, to determine when something changed.
79For example,
80
4a1358c2
FC
81 bench.pl -j 8 --tests=foo --bisect=Ir,100,105 --perlargs=-Ilib \
82 ./miniperl
9e7973fa
DM
83
84might be called from bisect to find when the number of instruction reads
85for test I<foo> falls outside the range 100..105.
86
87=item *
88
df3d7b3a
DM
89--compact=<Iperl>
90
91Display the results for a single perl executable in a compact form.
92Which perl to display is specified in the same manner as C<--norm>.
93
94=item *
95
9e7973fa
DM
96--debug
97
98Enable verbose debugging output.
99
100=item *
101
102--fields=I<a,b,c>
103
104Display only the specified fields; for example,
105
106 --fields=Ir,Ir_m,Ir_mm
107
108If only one field is selected, the output is in more compact form.
109
110=item *
111
112--grindargs=I<foo>
113
114Optional command-line arguments to pass to cachegrind invocations.
115
116=item *
117
118---help
119
120Display basic usage information.
121
122=item *
123
124-j I<N>
125--jobs=I<N>
126
127Run I<N> jobs in parallel (default 1). This determines how many cachegrind
128process will running at a time, and should generally be set to the number
129of CPUs available.
130
131=item *
132
133--norm=I<foo>
134
135Specify which perl column in the output to treat as the 100% norm.
136It may be a column number (0..N-1) or a perl executable name or label.
137It defaults to the leftmost column.
138
139=item *
140
141--perlargs=I<foo>
142
143Optional command-line arguments to pass to each perl that is run as part of
144a cachegrind session. For example, C<--perlargs=-Ilib>.
145
146=item *
147
148--raw
149
150Display raw data counts rather than percentages in the outputs. This
151allows you to see the exact number of intruction reads, branch misses etc.
152for each test/perl combination. It also causes the C<AVERAGE> display
153per field to be calculated based on the average of each tests's count
154rather than average of each percentage. This means that tests with very
155high counts will dominate.
156
157=item *
158
65f0b1c9 159--sort=I<field:perl>
9e7973fa
DM
160
161Order the tests in the output based on the value of I<field> in the
162column I<perl>. The I<perl> value is as per C<--norm>. For example
163
4a1358c2
FC
164 bench.pl --sort=Dw:perl-5.20.0 \
165 perl-5.16.0 perl-5.18.0 perl-5.20.0
9e7973fa
DM
166
167=item *
168
169-r I<file>
170--read=I<file>
171
172Read in saved data from a previous C<--write> run from the specified file.
173
174Requires C<JSON::PP> to be available.
175
176=item *
177
178--tests=I<FOO>
179
180Specify a subset of tests to run (or in the case of C<--read>, to display).
181It may be either a comma-separated list of test names, or a regular
182expression. For example
183
184 --tests=expr::assign::scalar_lex,expr::assign::2list_lex
185 --tests=/^expr::/
186
187=item *
188
189--verbose
190
191Display progress information.
192
193=item *
194
195-w I<file>
196--write=I<file>
197
198Save the raw data to the specified file. It can be read back later with
199C<--read>.
200
201Requires C<JSON::PP> to be available.
202
203=back
204
205=cut
206
207
208
209use 5.010000;
210use warnings;
211use strict;
d54523c4 212use Getopt::Long qw(:config no_auto_abbrev require_order);
9e7973fa
DM
213use IPC::Open2 ();
214use IO::Select;
c2d21e7a 215use IO::File;
9e7973fa
DM
216use POSIX ":sys_wait_h";
217
218# The version of the file format used to save data. We refuse to process
219# the file if the integer component differs.
220
221my $FORMAT_VERSION = 1.0;
222
223# The fields we know about
224
225my %VALID_FIELDS = map { $_ => 1 }
226 qw(Ir Ir_m1 Ir_mm Dr Dr_m1 Dr_mm Dw Dw_m1 Dw_mm COND COND_m IND IND_m);
227
228sub usage {
229 die <<EOF;
d54523c4 230usage: $0 [options] -- perl[=label] ...
9e7973fa
DM
231 --action=foo What action to perform [default: grind].
232 --average Only display average, not individual test results.
233 --benchfile=foo File containing the benchmarks;
234 [default: t/perf/benchmarks].
235 --bisect=f,min,max run a single test against one perl and exit with a
236 zero status if the named field is in the specified
237 range; exit 1 otherwise.
df3d7b3a
DM
238 --compact=perl Display the results of a single perl in compact form.
239 Which perl specified like --norm
9e7973fa
DM
240 --debug Enable verbose debugging output.
241 --fields=a,b,c Display only the specified fields (e.g. Ir,Ir_m,Ir_mm).
242 --grindargs=foo Optional command-line args to pass to cachegrind.
243 --help Display this help.
244 -j|--jobs=N Run N jobs in parallel [default 1].
245 --norm=perl Which perl column to treat as 100%; may be a column
246 number (0..N-1) or a perl executable name or label;
247 [default: 0].
248 --perlargs=foo Optional command-line args to pass to each perl to run.
249 --raw Display raw data counts rather than percentages.
65f0b1c9 250 --sort=field:perl Sort the tests based on the value of 'field' in the
9e7973fa
DM
251 column 'perl'. The perl value is as per --norm.
252 -r|--read=file Read in previously saved data from the specified file.
253 --tests=FOO Select only the specified tests from the benchmarks file;
254 FOO may be either of the form 'foo,bar' or '/regex/';
255 [default: all tests].
256 --verbose Display progress information.
257 -w|--write=file Save the raw data to the specified file.
258
259--action is one of:
260 grind run the code under cachegrind
261 selftest perform a selftest; produce TAP output
262
263The command line ends with one or more specified perl executables,
264which will be searched for in the current \$PATH. Each binary name may
265have an optional =LABEL appended, which will be used rather than the
266executable name in output. E.g.
267
268 perl-5.20.1=PRE-BUGFIX perl-5.20.1-new=POST-BUGFIX
269EOF
270}
271
272my %OPTS = (
273 action => 'grind',
274 average => 0,
275 benchfile => 't/perf/benchmarks',
276 bisect => undef,
df3d7b3a 277 compact => undef,
9e7973fa
DM
278 debug => 0,
279 grindargs => '',
280 fields => undef,
281 jobs => 1,
282 norm => 0,
283 perlargs => '',
284 raw => 0,
285 read => undef,
286 sort => undef,
287 tests => undef,
288 verbose => 0,
289 write => undef,
290);
291
292
293# process command-line args and call top-level action
294
295{
296 GetOptions(
297 'action=s' => \$OPTS{action},
298 'average' => \$OPTS{average},
299 'benchfile=s' => \$OPTS{benchfile},
300 'bisect=s' => \$OPTS{bisect},
df3d7b3a 301 'compact=s' => \$OPTS{compact},
9e7973fa
DM
302 'debug' => \$OPTS{debug},
303 'grindargs=s' => \$OPTS{grindargs},
304 'help' => \$OPTS{help},
305 'fields=s' => \$OPTS{fields},
306 'jobs|j=i' => \$OPTS{jobs},
307 'norm=s' => \$OPTS{norm},
308 'perlargs=s' => \$OPTS{perlargs},
309 'raw' => \$OPTS{raw},
310 'read|r=s' => \$OPTS{read},
311 'sort=s' => \$OPTS{sort},
312 'tests=s' => \$OPTS{tests},
313 'verbose' => \$OPTS{verbose},
314 'write|w=s' => \$OPTS{write},
315 ) or usage;
316
317 usage if $OPTS{help};
318
319
320 if (defined $OPTS{read} and defined $OPTS{write}) {
321 die "Error: can't specify both --read and --write options\n";
322 }
323
324 if (defined $OPTS{read} or defined $OPTS{write}) {
325 # fail early if it's not present
326 require JSON::PP;
327 }
328
329 if (defined $OPTS{fields}) {
330 my @f = split /,/, $OPTS{fields};
331 for (@f) {
332 die "Error: --fields: unknown field '$_'\n"
333 unless $VALID_FIELDS{$_};
334 }
335 my %f = map { $_ => 1 } @f;
336 $OPTS{fields} = \%f;
337 }
338
339 my %valid_actions = qw(grind 1 selftest 1);
340 unless ($valid_actions{$OPTS{action}}) {
341 die "Error: unrecognised action '$OPTS{action}'\n"
342 . "must be one of: " . join(', ', sort keys %valid_actions)."\n";
343 }
344
345 if (defined $OPTS{sort}) {
346 my @s = split /:/, $OPTS{sort};
347 if (@s != 2) {
348 die "Error: --sort argument should be of the form field:perl: "
349 . "'$OPTS{sort}'\n";
350 }
351 my ($field, $perl) = @s;
352 die "Error: --sort: unknown field '$field\n"
353 unless $VALID_FIELDS{$field};
354 # the 'perl' value will be validated later, after we have processed
355 # the perls
356 $OPTS{'sort-field'} = $field;
357 $OPTS{'sort-perl'} = $perl;
358 }
359
360 if ($OPTS{action} eq 'selftest') {
361 if (@ARGV) {
362 die "Error: no perl executables may be specified with --read\n"
363 }
364 }
365 elsif (defined $OPTS{bisect}) {
366 die "Error: exactly one perl executable must be specified for bisect\n"
367 unless @ARGV == 1;
368 die "Error: Can't specify both --bisect and --read\n"
369 if defined $OPTS{read};
370 die "Error: Can't specify both --bisect and --write\n"
371 if defined $OPTS{write};
372 }
373 elsif (defined $OPTS{read}) {
374 if (@ARGV) {
375 die "Error: no perl executables may be specified with --read\n"
376 }
377 }
378 elsif ($OPTS{raw}) {
379 unless (@ARGV) {
380 die "Error: at least one perl executable must be specified\n";
381 }
382 }
383 else {
384 unless (@ARGV >= 2) {
385 die "Error: at least two perl executables must be specified\n";
386 }
387 }
388
389 if ($OPTS{action} eq 'grind') {
390 do_grind(\@ARGV);
391 }
392 elsif ($OPTS{action} eq 'selftest') {
393 do_selftest();
394 }
395}
396exit 0;
397
398
399# Given a hash ref keyed by test names, filter it by deleting unwanted
400# tests, based on $OPTS{tests}.
401
402sub filter_tests {
403 my ($tests) = @_;
404
405 my $opt = $OPTS{tests};
406 return unless defined $opt;
407
408 my @tests;
409
410 if ($opt =~ m{^/}) {
411 $opt =~ s{^/(.+)/$}{$1}
412 or die "Error: --tests regex must be of the form /.../\n";
413 for (keys %$tests) {
414 delete $tests->{$_} unless /$opt/;
415 }
416 }
417 else {
418 my %t;
419 for (split /,/, $opt) {
24293753
JC
420 die "Error: no such test found: '$_'\n"
421 . ($OPTS{verbose} ? " have: @{[ sort keys %$tests ]}\n" : "")
422 unless exists $tests->{$_};
9e7973fa
DM
423 $t{$_} = 1;
424 }
425 for (keys %$tests) {
426 delete $tests->{$_} unless exists $t{$_};
427 }
428 }
429}
430
431
432# Read in the test file, and filter out any tests excluded by $OPTS{tests}
957d8930
DM
433# return a hash ref { testname => { test }, ... }
434# and an array ref of the original test names order,
9e7973fa
DM
435
436sub read_tests_file {
437 my ($file) = @_;
438
439 my $ta = do $file;
440 unless ($ta) {
441 die "Error: can't parse '$file': $@\n" if $@;
442 die "Error: can't read '$file': $!\n";
443 }
444
957d8930
DM
445 my @orig_order;
446 for (my $i=0; $i < @$ta; $i += 2) {
447 push @orig_order, $ta->[$i];
448 }
449
9e7973fa
DM
450 my $t = { @$ta };
451 filter_tests($t);
957d8930 452 return $t, \@orig_order;
9e7973fa
DM
453}
454
455
456# Process the perl/column argument of options like --norm and --sort.
457# Return the index of the matching perl.
458
459sub select_a_perl {
460 my ($perl, $perls, $who) = @_;
461
462 if ($perl =~ /^[0-9]$/) {
463 die "Error: $who value $perl outside range 0.." . $#$perls . "\n"
464 unless $perl < @$perls;
465 return $perl;
466 }
467 else {
468 my @perl = grep $perls->[$_][0] eq $perl
469 || $perls->[$_][1] eq $perl,
470 0..$#$perls;
471 die "Error: $who: unrecognised perl '$perl'\n"
472 unless @perl;
473 die "Error: $who: ambiguous perl '$perl'\n"
474 if @perl > 1;
475 return $perl[0];
476 }
477}
478
479
480# Validate the list of perl=label on the command line.
9e7973fa
DM
481# Return a list of [ exe, label ] pairs.
482
d54523c4 483sub process_puts {
9e7973fa 484 my @results;
955a736c 485 my %seen;
d54523c4
JC
486 my @putargs; # collect not-perls into args per PUT
487
488 for my $p (reverse @_) {
489 push @putargs, $p and next if $p =~ /^-/; # not-perl, dont send to qx//
490
9e7973fa
DM
491 my ($perl, $label) = split /=/, $p, 2;
492 $label //= $perl;
955a736c
JC
493 die "$label cannot be used on 2 different PUTs\n" if $seen{$label}++;
494
9e7973fa 495 my $r = qx($perl -e 'print qq(ok\n)' 2>&1);
d54523c4
JC
496 if ($r eq "ok\n") {
497 push @results, [ $perl, $label, reverse @putargs ];
498 @putargs = ();
499 } else {
500 push @putargs, $p; # not-perl
501 }
9e7973fa 502 }
d54523c4 503 return reverse @results;
9e7973fa
DM
504}
505
506
8fbd1c2c 507
9e7973fa
DM
508# Return a string containing perl test code wrapped in a loop
509# that runs $ARGV[0] times
510
511sub make_perl_prog {
512 my ($test, $desc, $setup, $code) = @_;
513
514 return <<EOF;
515# $desc
516package $test;
517BEGIN { srand(0) }
518$setup;
519for my \$__loop__ (1..\$ARGV[0]) {
520 $code;
521}
522EOF
523}
524
525
526# Parse the output from cachegrind. Return a hash ref.
527# See do_selftest() for examples of the output format.
528
529sub parse_cachegrind {
530 my ($output, $id, $perl) = @_;
531
532 my %res;
533
534 my @lines = split /\n/, $output;
535 for (@lines) {
536 unless (s/(==\d+==)|(--\d+--) //) {
537 die "Error: while executing $id:\n"
538 . "unexpected code or cachegrind output:\n$_\n";
539 }
540 if (/I refs:\s+([\d,]+)/) {
541 $res{Ir} = $1;
542 }
543 elsif (/I1 misses:\s+([\d,]+)/) {
544 $res{Ir_m1} = $1;
545 }
546 elsif (/LLi misses:\s+([\d,]+)/) {
547 $res{Ir_mm} = $1;
548 }
549 elsif (/D refs:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
550 @res{qw(Dr Dw)} = ($1,$2);
551 }
552 elsif (/D1 misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
553 @res{qw(Dr_m1 Dw_m1)} = ($1,$2);
554 }
555 elsif (/LLd misses:\s+.*?([\d,]+) rd .*?([\d,]+) wr/) {
556 @res{qw(Dr_mm Dw_mm)} = ($1,$2);
557 }
558 elsif (/Branches:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
559 @res{qw(COND IND)} = ($1,$2);
560 }
561 elsif (/Mispredicts:\s+.*?([\d,]+) cond .*?([\d,]+) ind/) {
562 @res{qw(COND_m IND_m)} = ($1,$2);
563 }
564 }
565
566 for my $field (keys %VALID_FIELDS) {
567 die "Error: can't parse '$field' field from cachegrind output:\n$output"
568 unless exists $res{$field};
569 $res{$field} =~ s/,//g;
570 }
571
572 return \%res;
573}
574
575
576# Handle the 'grind' action
577
578sub do_grind {
579 my ($perl_args) = @_; # the residue of @ARGV after option processing
580
957d8930 581 my ($loop_counts, $perls, $results, $tests, $order);
9e7973fa
DM
582 my ($bisect_field, $bisect_min, $bisect_max);
583
584 if (defined $OPTS{bisect}) {
585 ($bisect_field, $bisect_min, $bisect_max) = split /,/, $OPTS{bisect}, 3;
586 die "Error: --bisect option must be of form 'field,integer,integer'\n"
587 unless
588 defined $bisect_max
589 and $bisect_min =~ /^[0-9]+$/
590 and $bisect_max =~ /^[0-9]+$/;
591
592 die "Error: unrecognised field '$bisect_field' in --bisect option\n"
593 unless $VALID_FIELDS{$bisect_field};
594
595 die "Error: --bisect min ($bisect_min) must be <= max ($bisect_max)\n"
596 if $bisect_min > $bisect_max;
597 }
598
599 if (defined $OPTS{read}) {
600 open my $in, '<:encoding(UTF-8)', $OPTS{read}
601 or die " Error: can't open $OPTS{read} for reading: $!\n";
602 my $data = do { local $/; <$in> };
603 close $in;
604
605 my $hash = JSON::PP::decode_json($data);
606 if (int($FORMAT_VERSION) < int($hash->{version})) {
607 die "Error: unsupported version $hash->{version} in file"
608 . "'$OPTS{read}' (too new)\n";
609 }
957d8930
DM
610 ($loop_counts, $perls, $results, $tests, $order) =
611 @$hash{qw(loop_counts perls results tests order)};
9e7973fa
DM
612
613 filter_tests($results);
614 filter_tests($tests);
957d8930
DM
615
616 if (!$order) {
617 $order = [ sort keys %$tests ];
618 }
9e7973fa
DM
619 }
620 else {
621 # How many times to execute the loop for the two trials. The lower
622 # value is intended to do the loop enough times that branch
623 # prediction has taken hold; the higher loop allows us to see the
624 # branch misses after that
625 $loop_counts = [10, 20];
626
957d8930 627 ($tests, $order) = read_tests_file($OPTS{benchfile});
9e7973fa
DM
628 die "Error: only a single test may be specified with --bisect\n"
629 if defined $OPTS{bisect} and keys %$tests != 1;
630
d54523c4 631 $perls = [ process_puts(@$perl_args) ];
8fbd1c2c
DM
632
633
957d8930 634 $results = grind_run($tests, $order, $perls, $loop_counts);
9e7973fa
DM
635 }
636
8fbd1c2c
DM
637 # now that we have a list of perls, use it to process the
638 # 'perl' component of the --norm and --sort args
639
640 $OPTS{norm} = select_a_perl($OPTS{norm}, $perls, "--norm");
641 if (defined $OPTS{'sort-perl'}) {
642 $OPTS{'sort-perl'} =
643 select_a_perl($OPTS{'sort-perl'}, $perls, "--sort");
644 }
645
df3d7b3a
DM
646 if (defined $OPTS{'compact'}) {
647 $OPTS{'compact'} =
648 select_a_perl($OPTS{'compact'}, $perls, "--compact");
649 }
9e7973fa
DM
650 if (defined $OPTS{write}) {
651 my $json = JSON::PP::encode_json({
652 version => $FORMAT_VERSION,
653 loop_counts => $loop_counts,
654 perls => $perls,
655 results => $results,
656 tests => $tests,
957d8930 657 order => $order,
9e7973fa
DM
658 });
659
660 open my $out, '>:encoding(UTF-8)', $OPTS{write}
661 or die " Error: can't open $OPTS{write} for writing: $!\n";
662 print $out $json or die "Error: writing to file '$OPTS{write}': $!\n";
663 close $out or die "Error: closing file '$OPTS{write}': $!\n";
664 }
665 else {
666 my ($processed, $averages) =
667 grind_process($results, $perls, $loop_counts);
668
669 if (defined $OPTS{bisect}) {
670 my @r = values %$results;
671 die "Panic: expected exactly one test result in bisect\n"
672 if @r != 1;
673 @r = values %{$r[0]};
674 die "Panic: expected exactly one perl result in bisect\n"
675 if @r != 1;
676 my $c = $r[0]{$bisect_field};
677 die "Panic: no result in bisect for field '$bisect_field'\n"
678 unless defined $c;
679 exit 0 if $bisect_min <= $c and $c <= $bisect_max;
680 exit 1;
681 }
df3d7b3a
DM
682 elsif (defined $OPTS{compact}) {
683 grind_print_compact($processed, $averages, $OPTS{compact},
684 $perls, $tests, $order);
685 }
9e7973fa 686 else {
957d8930 687 grind_print($processed, $averages, $perls, $tests, $order);
9e7973fa
DM
688 }
689 }
690}
691
692
693# Run cachegrind for every test/perl combo.
694# It may run several processes in parallel when -j is specified.
695# Return a hash ref suitable for input to grind_process()
696
697sub grind_run {
957d8930 698 my ($tests, $order, $perls, $counts) = @_;
9e7973fa
DM
699
700 # Build a list of all the jobs to run
701
702 my @jobs;
703
957d8930 704 for my $test (grep $tests->{$_}, @$order) {
9e7973fa
DM
705
706 # Create two test progs: one with an empty loop and one with code.
707 # Note that the empty loop is actually '{1;}' rather than '{}';
708 # this causes the loop to have a single nextstate rather than a
709 # stub op, so more closely matches the active loop; e.g.:
710 # {1;} => nextstate; unstack
711 # {$x=1;} => nextstate; const; gvsv; sassign; unstack
712 my @prog = (
713 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup)}, '1'),
714 make_perl_prog($test, @{$tests->{$test}}{qw(desc setup code)}),
715 );
716
717 for my $p (@$perls) {
a9eceb2d 718 my ($perl, $label, @putargs) = @$p;
9e7973fa
DM
719
720 # Run both the empty loop and the active loop
721 # $counts->[0] and $counts->[1] times.
722
723 for my $i (0,1) {
724 for my $j (0,1) {
725 my $cmd = "PERL_HASH_SEED=0 "
726 . "valgrind --tool=cachegrind --branch-sim=yes "
727 . "--cachegrind-out-file=/dev/null "
728 . "$OPTS{grindargs} "
a9eceb2d 729 . "$perl $OPTS{perlargs} @putargs - $counts->[$j] 2>&1";
9e7973fa 730 # for debugging and error messages
c385646f 731 my $id = "$test/$label "
9e7973fa
DM
732 . ($i ? "active" : "empty") . "/"
733 . ($j ? "long" : "short") . " loop";
734
735 push @jobs, {
736 test => $test,
737 perl => $perl,
738 plabel => $label,
739 cmd => $cmd,
740 prog => $prog[$i],
741 active => $i,
742 loopix => $j,
743 id => $id,
744 };
745 }
746 }
747 }
748 }
749
750 # Execute each cachegrind and store the results in %results.
751
752 local $SIG{PIPE} = 'IGNORE';
753
754 my $max_jobs = $OPTS{jobs};
755 my $running = 0; # count of executing jobs
756 my %pids; # map pids to jobs
757 my %fds; # map fds to jobs
758 my %results;
759 my $select = IO::Select->new();
760
761 while (@jobs or $running) {
762
763 if ($OPTS{debug}) {
764 printf "Main loop: pending=%d running=%d\n",
765 scalar(@jobs), $running;
766 }
767
768 # Start new jobs
769
770 while (@jobs && $running < $max_jobs) {
771 my $job = shift @jobs;
772 my ($id, $cmd) =@$job{qw(id cmd)};
773
774 my ($in, $out, $pid);
775 warn "Starting $id\n" if $OPTS{verbose};
776 eval { $pid = IPC::Open2::open2($out, $in, $cmd); 1; }
777 or die "Error: while starting cachegrind subprocess"
778 ." for $id:\n$@";
779 $running++;
780 $pids{$pid} = $job;
781 $fds{"$out"} = $job;
782 $job->{out_fd} = $out;
783 $job->{output} = '';
784 $job->{pid} = $pid;
785
786 $out->blocking(0);
787 $select->add($out);
788
789 if ($OPTS{debug}) {
790 print "Started pid $pid for $id\n";
791 }
792
793 # Note:
794 # In principle we should write to $in in the main select loop,
795 # since it may block. In reality,
796 # a) the code we write to the perl process's stdin is likely
797 # to be less than the OS's pipe buffer size;
798 # b) by the time the perl process has read in all its stdin,
799 # the only output it should have generated is a few lines
800 # of cachegrind output preamble.
801 # If these assumptions change, then perform the following print
802 # in the select loop instead.
803
804 print $in $job->{prog};
805 close $in;
806 }
807
808 # Get output of running jobs
809
810 if ($OPTS{debug}) {
811 printf "Select: waiting on (%s)\n",
812 join ', ', sort { $a <=> $b } map $fds{$_}{pid},
813 $select->handles;
814 }
815
816 my @ready = $select->can_read;
817
818 if ($OPTS{debug}) {
819 printf "Select: pids (%s) ready\n",
820 join ', ', sort { $a <=> $b } map $fds{$_}{pid}, @ready;
821 }
822
823 unless (@ready) {
824 die "Panic: select returned no file handles\n";
825 }
826
827 for my $fd (@ready) {
828 my $j = $fds{"$fd"};
829 my $r = sysread $fd, $j->{output}, 8192, length($j->{output});
830 unless (defined $r) {
831 die "Panic: Read from process running $j->{id} gave:\n$!";
832 }
833 next if $r;
834
835 # EOF
836
837 if ($OPTS{debug}) {
838 print "Got eof for pid $fds{$fd}{pid} ($j->{id})\n";
839 }
840
841 $select->remove($j->{out_fd});
842 close($j->{out_fd})
843 or die "Panic: closing output fh on $j->{id} gave:\n$!\n";
844 $running--;
845 delete $fds{"$j->{out_fd}"};
846 my $output = $j->{output};
847
848 if ($OPTS{debug}) {
849 my $p = $j->{prog};
850 $p =~ s/^/ : /mg;
851 my $o = $output;
852 $o =~ s/^/ : /mg;
853
854 print "\n$j->{id}/\nCommand: $j->{cmd}\n"
855 . "Input:\n$p"
856 . "Output\n$o";
857 }
858
c385646f 859 $results{$j->{test}}{$j->{plabel}}[$j->{active}][$j->{loopix}]
9e7973fa
DM
860 = parse_cachegrind($output, $j->{id}, $j->{perl});
861 }
862
863 # Reap finished jobs
864
865 while (1) {
866 my $kid = waitpid(-1, WNOHANG);
867 my $ret = $?;
868 last if $kid <= 0;
869
870 unless (exists $pids{$kid}) {
871 die "Panic: reaped unexpected child $kid";
872 }
873 my $j = $pids{$kid};
874 if ($ret) {
875 die sprintf("Error: $j->{id} gave return status 0x%04x\n", $ret)
876 . "with the following output\n:$j->{output}\n";
877 }
878 delete $pids{$kid};
879 }
880 }
881
882 return \%results;
883}
884
885
886
887
888# grind_process(): process the data that has been extracted from
889# cachgegrind's output.
890#
891# $res is of the form ->{benchmark_name}{perl_name}[active][count]{field_name},
892# where active is 0 or 1 indicating an empty or active loop,
893# count is 0 or 1 indicating a short or long loop. E.g.
894#
895# $res->{'expr::assign::scalar_lex'}{perl-5.21.1}[0][10]{Dw_mm}
896#
897# The $res data structure is modified in-place by this sub.
898#
899# $perls is [ [ perl-exe, perl-label], .... ].
900#
901# $counts is [ N, M ] indicating the counts for the short and long loops.
902#
903#
904# return \%output, \%averages, where
905#
906# $output{benchmark_name}{perl_name}{field_name} = N
907# $averages{perl_name}{field_name} = M
908#
909# where N is the raw count ($OPTS{raw}), or count_perl0/count_perlI otherwise;
910# M is the average raw count over all tests ($OPTS{raw}), or
911# 1/(sum(count_perlI/count_perl0)/num_tests) otherwise.
912
913sub grind_process {
914 my ($res, $perls, $counts) = @_;
915
916 # Process the four results for each test/perf combo:
917 # Convert
918 # $res->{benchmark_name}{perl_name}[active][count]{field_name} = n
919 # to
920 # $res->{benchmark_name}{perl_name}{field_name} = averaged_n
921 #
922 # $r[0][1] - $r[0][0] is the time to do ($counts->[1]-$counts->[0])
923 # empty loops, eliminating startup time
924 # $r[1][1] - $r[1][0] is the time to do ($counts->[1]-$counts->[0])
925 # active loops, eliminating startup time
926 # (the two startup times may be different because different code
927 # is being compiled); the difference of the two results above
928 # divided by the count difference is the time to execute the
929 # active code once, eliminating both startup and loop overhead.
930
931 for my $tests (values %$res) {
932 for my $r (values %$tests) {
933 my $r2;
934 for (keys %{$r->[0][0]}) {
935 my $n = ( ($r->[1][1]{$_} - $r->[1][0]{$_})
936 - ($r->[0][1]{$_} - $r->[0][0]{$_})
937 ) / ($counts->[1] - $counts->[0]);
938 $r2->{$_} = $n;
939 }
940 $r = $r2;
941 }
942 }
943
944 my %totals;
945 my %counts;
946 my %data;
947
948 my $perl_norm = $perls->[$OPTS{norm}][0]; # the name of the reference perl
949
950 for my $test_name (keys %$res) {
951 my $res1 = $res->{$test_name};
952 my $res2_norm = $res1->{$perl_norm};
953 for my $perl (keys %$res1) {
954 my $res2 = $res1->{$perl};
955 for my $field (keys %$res2) {
956 my ($p, $q) = ($res2_norm->{$field}, $res2->{$field});
957
958 if ($OPTS{raw}) {
959 # Avoid annoying '-0.0' displays. Ideally this number
960 # should never be negative, but fluctuations in
961 # startup etc can theoretically make this happen
962 $q = 0 if ($q <= 0 && $q > -0.1);
963 $totals{$perl}{$field} += $q;
964 $counts{$perl}{$field}++;
965 $data{$test_name}{$perl}{$field} = $q;
966 next;
967 }
968
969 # $p and $q are notionally integer counts, but
970 # due to variations in startup etc, it's possible for a
971 # count which is supposedly zero to be calculated as a
972 # small positive or negative value.
973 # In this case, set it to zero. Further below we
974 # special-case zeros to avoid division by zero errors etc.
975
976 $p = 0.0 if $p < 0.01;
977 $q = 0.0 if $q < 0.01;
978
979 if ($p == 0.0 && $q == 0.0) {
980 # Both perls gave a count of zero, so no change:
981 # treat as 100%
982 $totals{$perl}{$field} += 1;
983 $counts{$perl}{$field}++;
984 $data{$test_name}{$perl}{$field} = 1;
985 }
986 elsif ($p == 0.0 || $q == 0.0) {
987 # If either count is zero, there were too few events
988 # to give a meaningful ratio (and we will end up with
989 # division by zero if we try). Mark the result undef,
990 # indicating that it shouldn't be displayed; and skip
991 # adding to the average
992 $data{$test_name}{$perl}{$field} = undef;
993 }
994 else {
995 # For averages, we record q/p rather than p/q.
996 # Consider a test where perl_norm took 1000 cycles
997 # and perlN took 800 cycles. For the individual
998 # results we display p/q, or 1.25; i.e. a quarter
999 # quicker. For the averages, we instead sum all
1000 # the 0.8's, which gives the total cycles required to
1001 # execute all tests, with all tests given equal
1002 # weight. Later we reciprocate the final result,
1003 # i.e. 1/(sum(qi/pi)/n)
1004
1005 $totals{$perl}{$field} += $q/$p;
1006 $counts{$perl}{$field}++;
1007 $data{$test_name}{$perl}{$field} = $p/$q;
1008 }
1009 }
1010 }
1011 }
1012
1013 # Calculate averages based on %totals and %counts accumulated earlier.
1014
1015 my %averages;
1016 for my $perl (keys %totals) {
1017 my $t = $totals{$perl};
1018 for my $field (keys %$t) {
1019 $averages{$perl}{$field} = $OPTS{raw}
1020 ? $t->{$field} / $counts{$perl}{$field}
1021 # reciprocal - see comments above
1022 : $counts{$perl}{$field} / $t->{$field};
1023 }
1024 }
1025
1026 return \%data, \%averages;
1027}
1028
1029
9e7973fa 1030
df3d7b3a 1031# print a standard blurb at the start of the grind display
9e7973fa 1032
df3d7b3a
DM
1033sub grind_blurb {
1034 my ($perls) = @_;
9e7973fa
DM
1035
1036 print <<EOF;
1037Key:
1038 Ir Instruction read
1039 Dr Data read
1040 Dw Data write
1041 COND conditional branches
1042 IND indirect branches
1043 _m branch predict miss
1044 _m1 level 1 cache miss
1045 _mm last cache (e.g. L3) miss
1046 - indeterminate percentage (e.g. 1/0)
1047
1048EOF
1049
1050 if ($OPTS{raw}) {
1051 print "The numbers represent raw counts per loop iteration.\n";
1052 }
1053 else {
1054 print <<EOF;
1055The numbers represent relative counts per loop iteration, compared to
df3d7b3a 1056$perls->[$OPTS{norm}][1] at 100.0%.
9e7973fa
DM
1057Higher is better: for example, using half as many instructions gives 200%,
1058while using twice as many gives 50%.
1059EOF
1060 }
df3d7b3a
DM
1061}
1062
1063
1064# return a sorted list of the test names, plus 'AVERAGE'
9e7973fa 1065
df3d7b3a
DM
1066sub sorted_test_names {
1067 my ($results, $order, $perls) = @_;
9e7973fa 1068
df3d7b3a 1069 my @names;
9e7973fa
DM
1070 unless ($OPTS{average}) {
1071 if (defined $OPTS{'sort-field'}) {
1072 my ($field, $perlix) = @OPTS{'sort-field', 'sort-perl'};
1073 my $perl = $perls->[$perlix][0];
df3d7b3a 1074 @names = sort
9e7973fa
DM
1075 {
1076 $results->{$a}{$perl}{$field}
1077 <=> $results->{$b}{$perl}{$field}
1078 }
1079 keys %$results;
1080 }
1081 else {
df3d7b3a 1082 @names = grep $results->{$_}, @$order;
9e7973fa
DM
1083 }
1084 }
1085
1086 # No point in displaying average for only one test.
df3d7b3a
DM
1087 push @names, 'AVERAGE' unless @names == 1;
1088 @names;
1089}
1090
1091
1092# grind_print(): display the tabulated results of all the cachegrinds.
1093#
1094# Arguments are of the form:
1095# $results->{benchmark_name}{perl_name}{field_name} = N
1096# $averages->{perl_name}{field_name} = M
1097# $perls = [ [ perl-exe, perl-label ], ... ]
1098# $tests->{test_name}{desc => ..., ...}
1099
1100sub grind_print {
1101 my ($results, $averages, $perls, $tests, $order) = @_;
1102
1103 my @perl_names = map $_->[0], @$perls;
1104 my %perl_labels;
1105 $perl_labels{$_->[0]} = $_->[1] for @$perls;
1106
1107 my $field_label_width = 6;
1108 # Calculate the width to display for each column.
1109 my $min_width = $OPTS{raw} ? 8 : 6;
1110 my @widths = map { length($_) < $min_width ? $min_width : length($_) }
1111 @perl_labels{@perl_names};
1112
1113 # Print standard header.
1114 grind_blurb($perls);
1115
1116 my @test_names = sorted_test_names($results, $order, $perls);
9e7973fa
DM
1117
1118 # If only a single field is to be displayed, use a more compact
1119 # format with only a single line of output per test.
1120
1121 my $one_field = defined $OPTS{fields} && keys(%{$OPTS{fields}}) == 1;
1122
1123 if ($one_field) {
91cde97c 1124 print "Results for field " . (keys(%{$OPTS{fields}}))[0] . ".\n";
9e7973fa
DM
1125
1126 # The first column will now contain test names rather than
1127 # field names; Calculate the max width.
1128
1129 $field_label_width = 0;
1130 for (@test_names) {
1131 $field_label_width = length if length > $field_label_width;
1132 }
1133
1134 # Print the perl executables header.
1135
1136 print "\n";
1137 for my $i (0,1) {
1138 print " " x $field_label_width;
1139 for (0..$#widths) {
1140 printf " %*s", $widths[$_],
1141 $i ? ('-' x$widths[$_]) : $perl_labels{$perl_names[$_]};
1142 }
1143 print "\n";
1144 }
1145 }
1146
1147 # Dump the results for each test.
1148
1149 for my $test_name (@test_names) {
1150 my $doing_ave = ($test_name eq 'AVERAGE');
1151 my $res1 = $doing_ave ? $averages : $results->{$test_name};
1152
1153 unless ($one_field) {
1154 print "\n$test_name";
1155 print "\n$tests->{$test_name}{desc}" unless $doing_ave;
1156 print "\n\n";
1157
1158 # Print the perl executables header.
1159 for my $i (0,1) {
1160 print " " x $field_label_width;
1161 for (0..$#widths) {
1162 printf " %*s", $widths[$_],
1163 $i ? ('-' x$widths[$_]) : $perl_labels{$perl_names[$_]};
1164 }
1165 print "\n";
1166 }
1167 }
1168
1169 for my $field (qw(Ir Dr Dw COND IND
1170 N
1171 COND_m IND_m
1172 N
1173 Ir_m1 Dr_m1 Dw_m1
1174 N
1175 Ir_mm Dr_mm Dw_mm
1176 ))
1177 {
1178 next if $OPTS{fields} and ! exists $OPTS{fields}{$field};
1179
1180 if ($field eq 'N') {
1181 print "\n";
1182 next;
1183 }
1184
91cde97c
DM
1185 if ($one_field) {
1186 printf "%-*s", $field_label_width, $test_name;
1187 }
1188 else {
1189 printf "%*s", $field_label_width, $field;
1190 }
9e7973fa
DM
1191
1192 for my $i (0..$#widths) {
1193 my $res2 = $res1->{$perl_names[$i]};
1194 my $p = $res2->{$field};
1195 if (!defined $p) {
1196 printf " %*s", $widths[$i], '-';
1197 }
1198 elsif ($OPTS{raw}) {
1199 printf " %*.1f", $widths[$i], $p;
1200 }
1201 else {
1202 printf " %*.2f", $widths[$i], $p * 100;
1203 }
1204 }
1205 print "\n";
1206 }
1207 }
1208}
1209
1210
df3d7b3a
DM
1211
1212# grind_print_compact(): like grind_print(), but display a single perl
1213# in a compact form. Has an additional arg, $which_perl, which specifies
1214# which perl to display.
1215#
1216# Arguments are of the form:
1217# $results->{benchmark_name}{perl_name}{field_name} = N
1218# $averages->{perl_name}{field_name} = M
1219# $perls = [ [ perl-exe, perl-label ], ... ]
1220# $tests->{test_name}{desc => ..., ...}
1221
1222sub grind_print_compact {
1223 my ($results, $averages, $which_perl, $perls, $tests, $order) = @_;
1224
1225
1226 # the width to display for each column.
1227 my $width = $OPTS{raw} ? 7 : 6;
1228
1229 # Print standard header.
1230 grind_blurb($perls);
1231
1232 print "\nResults for $perls->[$which_perl][1]\n\n";
1233
1234 my @test_names = sorted_test_names($results, $order, $perls);
1235
1236 # Dump the results for each test.
1237
1238 my @fields = qw( Ir Dr Dw
1239 COND IND
1240 COND_m IND_m
1241 Ir_m1 Dr_m1 Dw_m1
1242 Ir_mm Dr_mm Dw_mm
1243 );
1244 if ($OPTS{fields}) {
1245 @fields = grep exists $OPTS{fields}{$_}, @fields;
1246 }
1247
1248 printf " %*s", $width, $_ for @fields;
1249 print "\n";
1250 printf " %*s", $width, '------' for @fields;
1251 print "\n";
1252
1253 for my $test_name (@test_names) {
1254 my $doing_ave = ($test_name eq 'AVERAGE');
1255 my $res = $doing_ave ? $averages : $results->{$test_name};
1256 $res = $res->{$perls->[$which_perl][0]};
1257
1258 for my $field (@fields) {
1259 my $p = $res->{$field};
1260 if (!defined $p) {
1261 printf " %*s", $width, '-';
1262 }
1263 elsif ($OPTS{raw}) {
1264 printf " %*.1f", $width, $p;
1265 }
1266 else {
1267 printf " %*.2f", $width, $p * 100;
1268 }
1269
1270 }
1271
1272 print " $test_name\n";
1273 }
1274}
1275
1276
9e7973fa
DM
1277# do_selftest(): check that we can parse known cachegrind()
1278# output formats. If the output of cachegrind changes, add a *new*
1279# test here; keep the old tests to make sure we continue to parse
1280# old cachegrinds
1281
1282sub do_selftest {
1283
1284 my @tests = (
1285 'standard',
1286 <<'EOF',
1287==32350== Cachegrind, a cache and branch-prediction profiler
1288==32350== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
1289==32350== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
1290==32350== Command: perl5211o /tmp/uiS2gjdqe5 1
1291==32350==
1292--32350-- warning: L3 cache found, using its data for the LL simulation.
1293==32350==
1294==32350== I refs: 1,124,055
1295==32350== I1 misses: 5,573
1296==32350== LLi misses: 3,338
1297==32350== I1 miss rate: 0.49%
1298==32350== LLi miss rate: 0.29%
1299==32350==
1300==32350== D refs: 404,275 (259,191 rd + 145,084 wr)
1301==32350== D1 misses: 9,608 ( 6,098 rd + 3,510 wr)
1302==32350== LLd misses: 5,794 ( 2,781 rd + 3,013 wr)
1303==32350== D1 miss rate: 2.3% ( 2.3% + 2.4% )
1304==32350== LLd miss rate: 1.4% ( 1.0% + 2.0% )
1305==32350==
1306==32350== LL refs: 15,181 ( 11,671 rd + 3,510 wr)
1307==32350== LL misses: 9,132 ( 6,119 rd + 3,013 wr)
1308==32350== LL miss rate: 0.5% ( 0.4% + 2.0% )
1309==32350==
1310==32350== Branches: 202,372 (197,050 cond + 5,322 ind)
1311==32350== Mispredicts: 19,153 ( 17,742 cond + 1,411 ind)
1312==32350== Mispred rate: 9.4% ( 9.0% + 26.5% )
1313EOF
1314 {
1315 COND => 197050,
1316 COND_m => 17742,
1317 Dr => 259191,
1318 Dr_m1 => 6098,
1319 Dr_mm => 2781,
1320 Dw => 145084,
1321 Dw_m1 => 3510,
1322 Dw_mm => 3013,
1323 IND => 5322,
1324 IND_m => 1411,
1325 Ir => 1124055,
1326 Ir_m1 => 5573,
1327 Ir_mm => 3338,
1328 },
1329 );
1330
1331 for ('t', '.') {
1332 last if require "$_/test.pl";
1333 }
1334 plan(@tests / 3 * keys %VALID_FIELDS);
1335
1336 while (@tests) {
1337 my $desc = shift @tests;
1338 my $output = shift @tests;
1339 my $expected = shift @tests;
1340 my $p = parse_cachegrind($output);
1341 for (sort keys %VALID_FIELDS) {
1342 is($p->{$_}, $expected->{$_}, "$desc, $_");
1343 }
1344 }
1345}