| 1 | =head1 NAME |
| 2 | |
| 3 | perlperf - Perl Performance and Optimization Techniques |
| 4 | |
| 5 | =head1 DESCRIPTION |
| 6 | |
| 7 | This is an introduction to the use of performance and optimization techniques |
| 8 | which can be used with particular reference to perl programs. While many perl |
| 9 | developers have come from other languages, and can use their prior knowledge |
| 10 | where appropriate, there are many other people who might benefit from a few |
| 11 | perl specific pointers. If you want the condensed version, perhaps the best |
| 12 | advice comes from the renowned Japanese Samurai, Miyamoto Musashi, who said: |
| 13 | |
| 14 | "Do Not Engage in Useless Activity" |
| 15 | |
| 16 | in 1645. |
| 17 | |
| 18 | =head1 OVERVIEW |
| 19 | |
| 20 | Perhaps the most common mistake programmers make is to attempt to optimize |
| 21 | their code before a program actually does anything useful - this is a bad idea. |
| 22 | There's no point in having an extremely fast program that doesn't work. The |
| 23 | first job is to get a program to I<correctly> do something B<useful>, (not to |
| 24 | mention ensuring the test suite is fully functional), and only then to consider |
| 25 | optimizing it. Having decided to optimize existing working code, there are |
| 26 | several simple but essential steps to consider which are intrinsic to any |
| 27 | optimization process. |
| 28 | |
| 29 | =head2 ONE STEP SIDEWAYS |
| 30 | |
| 31 | Firstly, you need to establish a baseline time for the existing code, which |
| 32 | timing needs to be reliable and repeatable. You'll probably want to use the |
| 33 | C<Benchmark> or C<Devel::NYTProf> modules, or something similar, for this step, |
| 34 | or perhaps the Unix system C<time> utility, whichever is appropriate. See the |
| 35 | base of this document for a longer list of benchmarking and profiling modules, |
| 36 | and recommended further reading. |
| 37 | |
| 38 | =head2 ONE STEP FORWARD |
| 39 | |
| 40 | Next, having examined the program for I<hot spots>, (places where the code |
| 41 | seems to run slowly), change the code with the intention of making it run |
| 42 | faster. Using version control software, like C<subversion>, will ensure no |
| 43 | changes are irreversible. It's too easy to fiddle here and fiddle there - |
| 44 | don't change too much at any one time or you might not discover which piece of |
| 45 | code B<really> was the slow bit. |
| 46 | |
| 47 | =head2 ANOTHER STEP SIDEWAYS |
| 48 | |
| 49 | It's not enough to say: "that will make it run faster", you have to check it. |
| 50 | Rerun the code under control of the benchmarking or profiling modules, from the |
| 51 | first step above, and check that the new code executed the B<same task> in |
| 52 | I<less time>. Save your work and repeat... |
| 53 | |
| 54 | =head1 GENERAL GUIDELINES |
| 55 | |
| 56 | The critical thing when considering performance is to remember there is no such |
| 57 | thing as a C<Golden Bullet>, which is why there are no rules, only guidelines. |
| 58 | |
| 59 | It is clear that inline code is going to be faster than subroutine or method |
| 60 | calls, because there is less overhead, but this approach has the disadvantage |
| 61 | of being less maintainable and comes at the cost of greater memory usage - |
| 62 | there is no such thing as a free lunch. If you are searching for an element in |
| 63 | a list, it can be more efficient to store the data in a hash structure, and |
| 64 | then simply look to see whether the key is defined, rather than to loop through |
| 65 | the entire array using grep() for instance. substr() may be (a lot) faster |
| 66 | than grep() but not as flexible, so you have another trade-off to access. Your |
| 67 | code may contain a line which takes 0.01 of a second to execute which if you |
| 68 | call it 1,000 times, quite likely in a program parsing even medium sized files |
| 69 | for instance, you already have a 10 second delay, in just one single code |
| 70 | location, and if you call that line 100,000 times, your entire program will |
| 71 | slow down to an unbearable crawl. |
| 72 | |
| 73 | Using a subroutine as part of your sort is a powerful way to get exactly what |
| 74 | you want, but will usually be slower than the built-in I<alphabetic> C<cmp> and |
| 75 | I<numeric> C<E<lt>=E<gt>> sort operators. It is possible to make multiple |
| 76 | passes over your data, building indices to make the upcoming sort more |
| 77 | efficient, and to use what is known as the C<OM> (Orcish Maneuver) to cache the |
| 78 | sort keys in advance. The cache lookup, while a good idea, can itself be a |
| 79 | source of slowdown by enforcing a double pass over the data - once to setup the |
| 80 | cache, and once to sort the data. Using C<pack()> to extract the required sort |
| 81 | key into a consistent string can be an efficient way to build a single string |
| 82 | to compare, instead of using multiple sort keys, which makes it possible to use |
| 83 | the standard, written in C<c> and fast, perl C<sort()> function on the output, |
| 84 | and is the basis of the C<GRT> (Guttman Rossler Transform). Some string |
| 85 | combinations can slow the C<GRT> down, by just being too plain complex for its |
| 86 | own good. |
| 87 | |
| 88 | For applications using database backends, the standard C<DBIx> namespace has |
| 89 | tries to help with keeping things nippy, not least because it tries to I<not> |
| 90 | query the database until the latest possible moment, but always read the docs |
| 91 | which come with your choice of libraries. Among the many issues facing |
| 92 | developers dealing with databases should remain aware of is to always use |
| 93 | C<SQL> placeholders and to consider pre-fetching data sets when this might |
| 94 | prove advantageous. Splitting up a large file by assigning multiple processes |
| 95 | to parsing a single file, using say C<POE>, C<threads> or C<fork> can also be a |
| 96 | useful way of optimizing your usage of the available C<CPU> resources, though |
| 97 | this technique is fraught with concurrency issues and demands high attention to |
| 98 | detail. |
| 99 | |
| 100 | Every case has a specific application and one or more exceptions, and there is |
| 101 | no replacement for running a few tests and finding out which method works best |
| 102 | for your particular environment, this is why writing optimal code is not an |
| 103 | exact science, and why we love using Perl so much - TMTOWTDI. |
| 104 | |
| 105 | =head1 BENCHMARKS |
| 106 | |
| 107 | Here are a few examples to demonstrate usage of Perl's benchmarking tools. |
| 108 | |
| 109 | =head2 Assigning and Dereferencing Variables. |
| 110 | |
| 111 | I'm sure most of us have seen code which looks like, (or worse than), this: |
| 112 | |
| 113 | if ( $obj->{_ref}->{_myscore} >= $obj->{_ref}->{_yourscore} ) { |
| 114 | ... |
| 115 | |
| 116 | This sort of code can be a real eyesore to read, as well as being very |
| 117 | sensitive to typos, and it's much clearer to dereference the variable |
| 118 | explicitly. We're side-stepping the issue of working with object-oriented |
| 119 | programming techniques to encapsulate variable access via methods, only |
| 120 | accessible through an object. Here we're just discussing the technical |
| 121 | implementation of choice, and whether this has an effect on performance. We |
| 122 | can see whether this dereferencing operation, has any overhead by putting |
| 123 | comparative code in a file and running a C<Benchmark> test. |
| 124 | |
| 125 | # dereference |
| 126 | |
| 127 | #!/usr/bin/perl |
| 128 | |
| 129 | use strict; |
| 130 | use warnings; |
| 131 | |
| 132 | use Benchmark; |
| 133 | |
| 134 | my $ref = { |
| 135 | 'ref' => { |
| 136 | _myscore => '100 + 1', |
| 137 | _yourscore => '102 - 1', |
| 138 | }, |
| 139 | }; |
| 140 | |
| 141 | timethese(1000000, { |
| 142 | 'direct' => sub { |
| 143 | my $x = $ref->{ref}->{_myscore} . $ref->{ref}->{_yourscore} ; |
| 144 | }, |
| 145 | 'dereference' => sub { |
| 146 | my $ref = $ref->{ref}; |
| 147 | my $myscore = $ref->{_myscore}; |
| 148 | my $yourscore = $ref->{_yourscore}; |
| 149 | my $x = $myscore . $yourscore; |
| 150 | }, |
| 151 | }); |
| 152 | |
| 153 | It's essential to run any timing measurements a sufficient number of times so |
| 154 | the numbers settle on a numerical average, otherwise each run will naturally |
| 155 | fluctuate due to variations in the environment, to reduce the effect of |
| 156 | contention for C<CPU> resources and network bandwidth for instance. Running |
| 157 | the above code for one million iterations, we can take a look at the report |
| 158 | output by the C<Benchmark> module, to see which approach is the most effective. |
| 159 | |
| 160 | $> perl dereference |
| 161 | |
| 162 | Benchmark: timing 1000000 iterations of dereference, direct... |
| 163 | dereference: 2 wallclock secs ( 1.59 usr + 0.00 sys = 1.59 CPU) @ 628930.82/s (n=1000000) |
| 164 | direct: 1 wallclock secs ( 1.20 usr + 0.00 sys = 1.20 CPU) @ 833333.33/s (n=1000000) |
| 165 | |
| 166 | The difference is clear to see and the dereferencing approach is slower. While |
| 167 | it managed to execute an average of 628,930 times a second during our test, the |
| 168 | direct approach managed to run an additional 204,403 times, unfortunately. |
| 169 | Unfortunately, because there are many examples of code written using the |
| 170 | multiple layer direct variable access, and it's usually horrible. It is, |
| 171 | however, minusculy faster. The question remains whether the minute gain is |
| 172 | actually worth the eyestrain, or the loss of maintainability. |
| 173 | |
| 174 | =head2 Search and replace or tr |
| 175 | |
| 176 | If we have a string which needs to be modified, while a regex will almost |
| 177 | always be much more flexible, C<tr>, an oft underused tool, can still be a |
| 178 | useful. One scenario might be replace all vowels with another character. The |
| 179 | regex solution might look like this: |
| 180 | |
| 181 | $str =~ s/[aeiou]/x/g |
| 182 | |
| 183 | The C<tr> alternative might look like this: |
| 184 | |
| 185 | $str =~ tr/aeiou/xxxxx/ |
| 186 | |
| 187 | We can put that into a test file which we can run to check which approach is |
| 188 | the fastest, using a global C<$STR> variable to assign to the C<my $str> |
| 189 | variable so as to avoid perl trying to optimize any of the work away by |
| 190 | noticing it's assigned only the once. |
| 191 | |
| 192 | # regex-transliterate |
| 193 | |
| 194 | #!/usr/bin/perl |
| 195 | |
| 196 | use strict; |
| 197 | use warnings; |
| 198 | |
| 199 | use Benchmark; |
| 200 | |
| 201 | my $STR = "$$-this and that"; |
| 202 | |
| 203 | timethese( 1000000, { |
| 204 | 'sr' => sub { my $str = $STR; $str =~ s/[aeiou]/x/g; return $str; }, |
| 205 | 'tr' => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; }, |
| 206 | }); |
| 207 | |
| 208 | Running the code gives us our results: |
| 209 | |
| 210 | $> perl regex-transliterate |
| 211 | |
| 212 | Benchmark: timing 1000000 iterations of sr, tr... |
| 213 | sr: 2 wallclock secs ( 1.19 usr + 0.00 sys = 1.19 CPU) @ 840336.13/s (n=1000000) |
| 214 | tr: 0 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 2040816.33/s (n=1000000) |
| 215 | |
| 216 | The C<tr> version is a clear winner. One solution is flexible, the other is |
| 217 | fast - and it's appropriately the programmer's choice which to use. |
| 218 | |
| 219 | Check the C<Benchmark> docs for further useful techniques. |
| 220 | |
| 221 | =head1 PROFILING TOOLS |
| 222 | |
| 223 | A slightly larger piece of code will provide something on which a profiler can |
| 224 | produce more extensive reporting statistics. This example uses the simplistic |
| 225 | C<wordmatch> program which parses a given input file and spews out a short |
| 226 | report on the contents. |
| 227 | |
| 228 | # wordmatch |
| 229 | |
| 230 | #!/usr/bin/perl |
| 231 | |
| 232 | use strict; |
| 233 | use warnings; |
| 234 | |
| 235 | =head1 NAME |
| 236 | |
| 237 | filewords - word analysis of input file |
| 238 | |
| 239 | =head1 SYNOPSIS |
| 240 | |
| 241 | filewords -f inputfilename [-d] |
| 242 | |
| 243 | =head1 DESCRIPTION |
| 244 | |
| 245 | This program parses the given filename, specified with C<-f>, and |
| 246 | displays a simple analysis of the words found therein. Use the C<-d> |
| 247 | switch to enable debugging messages. |
| 248 | |
| 249 | =cut |
| 250 | |
| 251 | use FileHandle; |
| 252 | use Getopt::Long; |
| 253 | |
| 254 | my $debug = 0; |
| 255 | my $file = ''; |
| 256 | |
| 257 | my $result = GetOptions ( |
| 258 | 'debug' => \$debug, |
| 259 | 'file=s' => \$file, |
| 260 | ); |
| 261 | die("invalid args") unless $result; |
| 262 | |
| 263 | unless ( -f $file ) { |
| 264 | die("Usage: $0 -f filename [-d]"); |
| 265 | } |
| 266 | my $FH = FileHandle->new("< $file") |
| 267 | or die("unable to open file($file): $!"); |
| 268 | |
| 269 | my $i_LINES = 0; |
| 270 | my $i_WORDS = 0; |
| 271 | my %count = (); |
| 272 | |
| 273 | my @lines = <$FH>; |
| 274 | foreach my $line ( @lines ) { |
| 275 | $i_LINES++; |
| 276 | $line =~ s/\n//; |
| 277 | my @words = split(/ +/, $line); |
| 278 | my $i_words = scalar(@words); |
| 279 | $i_WORDS = $i_WORDS + $i_words; |
| 280 | debug("line: $i_LINES supplying $i_words words: @words"); |
| 281 | my $i_word = 0; |
| 282 | foreach my $word ( @words ) { |
| 283 | $i_word++; |
| 284 | $count{$i_LINES}{spec} += matches($i_word, $word, '[^a-zA-Z0-9]'); |
| 285 | $count{$i_LINES}{only} += matches($i_word, $word, '^[^a-zA-Z0-9]+$'); |
| 286 | $count{$i_LINES}{cons} += matches($i_word, $word, '^[(?i:bcdfghjklmnpqrstvwxyz)]+$'); |
| 287 | $count{$i_LINES}{vows} += matches($i_word, $word, '^[(?i:aeiou)]+$'); |
| 288 | $count{$i_LINES}{caps} += matches($i_word, $word, '^[(A-Z)]+$'); |
| 289 | } |
| 290 | } |
| 291 | |
| 292 | print report( %count ); |
| 293 | |
| 294 | sub matches { |
| 295 | my $i_wd = shift; |
| 296 | my $word = shift; |
| 297 | my $regex = shift; |
| 298 | my $has = 0; |
| 299 | |
| 300 | if ( $word =~ /($regex)/ ) { |
| 301 | $has++ if $1; |
| 302 | } |
| 303 | |
| 304 | debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/"); |
| 305 | |
| 306 | return $has; |
| 307 | } |
| 308 | |
| 309 | sub report { |
| 310 | my %report = @_; |
| 311 | my %rep; |
| 312 | |
| 313 | foreach my $line ( keys %report ) { |
| 314 | foreach my $key ( keys %{ $report{$line} } ) { |
| 315 | $rep{$key} += $report{$line}{$key}; |
| 316 | } |
| 317 | } |
| 318 | |
| 319 | my $report = qq| |
| 320 | $0 report for $file: |
| 321 | lines in file: $i_LINES |
| 322 | words in file: $i_WORDS |
| 323 | words with special (non-word) characters: $i_spec |
| 324 | words with only special (non-word) characters: $i_only |
| 325 | words with only consonants: $i_cons |
| 326 | words with only capital letters: $i_caps |
| 327 | words with only vowels: $i_vows |
| 328 | |; |
| 329 | |
| 330 | return $report; |
| 331 | } |
| 332 | |
| 333 | sub debug { |
| 334 | my $message = shift; |
| 335 | |
| 336 | if ( $debug ) { |
| 337 | print STDERR "DBG: $message\n"; |
| 338 | } |
| 339 | } |
| 340 | |
| 341 | exit 0; |
| 342 | |
| 343 | =head2 Devel::DProf |
| 344 | |
| 345 | This venerable module has been the de-facto standard for Perl code profiling |
| 346 | for more than a decade, but has been replaced by a number of other modules |
| 347 | which have brought us back to the 21st century. Although you're recommended to |
| 348 | evaluate your tool from the several mentioned here and from the CPAN list at |
| 349 | the base of this document, (and currently L<Devel::NYTProf> seems to be the |
| 350 | weapon of choice - see below), we'll take a quick look at the output from |
| 351 | L<Devel::DProf> first, to set a baseline for Perl profiling tools. Run the |
| 352 | above program under the control of C<Devel::DProf> by using the C<-d> switch on |
| 353 | the command-line. |
| 354 | |
| 355 | $> perl -d:DProf wordmatch -f perl5db.pl |
| 356 | |
| 357 | <...multiple lines snipped...> |
| 358 | |
| 359 | wordmatch report for perl5db.pl: |
| 360 | lines in file: 9428 |
| 361 | words in file: 50243 |
| 362 | words with special (non-word) characters: 20480 |
| 363 | words with only special (non-word) characters: 7790 |
| 364 | words with only consonants: 4801 |
| 365 | words with only capital letters: 1316 |
| 366 | words with only vowels: 1701 |
| 367 | |
| 368 | C<Devel::DProf> produces a special file, called F<tmon.out> by default, and |
| 369 | this file is read by the C<dprofpp> program, which is already installed as part |
| 370 | of the C<Devel::DProf> distribution. If you call C<dprofpp> with no options, |
| 371 | it will read the F<tmon.out> file in the current directory and produce a human |
| 372 | readable statistics report of the run of your program. Note that this may take |
| 373 | a little time. |
| 374 | |
| 375 | $> dprofpp |
| 376 | |
| 377 | Total Elapsed Time = 2.951677 Seconds |
| 378 | User+System Time = 2.871677 Seconds |
| 379 | Exclusive Times |
| 380 | %Time ExclSec CumulS #Calls sec/call Csec/c Name |
| 381 | 102. 2.945 3.003 251215 0.0000 0.0000 main::matches |
| 382 | 2.40 0.069 0.069 260643 0.0000 0.0000 main::debug |
| 383 | 1.74 0.050 0.050 1 0.0500 0.0500 main::report |
| 384 | 1.04 0.030 0.049 4 0.0075 0.0123 main::BEGIN |
| 385 | 0.35 0.010 0.010 3 0.0033 0.0033 Exporter::as_heavy |
| 386 | 0.35 0.010 0.010 7 0.0014 0.0014 IO::File::BEGIN |
| 387 | 0.00 - -0.000 1 - - Getopt::Long::FindOption |
| 388 | 0.00 - -0.000 1 - - Symbol::BEGIN |
| 389 | 0.00 - -0.000 1 - - Fcntl::BEGIN |
| 390 | 0.00 - -0.000 1 - - Fcntl::bootstrap |
| 391 | 0.00 - -0.000 1 - - warnings::BEGIN |
| 392 | 0.00 - -0.000 1 - - IO::bootstrap |
| 393 | 0.00 - -0.000 1 - - Getopt::Long::ConfigDefaults |
| 394 | 0.00 - -0.000 1 - - Getopt::Long::Configure |
| 395 | 0.00 - -0.000 1 - - Symbol::gensym |
| 396 | |
| 397 | C<dprofpp> will produce some quite detailed reporting on the activity of the |
| 398 | C<wordmatch> program. The wallclock, user and system, times are at the top of |
| 399 | the analysis, and after this are the main columns defining which define the |
| 400 | report. Check the C<dprofpp> docs for details of the many options it supports. |
| 401 | |
| 402 | See also C<Apache::DProf> which hooks C<Devel::DProf> into C<mod_perl>. |
| 403 | |
| 404 | =head2 Devel::Profiler |
| 405 | |
| 406 | Let's take a look at the same program using a different profiler: |
| 407 | C<Devel::Profiler>, a drop-in Perl-only replacement for C<Devel::DProf>. The |
| 408 | usage is very slightly different in that instead of using the special C<-d:> |
| 409 | flag, you pull C<Devel::Profiler> in directly as a module using C<-M>. |
| 410 | |
| 411 | $> perl -MDevel::Profiler wordmatch -f perl5db.pl |
| 412 | |
| 413 | <...multiple lines snipped...> |
| 414 | |
| 415 | wordmatch report for perl5db.pl: |
| 416 | lines in file: 9428 |
| 417 | words in file: 50243 |
| 418 | words with special (non-word) characters: 20480 |
| 419 | words with only special (non-word) characters: 7790 |
| 420 | words with only consonants: 4801 |
| 421 | words with only capital letters: 1316 |
| 422 | words with only vowels: 1701 |
| 423 | |
| 424 | |
| 425 | C<Devel::Profiler> generates a tmon.out file which is compatible with the |
| 426 | C<dprofpp> program, thus saving the construction of a dedicated statistics |
| 427 | reader program. C<dprofpp> usage is therefore identical to the above example. |
| 428 | |
| 429 | $> dprofpp |
| 430 | |
| 431 | Total Elapsed Time = 20.984 Seconds |
| 432 | User+System Time = 19.981 Seconds |
| 433 | Exclusive Times |
| 434 | %Time ExclSec CumulS #Calls sec/call Csec/c Name |
| 435 | 49.0 9.792 14.509 251215 0.0000 0.0001 main::matches |
| 436 | 24.4 4.887 4.887 260643 0.0000 0.0000 main::debug |
| 437 | 0.25 0.049 0.049 1 0.0490 0.0490 main::report |
| 438 | 0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::GetOptions |
| 439 | 0.00 0.000 0.000 2 0.0000 0.0000 Getopt::Long::ParseOptionSpec |
| 440 | 0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::FindOption |
| 441 | 0.00 0.000 0.000 1 0.0000 0.0000 IO::File::new |
| 442 | 0.00 0.000 0.000 1 0.0000 0.0000 IO::Handle::new |
| 443 | 0.00 0.000 0.000 1 0.0000 0.0000 Symbol::gensym |
| 444 | 0.00 0.000 0.000 1 0.0000 0.0000 IO::File::open |
| 445 | |
| 446 | Interestingly we get slightly different results, which is mostly because the |
| 447 | algorithm which generates the report is different, even though the output file |
| 448 | format was allegedly identical. The elapsed, user and system times are clearly |
| 449 | showing the time it took for C<Devel::Profiler> to execute its own run, but |
| 450 | the column listings feel more accurate somehow than the ones we had earlier |
| 451 | from C<Devel::DProf>. The 102% figure has disappeared, for example. This is |
| 452 | where we have to use the tools at our disposal, and recognise their pros and |
| 453 | cons, before using them. Interestingly, the numbers of calls for each |
| 454 | subroutine are identical in the two reports, it's the percentages which differ. |
| 455 | As the author of C<Devel::Proviler> writes: |
| 456 | |
| 457 | ...running HTML::Template's test suite under Devel::DProf shows |
| 458 | output() taking NO time but Devel::Profiler shows around 10% of the |
| 459 | time is in output(). I don't know which to trust but my gut tells me |
| 460 | something is wrong with Devel::DProf. HTML::Template::output() is a |
| 461 | big routine that's called for every test. Either way, something needs |
| 462 | fixing. |
| 463 | |
| 464 | YMMV. |
| 465 | |
| 466 | See also C<Devel::Apache::Profiler> which hooks C<Devel::Profiler> into C<mod_perl>. |
| 467 | |
| 468 | =head2 Devel::SmallProf |
| 469 | |
| 470 | The C<Devel::SmallProf> profiler examines the runtime of your Perl program and |
| 471 | produces a line-by-line listing to show how many times each line was called, |
| 472 | and how long each line took to execute. It is called by supplying the familiar |
| 473 | C<-d> flag to Perl at runtime. |
| 474 | |
| 475 | $> perl -d:SmallProf wordmatch -f perl5db.pl |
| 476 | |
| 477 | <...multiple lines snipped...> |
| 478 | |
| 479 | wordmatch report for perl5db.pl: |
| 480 | lines in file: 9428 |
| 481 | words in file: 50243 |
| 482 | words with special (non-word) characters: 20480 |
| 483 | words with only special (non-word) characters: 7790 |
| 484 | words with only consonants: 4801 |
| 485 | words with only capital letters: 1316 |
| 486 | words with only vowels: 1701 |
| 487 | |
| 488 | C<Devel::SmallProf> writes it's output into a file called F<smallprof.out>, by |
| 489 | default. The format of the file looks like this: |
| 490 | |
| 491 | <num> <time> <ctime> <line>:<text> |
| 492 | |
| 493 | When the program has terminated, the output may be examined and sorted using |
| 494 | any standard text filtering utilities. Something like the following may be |
| 495 | sufficient: |
| 496 | |
| 497 | $> cat smallprof.out | grep \d*: | sort -k3 | tac | head -n20 |
| 498 | |
| 499 | 251215 1.65674 7.68000 75: if ( $word =~ /($regex)/ ) { |
| 500 | 251215 0.03264 4.40000 79: debug("word: $i_wd ".($has ? 'matches' : |
| 501 | 251215 0.02693 4.10000 81: return $has; |
| 502 | 260643 0.02841 4.07000 128: if ( $debug ) { |
| 503 | 260643 0.02601 4.04000 126: my $message = shift; |
| 504 | 251215 0.02641 3.91000 73: my $has = 0; |
| 505 | 251215 0.03311 3.71000 70: my $i_wd = shift; |
| 506 | 251215 0.02699 3.69000 72: my $regex = shift; |
| 507 | 251215 0.02766 3.68000 71: my $word = shift; |
| 508 | 50243 0.59726 1.00000 59: $count{$i_LINES}{cons} = |
| 509 | 50243 0.48175 0.92000 61: $count{$i_LINES}{spec} = |
| 510 | 50243 0.00644 0.89000 56: my $i_cons = matches($i_word, $word, |
| 511 | 50243 0.48837 0.88000 63: $count{$i_LINES}{caps} = |
| 512 | 50243 0.00516 0.88000 58: my $i_caps = matches($i_word, $word, '^[(A- |
| 513 | 50243 0.00631 0.81000 54: my $i_spec = matches($i_word, $word, '[^a- |
| 514 | 50243 0.00496 0.80000 57: my $i_vows = matches($i_word, $word, |
| 515 | 50243 0.00688 0.80000 53: $i_word++; |
| 516 | 50243 0.48469 0.79000 62: $count{$i_LINES}{only} = |
| 517 | 50243 0.48928 0.77000 60: $count{$i_LINES}{vows} = |
| 518 | 50243 0.00683 0.75000 55: my $i_only = matches($i_word, $word, '^[^a- |
| 519 | |
| 520 | You can immediately see a slightly different focus to the subroutine profiling |
| 521 | modules, and we start to see exactly which line of code is taking the most |
| 522 | time. That regex line is looking a bit suspicious, for example. Remember that |
| 523 | these tools are supposed to be used together, there is no single best way to |
| 524 | profile your code, you need to use the best tools for the job. |
| 525 | |
| 526 | See also C<Apache::SmallProf> which hooks C<Devel::SmallProf> into C<mod_perl>. |
| 527 | |
| 528 | =head2 Devel::FastProf |
| 529 | |
| 530 | C<Devel::FastProf> is another Perl line profiler. This was written with a view |
| 531 | to getting a faster line profiler, than is possible with for example |
| 532 | C<Devel::SmallProf>, because it's written in C<C>. To use C<Devel::FastProf>, |
| 533 | supply the C<-d> argument to Perl: |
| 534 | |
| 535 | $> perl -d:FastProf wordmatch -f perl5db.pl |
| 536 | |
| 537 | <...multiple lines snipped...> |
| 538 | |
| 539 | wordmatch report for perl5db.pl: |
| 540 | lines in file: 9428 |
| 541 | words in file: 50243 |
| 542 | words with special (non-word) characters: 20480 |
| 543 | words with only special (non-word) characters: 7790 |
| 544 | words with only consonants: 4801 |
| 545 | words with only capital letters: 1316 |
| 546 | words with only vowels: 1701 |
| 547 | |
| 548 | C<Devel::FastProf> writes statistics to the file F<fastprof.out> in the current |
| 549 | directory. The output file, which can be specified, can be interpreted by using |
| 550 | the C<fprofpp> command-line program. |
| 551 | |
| 552 | $> fprofpp | head -n20 |
| 553 | |
| 554 | # fprofpp output format is: |
| 555 | # filename:line time count: source |
| 556 | wordmatch:75 3.93338 251215: if ( $word =~ /($regex)/ ) { |
| 557 | wordmatch:79 1.77774 251215: debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/"); |
| 558 | wordmatch:81 1.47604 251215: return $has; |
| 559 | wordmatch:126 1.43441 260643: my $message = shift; |
| 560 | wordmatch:128 1.42156 260643: if ( $debug ) { |
| 561 | wordmatch:70 1.36824 251215: my $i_wd = shift; |
| 562 | wordmatch:71 1.36739 251215: my $word = shift; |
| 563 | wordmatch:72 1.35939 251215: my $regex = shift; |
| 564 | |
| 565 | Straightaway we can see that the number of times each line has been called is |
| 566 | identical to the C<Devel::SmallProf> output, and the sequence is only very |
| 567 | slightly different based on the ordering of the amount of time each line took |
| 568 | to execute, C<if ( $debug ) { > and C<my $message = shift;>, for example. The |
| 569 | differences in the actual times recorded might be in the algorithm used |
| 570 | internally, or it could be due to system resource limitations or contention. |
| 571 | |
| 572 | See also the L<DBIx::Profile> which will profile database queries running |
| 573 | under the C<DBIx::*> namespace. |
| 574 | |
| 575 | =head2 Devel::NYTProf |
| 576 | |
| 577 | C<Devel::NYTProf> is the B<next generation> of Perl code profiler, fixing many |
| 578 | shortcomings in other tools and implementing many cool features. First of all it |
| 579 | can be used as either a I<line> profiler, a I<block> or a I<subroutine> |
| 580 | profiler, all at once. It can also use sub-microsecond (100ns) resolution on |
| 581 | systems which provide C<clock_gettime()>. It can be started and stopped even |
| 582 | by the program being profiled. It's a one-line entry to profile C<mod_perl> |
| 583 | applications. It's written in C<c> and is probably the fastest profiler |
| 584 | available for Perl. The list of coolness just goes on. Enough of that, let's |
| 585 | see how to it works - just use the familiar C<-d> switch to plug it in and run |
| 586 | the code. |
| 587 | |
| 588 | $> perl -d:NYTProf wordmatch -f perl5db.pl |
| 589 | |
| 590 | wordmatch report for perl5db.pl: |
| 591 | lines in file: 9427 |
| 592 | words in file: 50243 |
| 593 | words with special (non-word) characters: 20480 |
| 594 | words with only special (non-word) characters: 7790 |
| 595 | words with only consonants: 4801 |
| 596 | words with only capital letters: 1316 |
| 597 | words with only vowels: 1701 |
| 598 | |
| 599 | C<NYTProf> will generate a report database into the file F<nytprof.out> by |
| 600 | default. Human readable reports can be generated from here by using the |
| 601 | supplied C<nytprofhtml> (HTML output) and C<nytprofcsv> (CSV output) programs. |
| 602 | We've used the Unix system C<html2text> utility to convert the |
| 603 | F<nytprof/index.html> file for convenience here. |
| 604 | |
| 605 | $> html2text nytprof/index.html |
| 606 | |
| 607 | Performance Profile Index |
| 608 | For wordmatch |
| 609 | Run on Fri Sep 26 13:46:39 2008 |
| 610 | Reported on Fri Sep 26 13:47:23 2008 |
| 611 | |
| 612 | Top 15 Subroutines -- ordered by exclusive time |
| 613 | |Calls |P |F |Inclusive|Exclusive|Subroutine | |
| 614 | | | | |Time |Time | | |
| 615 | |251215|5 |1 |13.09263 |10.47692 |main:: |matches | |
| 616 | |260642|2 |1 |2.71199 |2.71199 |main:: |debug | |
| 617 | |1 |1 |1 |0.21404 |0.21404 |main:: |report | |
| 618 | |2 |2 |2 |0.00511 |0.00511 |XSLoader:: |load (xsub) | |
| 619 | |14 |14|7 |0.00304 |0.00298 |Exporter:: |import | |
| 620 | |3 |1 |1 |0.00265 |0.00254 |Exporter:: |as_heavy | |
| 621 | |10 |10|4 |0.00140 |0.00140 |vars:: |import | |
| 622 | |13 |13|1 |0.00129 |0.00109 |constant:: |import | |
| 623 | |1 |1 |1 |0.00360 |0.00096 |FileHandle:: |import | |
| 624 | |3 |3 |3 |0.00086 |0.00074 |warnings::register::|import | |
| 625 | |9 |3 |1 |0.00036 |0.00036 |strict:: |bits | |
| 626 | |13 |13|13|0.00032 |0.00029 |strict:: |import | |
| 627 | |2 |2 |2 |0.00020 |0.00020 |warnings:: |import | |
| 628 | |2 |1 |1 |0.00020 |0.00020 |Getopt::Long:: |ParseOptionSpec| |
| 629 | |7 |7 |6 |0.00043 |0.00020 |strict:: |unimport | |
| 630 | |
| 631 | For more information see the full list of 189 subroutines. |
| 632 | |
| 633 | The first part of the report already shows the critical information regarding |
| 634 | which subroutines are using the most time. The next gives some statistics |
| 635 | about the source files profiled. |
| 636 | |
| 637 | Source Code Files -- ordered by exclusive time then name |
| 638 | |Stmts |Exclusive|Avg. |Reports |Source File | |
| 639 | | |Time | | | | |
| 640 | |2699761|15.66654 |6e-06 |line . block . sub|wordmatch | |
| 641 | |35 |0.02187 |0.00062|line . block . sub|IO/Handle.pm | |
| 642 | |274 |0.01525 |0.00006|line . block . sub|Getopt/Long.pm | |
| 643 | |20 |0.00585 |0.00029|line . block . sub|Fcntl.pm | |
| 644 | |128 |0.00340 |0.00003|line . block . sub|Exporter/Heavy.pm | |
| 645 | |42 |0.00332 |0.00008|line . block . sub|IO/File.pm | |
| 646 | |261 |0.00308 |0.00001|line . block . sub|Exporter.pm | |
| 647 | |323 |0.00248 |8e-06 |line . block . sub|constant.pm | |
| 648 | |12 |0.00246 |0.00021|line . block . sub|File/Spec/Unix.pm | |
| 649 | |191 |0.00240 |0.00001|line . block . sub|vars.pm | |
| 650 | |77 |0.00201 |0.00003|line . block . sub|FileHandle.pm | |
| 651 | |12 |0.00198 |0.00016|line . block . sub|Carp.pm | |
| 652 | |14 |0.00175 |0.00013|line . block . sub|Symbol.pm | |
| 653 | |15 |0.00130 |0.00009|line . block . sub|IO.pm | |
| 654 | |22 |0.00120 |0.00005|line . block . sub|IO/Seekable.pm | |
| 655 | |198 |0.00085 |4e-06 |line . block . sub|warnings/register.pm| |
| 656 | |114 |0.00080 |7e-06 |line . block . sub|strict.pm | |
| 657 | |47 |0.00068 |0.00001|line . block . sub|warnings.pm | |
| 658 | |27 |0.00054 |0.00002|line . block . sub|overload.pm | |
| 659 | |9 |0.00047 |0.00005|line . block . sub|SelectSaver.pm | |
| 660 | |13 |0.00045 |0.00003|line . block . sub|File/Spec.pm | |
| 661 | |2701595|15.73869 | |Total | |
| 662 | |128647 |0.74946 | |Average | |
| 663 | | |0.00201 |0.00003|Median | |
| 664 | | |0.00121 |0.00003|Deviation | |
| 665 | |
| 666 | Report produced by the NYTProf 2.03 Perl profiler, developed by Tim Bunce and |
| 667 | Adam Kaplan. |
| 668 | |
| 669 | At this point, if you're using the I<html> report, you can click through the |
| 670 | various links to bore down into each subroutine and each line of code. Because |
| 671 | we're using the text reporting here, and there's a whole directory full of |
| 672 | reports built for each source file, we'll just display a part of the |
| 673 | corresponding F<wordmatch-line.html> file, sufficient to give an idea of the |
| 674 | sort of output you can expect from this cool tool. |
| 675 | |
| 676 | $> html2text nytprof/wordmatch-line.html |
| 677 | |
| 678 | Performance Profile -- -block view-.-line view-.-sub view- |
| 679 | For wordmatch |
| 680 | Run on Fri Sep 26 13:46:39 2008 |
| 681 | Reported on Fri Sep 26 13:47:22 2008 |
| 682 | |
| 683 | File wordmatch |
| 684 | |
| 685 | Subroutines -- ordered by exclusive time |
| 686 | |Calls |P|F|Inclusive|Exclusive|Subroutine | |
| 687 | | | | |Time |Time | | |
| 688 | |251215|5|1|13.09263 |10.47692 |main::|matches| |
| 689 | |260642|2|1|2.71199 |2.71199 |main::|debug | |
| 690 | |1 |1|1|0.21404 |0.21404 |main::|report | |
| 691 | |0 |0|0|0 |0 |main::|BEGIN | |
| 692 | |
| 693 | |
| 694 | |Line|Stmts.|Exclusive|Avg. |Code | |
| 695 | | | |Time | | | |
| 696 | |1 | | | |#!/usr/bin/perl | |
| 697 | |2 | | | | | |
| 698 | | | | | |use strict; | |
| 699 | |3 |3 |0.00086 |0.00029|# spent 0.00003s making 1 calls to strict:: | |
| 700 | | | | | |import | |
| 701 | | | | | |use warnings; | |
| 702 | |4 |3 |0.01563 |0.00521|# spent 0.00012s making 1 calls to warnings:: | |
| 703 | | | | | |import | |
| 704 | |5 | | | | | |
| 705 | |6 | | | |=head1 NAME | |
| 706 | |7 | | | | | |
| 707 | |8 | | | |filewords - word analysis of input file | |
| 708 | <...snip...> |
| 709 | |62 |1 |0.00445 |0.00445|print report( %count ); | |
| 710 | | | | | |# spent 0.21404s making 1 calls to main::report| |
| 711 | |63 | | | | | |
| 712 | | | | | |# spent 23.56955s (10.47692+2.61571) within | |
| 713 | | | | | |main::matches which was called 251215 times, | |
| 714 | | | | | |avg 0.00005s/call: # 50243 times | |
| 715 | | | | | |(2.12134+0.51939s) at line 57 of wordmatch, avg| |
| 716 | | | | | |0.00005s/call # 50243 times (2.17735+0.54550s) | |
| 717 | |64 | | | |at line 56 of wordmatch, avg 0.00005s/call # | |
| 718 | | | | | |50243 times (2.10992+0.51797s) at line 58 of | |
| 719 | | | | | |wordmatch, avg 0.00005s/call # 50243 times | |
| 720 | | | | | |(2.12696+0.51598s) at line 55 of wordmatch, avg| |
| 721 | | | | | |0.00005s/call # 50243 times (1.94134+0.51687s) | |
| 722 | | | | | |at line 54 of wordmatch, avg 0.00005s/call | |
| 723 | | | | | |sub matches { | |
| 724 | <...snip...> |
| 725 | |102 | | | | | |
| 726 | | | | | |# spent 2.71199s within main::debug which was | |
| 727 | | | | | |called 260642 times, avg 0.00001s/call: # | |
| 728 | | | | | |251215 times (2.61571+0s) by main::matches at | |
| 729 | |103 | | | |line 74 of wordmatch, avg 0.00001s/call # 9427 | |
| 730 | | | | | |times (0.09628+0s) at line 50 of wordmatch, avg| |
| 731 | | | | | |0.00001s/call | |
| 732 | | | | | |sub debug { | |
| 733 | |104 |260642|0.58496 |2e-06 |my $message = shift; | |
| 734 | |105 | | | | | |
| 735 | |106 |260642|1.09917 |4e-06 |if ( $debug ) { | |
| 736 | |107 | | | |print STDERR "DBG: $message\n"; | |
| 737 | |108 | | | |} | |
| 738 | |109 | | | |} | |
| 739 | |110 | | | | | |
| 740 | |111 |1 |0.01501 |0.01501|exit 0; | |
| 741 | |112 | | | | | |
| 742 | |
| 743 | Oodles of very useful information in there - this seems to be the way forward. |
| 744 | |
| 745 | See also C<Devel::NYTProf::Apache> which hooks C<Devel::NYTProf> into C<mod_perl>. |
| 746 | |
| 747 | =head1 SORTING |
| 748 | |
| 749 | Perl modules are not the only tools a performance analyst has at their |
| 750 | disposal, system tools like C<time> should not be overlooked as the next |
| 751 | example shows, where we take a quick look at sorting. Many books, theses and |
| 752 | articles, have been written about efficient sorting algorithms, and this is not |
| 753 | the place to repeat such work, there's several good sorting modules which |
| 754 | deserve taking a look at too: C<Sort::Maker>, C<Sort::Key> spring to mind. |
| 755 | However, it's still possible to make some observations on certain Perl specific |
| 756 | interpretations on issues relating to sorting data sets and give an example or |
| 757 | two with regard to how sorting large data volumes can effect performance. |
| 758 | Firstly, an often overlooked point when sorting large amounts of data, one can |
| 759 | attempt to reduce the data set to be dealt with and in many cases C<grep()> can |
| 760 | be quite useful as a simple filter: |
| 761 | |
| 762 | @data = sort grep { /$filter/ } @incoming |
| 763 | |
| 764 | A command such as this can vastly reduce the volume of material to actually |
| 765 | sort through in the first place, and should not be too lightly disregarded |
| 766 | purely on the basis of its simplicity. The C<KISS> principle is too often |
| 767 | overlooked - the next example uses the simple system C<time> utility to |
| 768 | demonstrate. Let's take a look at an actual example of sorting the contents of |
| 769 | a large file, an apache logfile would do. This one has over a quarter of a |
| 770 | million lines, is 50M in size, and a snippet of it looks like this: |
| 771 | |
| 772 | # logfile |
| 773 | |
| 774 | 188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" |
| 775 | 188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" |
| 776 | 151.56.71.198 - - [08/Feb/2007:12:57:41 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" |
| 777 | 151.56.71.198 - - [08/Feb/2007:12:57:42 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" |
| 778 | 151.56.71.198 - - [08/Feb/2007:12:57:43 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" |
| 779 | 217.113.68.60 - - [08/Feb/2007:13:02:15 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" |
| 780 | 217.113.68.60 - - [08/Feb/2007:13:02:16 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" |
| 781 | debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)" |
| 782 | debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)" |
| 783 | debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)" |
| 784 | 195.24.196.99 - - [08/Feb/2007:13:26:48 +0000] "GET / HTTP/1.0" 200 3309 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9" |
| 785 | 195.24.196.99 - - [08/Feb/2007:13:26:58 +0000] "GET /data/css HTTP/1.0" 404 206 "http://www.rfi.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9" |
| 786 | 195.24.196.99 - - [08/Feb/2007:13:26:59 +0000] "GET /favicon.ico HTTP/1.0" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9" |
| 787 | crawl1.cosmixcorp.com - - [08/Feb/2007:13:27:57 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "voyager/1.0" |
| 788 | crawl1.cosmixcorp.com - - [08/Feb/2007:13:28:25 +0000] "GET /links.html HTTP/1.0" 200 3413 "-" "voyager/1.0" |
| 789 | fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:32 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" |
| 790 | fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:34 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" |
| 791 | 80.247.140.134 - - [08/Feb/2007:13:57:35 +0000] "GET / HTTP/1.1" 200 3309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" |
| 792 | 80.247.140.134 - - [08/Feb/2007:13:57:37 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" |
| 793 | pop.compuscan.co.za - - [08/Feb/2007:14:10:43 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net" |
| 794 | livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" |
| 795 | livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /html/oracle.html HTTP/1.0" 404 214 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" |
| 796 | dslb-088-064-005-154.pools.arcor-ip.net - - [08/Feb/2007:14:12:15 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net" |
| 797 | 196.201.92.41 - - [08/Feb/2007:14:15:01 +0000] "GET / HTTP/1.1" 200 3309 "-" "MOT-L7/08.B7.DCR MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1" |
| 798 | |
| 799 | The specific task here is to sort the 286,525 lines of this file by Response |
| 800 | Code, Query, Browser, Referring Url, and lastly Date. One solution might be to |
| 801 | use the following code, which iterates over the files given on the |
| 802 | command-line. |
| 803 | |
| 804 | # sort-apache-log |
| 805 | |
| 806 | #!/usr/bin/perl -n |
| 807 | |
| 808 | use strict; |
| 809 | use warnings; |
| 810 | |
| 811 | my @data; |
| 812 | |
| 813 | LINE: |
| 814 | while ( <> ) { |
| 815 | my $line = $_; |
| 816 | if ( |
| 817 | $line =~ m/^( |
| 818 | ([\w\.\-]+) # client |
| 819 | \s*-\s*-\s*\[ |
| 820 | ([^]]+) # date |
| 821 | \]\s*"\w+\s* |
| 822 | (\S+) # query |
| 823 | [^"]+"\s* |
| 824 | (\d+) # status |
| 825 | \s+\S+\s+"[^"]*"\s+" |
| 826 | ([^"]*) # browser |
| 827 | " |
| 828 | .* |
| 829 | )$/x |
| 830 | ) { |
| 831 | my @chunks = split(/ +/, $line); |
| 832 | my $ip = $1; |
| 833 | my $date = $2; |
| 834 | my $query = $3; |
| 835 | my $status = $4; |
| 836 | my $browser = $5; |
| 837 | |
| 838 | push(@data, [$ip, $date, $query, $status, $browser, $line]); |
| 839 | } |
| 840 | } |
| 841 | |
| 842 | my @sorted = sort { |
| 843 | $a->[3] cmp $b->[3] |
| 844 | || |
| 845 | $a->[2] cmp $b->[2] |
| 846 | || |
| 847 | $a->[0] cmp $b->[0] |
| 848 | || |
| 849 | $a->[1] cmp $b->[1] |
| 850 | || |
| 851 | $a->[4] cmp $b->[4] |
| 852 | } @data; |
| 853 | |
| 854 | foreach my $data ( @sorted ) { |
| 855 | print $data->[5]; |
| 856 | } |
| 857 | |
| 858 | exit 0; |
| 859 | |
| 860 | When running this program, redirect C<STDOUT> so it is possible to check the |
| 861 | output is correct from following test runs and use the system C<time> utility |
| 862 | to check the overall runtime. |
| 863 | |
| 864 | $> time ./sort-apache-log logfile > out-sort |
| 865 | |
| 866 | real 0m17.371s |
| 867 | user 0m15.757s |
| 868 | sys 0m0.592s |
| 869 | |
| 870 | The program took just over 17 wallclock seconds to run. Note the different |
| 871 | values C<time> outputs, it's important to always use the same one, and to not |
| 872 | confuse what each one means. |
| 873 | |
| 874 | =over 4 |
| 875 | |
| 876 | =item Elapsed Real Time |
| 877 | |
| 878 | The overall, or wallclock, time between when C<time> was called, and when it |
| 879 | terminates. The elapsed time includes both user and system times, and time |
| 880 | spent waiting for other users and processes on the system. Inevitably, this is |
| 881 | the most approximate of the measurements given. |
| 882 | |
| 883 | =item User CPU Time |
| 884 | |
| 885 | The user time is the amount of time the entire process spent on behalf of the |
| 886 | user on this system executing this program. |
| 887 | |
| 888 | =item System CPU Time |
| 889 | |
| 890 | The system time is the amount of time the kernel itself spent executing |
| 891 | routines, or system calls, on behalf of this process user. |
| 892 | |
| 893 | =back |
| 894 | |
| 895 | Running this same process as a C<Schwarzian Transform> it is possible to |
| 896 | eliminate the input and output arrays for storing all the data, and work on the |
| 897 | input directly as it arrives too. Otherwise, the code looks fairly similar: |
| 898 | |
| 899 | # sort-apache-log-schwarzian |
| 900 | |
| 901 | #!/usr/bin/perl -n |
| 902 | |
| 903 | use strict; |
| 904 | use warnings; |
| 905 | |
| 906 | print |
| 907 | |
| 908 | map $_->[0] => |
| 909 | |
| 910 | sort { |
| 911 | $a->[4] cmp $b->[4] |
| 912 | || |
| 913 | $a->[3] cmp $b->[3] |
| 914 | || |
| 915 | $a->[1] cmp $b->[1] |
| 916 | || |
| 917 | $a->[2] cmp $b->[2] |
| 918 | || |
| 919 | $a->[5] cmp $b->[5] |
| 920 | } |
| 921 | map [ $_, m/^( |
| 922 | ([\w\.\-]+) # client |
| 923 | \s*-\s*-\s*\[ |
| 924 | ([^]]+) # date |
| 925 | \]\s*"\w+\s* |
| 926 | (\S+) # query |
| 927 | [^"]+"\s* |
| 928 | (\d+) # status |
| 929 | \s+\S+\s+"[^"]*"\s+" |
| 930 | ([^"]*) # browser |
| 931 | " |
| 932 | .* |
| 933 | )$/xo ] |
| 934 | |
| 935 | => <>; |
| 936 | |
| 937 | exit 0; |
| 938 | |
| 939 | Run the new code against the same logfile, as above, to check the new time. |
| 940 | |
| 941 | $> time ./sort-apache-log-schwarzian logfile > out-schwarz |
| 942 | |
| 943 | real 0m9.664s |
| 944 | user 0m8.873s |
| 945 | sys 0m0.704s |
| 946 | |
| 947 | The time has been cut in half, which is a respectable speed improvement by any |
| 948 | standard. Naturally, it is important to check the output is consistent with |
| 949 | the first program run, this is where the Unix system C<cksum> utility comes in. |
| 950 | |
| 951 | $> cksum out-sort out-schwarz |
| 952 | 3044173777 52029194 out-sort |
| 953 | 3044173777 52029194 out-schwarz |
| 954 | |
| 955 | BTW. Beware too of pressure from managers who see you speed a program up by 50% |
| 956 | of the runtime once, only to get a request one month later to do the same again |
| 957 | (true story) - you'll just have to point out you're only human, even if you are a |
| 958 | Perl programmer, and you'll see what you can do... |
| 959 | |
| 960 | =head1 LOGGING |
| 961 | |
| 962 | An essential part of any good development process is appropriate error handling |
| 963 | with appropriately informative messages, however there exists a school of |
| 964 | thought which suggests that log files should be I<chatty>, as if the chain of |
| 965 | unbroken output somehow ensures the survival of the program. If speed is in |
| 966 | any way an issue, this approach is wrong. |
| 967 | |
| 968 | A common sight is code which looks something like this: |
| 969 | |
| 970 | logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) ) |
| 971 | |
| 972 | The problem is that this code will always be parsed and executed, even when the |
| 973 | debug level set in the logging configuration file is zero. Once the debug() |
| 974 | subroutine has been entered, and the internal C<$debug> variable confirmed to |
| 975 | be zero, for example, the message which has been sent in will be discarded and |
| 976 | the program will continue. In the example given though, the C<\%INC> hash will |
| 977 | already have been dumped, and the message string constructed, all of which work |
| 978 | could be bypassed by a debug variable at the statement level, like this: |
| 979 | |
| 980 | logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) ) if $DEBUG; |
| 981 | |
| 982 | This effect can be demonstrated by setting up a test script with both forms, |
| 983 | including a C<debug()> subroutine to emulate typical C<logger()> functionality. |
| 984 | |
| 985 | # ifdebug |
| 986 | |
| 987 | #!/usr/bin/perl |
| 988 | |
| 989 | use strict; |
| 990 | use warnings; |
| 991 | |
| 992 | use Benchmark; |
| 993 | use Data::Dumper; |
| 994 | my $DEBUG = 0; |
| 995 | |
| 996 | sub debug { |
| 997 | my $msg = shift; |
| 998 | |
| 999 | if ( $DEBUG ) { |
| 1000 | print "DEBUG: $msg\n"; |
| 1001 | } |
| 1002 | }; |
| 1003 | |
| 1004 | timethese(100000, { |
| 1005 | 'debug' => sub { |
| 1006 | debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) |
| 1007 | }, |
| 1008 | 'ifdebug' => sub { |
| 1009 | debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if $DEBUG |
| 1010 | }, |
| 1011 | }); |
| 1012 | |
| 1013 | Let's see what C<Benchmark> makes of this: |
| 1014 | |
| 1015 | $> perl ifdebug |
| 1016 | Benchmark: timing 100000 iterations of constant, sub... |
| 1017 | ifdebug: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) @ 10000000.00/s (n=100000) |
| 1018 | (warning: too few iterations for a reliable count) |
| 1019 | debug: 14 wallclock secs (13.18 usr + 0.04 sys = 13.22 CPU) @ 7564.30/s (n=100000) |
| 1020 | |
| 1021 | In the one case the code, which does exactly the same thing as far as |
| 1022 | outputting any debugging information is concerned, in other words nothing, |
| 1023 | takes 14 seconds, and in the other case the code takes one hundredth of a |
| 1024 | second. Looks fairly definitive. Use a C<$DEBUG> variable BEFORE you call the |
| 1025 | subroutine, rather than relying on the smart functionality inside it. |
| 1026 | |
| 1027 | =head2 Logging if DEBUG (constant) |
| 1028 | |
| 1029 | It's possible to take the previous idea a little further, by using a compile |
| 1030 | time C<DEBUG> constant. |
| 1031 | |
| 1032 | # ifdebug-constant |
| 1033 | |
| 1034 | #!/usr/bin/perl |
| 1035 | |
| 1036 | use strict; |
| 1037 | use warnings; |
| 1038 | |
| 1039 | use Benchmark; |
| 1040 | use Data::Dumper; |
| 1041 | use constant |
| 1042 | DEBUG => 0 |
| 1043 | ; |
| 1044 | |
| 1045 | sub debug { |
| 1046 | if ( DEBUG ) { |
| 1047 | my $msg = shift; |
| 1048 | print "DEBUG: $msg\n"; |
| 1049 | } |
| 1050 | }; |
| 1051 | |
| 1052 | timethese(100000, { |
| 1053 | 'debug' => sub { |
| 1054 | debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) |
| 1055 | }, |
| 1056 | 'constant' => sub { |
| 1057 | debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if DEBUG |
| 1058 | }, |
| 1059 | }); |
| 1060 | |
| 1061 | Running this program produces the following output: |
| 1062 | |
| 1063 | $> perl ifdebug-constant |
| 1064 | Benchmark: timing 100000 iterations of constant, sub... |
| 1065 | constant: 0 wallclock secs (-0.00 usr + 0.00 sys = -0.00 CPU) @ -7205759403792793600000.00/s (n=100000) |
| 1066 | (warning: too few iterations for a reliable count) |
| 1067 | sub: 14 wallclock secs (13.09 usr + 0.00 sys = 13.09 CPU) @ 7639.42/s (n=100000) |
| 1068 | |
| 1069 | The C<DEBUG> constant wipes the floor with even the C<$debug> variable, |
| 1070 | clocking in at minus zero seconds, and generates a "warning: too few iterations |
| 1071 | for a reliable count" message into the bargain. To see what is really going |
| 1072 | on, and why we had too few iterations when we thought we asked for 100000, we |
| 1073 | can use the very useful C<B::Deparse> to inspect the new code: |
| 1074 | |
| 1075 | $> perl -MO=Deparse ifdebug-constant |
| 1076 | |
| 1077 | use Benchmark; |
| 1078 | use Data::Dumper; |
| 1079 | use constant ('DEBUG', 0); |
| 1080 | sub debug { |
| 1081 | use warnings; |
| 1082 | use strict 'refs'; |
| 1083 | 0; |
| 1084 | } |
| 1085 | use warnings; |
| 1086 | use strict 'refs'; |
| 1087 | timethese(100000, {'sub', sub { |
| 1088 | debug "A $0 logging message via process-id: $$" . Dumper(\%INC); |
| 1089 | } |
| 1090 | , 'constant', sub { |
| 1091 | 0; |
| 1092 | } |
| 1093 | }); |
| 1094 | ifdebug-constant syntax OK |
| 1095 | |
| 1096 | The output shows the constant() subroutine we're testing being replaced with |
| 1097 | the value of the C<DEBUG> constant: zero. The line to be tested has been |
| 1098 | completely optimized away, and you can't get much more efficient than that. |
| 1099 | |
| 1100 | =head1 POSTSCRIPT |
| 1101 | |
| 1102 | This document has provided several way to go about identifying hot-spots, and |
| 1103 | checking whether any modifications have improved the runtime of the code. |
| 1104 | |
| 1105 | As a final thought, remember that it's not (at the time of writing) possible to |
| 1106 | produce a useful program which will run in zero or negative time and this basic |
| 1107 | principle can be written as: I<useful programs are slow> by their very |
| 1108 | definition. It is of course possible to write a nearly instantaneous program, |
| 1109 | but it's not going to do very much, here's a very efficient one: |
| 1110 | |
| 1111 | $> perl -e 0 |
| 1112 | |
| 1113 | Optimizing that any further is a job for C<p5p>. |
| 1114 | |
| 1115 | =head1 SEE ALSO |
| 1116 | |
| 1117 | Further reading can be found using the modules and links below. |
| 1118 | |
| 1119 | =head2 PERLDOCS |
| 1120 | |
| 1121 | For example: C<perldoc -f sort>. |
| 1122 | |
| 1123 | L<perlfaq4>. |
| 1124 | |
| 1125 | L<perlfork>, L<perlfunc>, L<perlretut>, L<perlthrtut>. |
| 1126 | |
| 1127 | L<threads>. |
| 1128 | |
| 1129 | =head2 MAN PAGES |
| 1130 | |
| 1131 | C<time>. |
| 1132 | |
| 1133 | =head2 MODULES |
| 1134 | |
| 1135 | It's not possible to individually showcase all the performance related code for |
| 1136 | Perl here, naturally, but here's a short list of modules from the CPAN which |
| 1137 | deserve further attention. |
| 1138 | |
| 1139 | Apache::DProf |
| 1140 | Apache::SmallProf |
| 1141 | Benchmark |
| 1142 | DBIx::Profile |
| 1143 | Devel::AutoProfiler |
| 1144 | Devel::DProf |
| 1145 | Devel::DProfLB |
| 1146 | Devel::FastProf |
| 1147 | Devel::GraphVizProf |
| 1148 | Devel::NYTProf |
| 1149 | Devel::NYTProf::Apache |
| 1150 | Devel::Profiler |
| 1151 | Devel::Profile |
| 1152 | Devel::Profit |
| 1153 | Devel::SmallProf |
| 1154 | Devel::WxProf |
| 1155 | POE::Devel::Profiler |
| 1156 | Sort::Key |
| 1157 | Sort::Maker |
| 1158 | |
| 1159 | =head2 URLS |
| 1160 | |
| 1161 | Very useful online reference material: |
| 1162 | |
| 1163 | http://www.ccl4.org/~nick/P/Fast_Enough/ |
| 1164 | |
| 1165 | http://www-128.ibm.com/developerworks/library/l-optperl.html |
| 1166 | |
| 1167 | http://perlbuzz.com/2007/11/bind-output-variables-in-dbi-for-speed-and-safety.html |
| 1168 | |
| 1169 | http://en.wikipedia.org/wiki/Performance_analysis |
| 1170 | |
| 1171 | http://apache.perl.org/docs/1.0/guide/performance.html |
| 1172 | |
| 1173 | http://perlgolf.sourceforge.net/ |
| 1174 | |
| 1175 | http://www.sysarch.com/Perl/sort_paper.html |
| 1176 | |
| 1177 | =head1 AUTHOR |
| 1178 | |
| 1179 | Richard Foley <richard.foley@rfi.net> Copyright (c) 2008 |
| 1180 | |
| 1181 | =cut |