Commit | Line | Data |
---|---|---|
da096611 RGS |
1 | =head1 NAME |
2 | ||
3 | perlperf - Perl Performance and Optimization Techniques | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
7 | This is an introduction to the use of performance and optimization techniques | |
e4866e2e | 8 | which can be used with particular reference to perl programs. While many perl |
da096611 RGS |
9 | developers have come from other languages, and can use their prior knowledge |
10 | where appropriate, there are many other people who might benefit from a few | |
11 | perl specific pointers. If you want the condensed version, perhaps the best | |
12 | advice comes from the renowned Japanese Samurai, Miyamoto Musashi, who said: | |
13 | ||
f185f654 | 14 | "Do Not Engage in Useless Activity" |
da096611 RGS |
15 | |
16 | in 1645. | |
17 | ||
18 | =head1 OVERVIEW | |
19 | ||
20 | Perhaps the most common mistake programmers make is to attempt to optimize | |
21 | their code before a program actually does anything useful - this is a bad idea. | |
22 | There's no point in having an extremely fast program that doesn't work. The | |
23 | first job is to get a program to I<correctly> do something B<useful>, (not to | |
24 | mention ensuring the test suite is fully functional), and only then to consider | |
25 | optimizing it. Having decided to optimize existing working code, there are | |
26 | several simple but essential steps to consider which are intrinsic to any | |
27 | optimization process. | |
28 | ||
29 | =head2 ONE STEP SIDEWAYS | |
30 | ||
31 | Firstly, you need to establish a baseline time for the existing code, which | |
32 | timing needs to be reliable and repeatable. You'll probably want to use the | |
c9dab4e9 | 33 | C<Benchmark> or C<Devel::NYTProf> modules, or something similar, for this step, |
e1020413 | 34 | or perhaps the Unix system C<time> utility, whichever is appropriate. See the |
da096611 RGS |
35 | base of this document for a longer list of benchmarking and profiling modules, |
36 | and recommended further reading. | |
37 | ||
38 | =head2 ONE STEP FORWARD | |
39 | ||
40 | Next, having examined the program for I<hot spots>, (places where the code | |
41 | seems to run slowly), change the code with the intention of making it run | |
42 | faster. Using version control software, like C<subversion>, will ensure no | |
43 | changes are irreversible. It's too easy to fiddle here and fiddle there - | |
44 | don't change too much at any one time or you might not discover which piece of | |
45 | code B<really> was the slow bit. | |
46 | ||
47 | =head2 ANOTHER STEP SIDEWAYS | |
48 | ||
49 | It's not enough to say: "that will make it run faster", you have to check it. | |
50 | Rerun the code under control of the benchmarking or profiling modules, from the | |
51 | first step above, and check that the new code executed the B<same task> in | |
52 | I<less time>. Save your work and repeat... | |
53 | ||
54 | =head1 GENERAL GUIDELINES | |
55 | ||
56 | The critical thing when considering performance is to remember there is no such | |
57 | thing as a C<Golden Bullet>, which is why there are no rules, only guidelines. | |
58 | ||
59 | It is clear that inline code is going to be faster than subroutine or method | |
60 | calls, because there is less overhead, but this approach has the disadvantage | |
61 | of being less maintainable and comes at the cost of greater memory usage - | |
62 | there is no such thing as a free lunch. If you are searching for an element in | |
63 | a list, it can be more efficient to store the data in a hash structure, and | |
64 | then simply look to see whether the key is defined, rather than to loop through | |
65 | the entire array using grep() for instance. substr() may be (a lot) faster | |
66 | than grep() but not as flexible, so you have another trade-off to access. Your | |
67 | code may contain a line which takes 0.01 of a second to execute which if you | |
68 | call it 1,000 times, quite likely in a program parsing even medium sized files | |
69 | for instance, you already have a 10 second delay, in just one single code | |
70 | location, and if you call that line 100,000 times, your entire program will | |
71 | slow down to an unbearable crawl. | |
72 | ||
73 | Using a subroutine as part of your sort is a powerful way to get exactly what | |
74 | you want, but will usually be slower than the built-in I<alphabetic> C<cmp> and | |
75 | I<numeric> C<E<lt>=E<gt>> sort operators. It is possible to make multiple | |
76 | passes over your data, building indices to make the upcoming sort more | |
77 | efficient, and to use what is known as the C<OM> (Orcish Maneuver) to cache the | |
78 | sort keys in advance. The cache lookup, while a good idea, can itself be a | |
79 | source of slowdown by enforcing a double pass over the data - once to setup the | |
80 | cache, and once to sort the data. Using C<pack()> to extract the required sort | |
81 | key into a consistent string can be an efficient way to build a single string | |
82 | to compare, instead of using multiple sort keys, which makes it possible to use | |
83 | the standard, written in C<c> and fast, perl C<sort()> function on the output, | |
84 | and is the basis of the C<GRT> (Guttman Rossler Transform). Some string | |
b1fc79bc | 85 | combinations can slow the C<GRT> down, by just being too plain complex for its |
da096611 RGS |
86 | own good. |
87 | ||
88 | For applications using database backends, the standard C<DBIx> namespace has | |
89 | tries to help with keeping things nippy, not least because it tries to I<not> | |
90 | query the database until the latest possible moment, but always read the docs | |
91 | which come with your choice of libraries. Among the many issues facing | |
92 | developers dealing with databases should remain aware of is to always use | |
93 | C<SQL> placeholders and to consider pre-fetching data sets when this might | |
94 | prove advantageous. Splitting up a large file by assigning multiple processes | |
95 | to parsing a single file, using say C<POE>, C<threads> or C<fork> can also be a | |
96 | useful way of optimizing your usage of the available C<CPU> resources, though | |
97 | this technique is fraught with concurrency issues and demands high attention to | |
98 | detail. | |
99 | ||
100 | Every case has a specific application and one or more exceptions, and there is | |
101 | no replacement for running a few tests and finding out which method works best | |
102 | for your particular environment, this is why writing optimal code is not an | |
103 | exact science, and why we love using Perl so much - TMTOWTDI. | |
104 | ||
105 | =head1 BENCHMARKS | |
106 | ||
107 | Here are a few examples to demonstrate usage of Perl's benchmarking tools. | |
108 | ||
109 | =head2 Assigning and Dereferencing Variables. | |
110 | ||
111 | I'm sure most of us have seen code which looks like, (or worse than), this: | |
112 | ||
f185f654 KW |
113 | if ( $obj->{_ref}->{_myscore} >= $obj->{_ref}->{_yourscore} ) { |
114 | ... | |
da096611 RGS |
115 | |
116 | This sort of code can be a real eyesore to read, as well as being very | |
117 | sensitive to typos, and it's much clearer to dereference the variable | |
118 | explicitly. We're side-stepping the issue of working with object-oriented | |
119 | programming techniques to encapsulate variable access via methods, only | |
120 | accessible through an object. Here we're just discussing the technical | |
121 | implementation of choice, and whether this has an effect on performance. We | |
122 | can see whether this dereferencing operation, has any overhead by putting | |
123 | comparative code in a file and running a C<Benchmark> test. | |
124 | ||
125 | # dereference | |
126 | ||
f185f654 | 127 | #!/usr/bin/perl |
da096611 | 128 | |
f185f654 KW |
129 | use strict; |
130 | use warnings; | |
da096611 | 131 | |
f185f654 | 132 | use Benchmark; |
da096611 | 133 | |
f185f654 KW |
134 | my $ref = { |
135 | 'ref' => { | |
136 | _myscore => '100 + 1', | |
137 | _yourscore => '102 - 1', | |
138 | }, | |
139 | }; | |
da096611 | 140 | |
f185f654 KW |
141 | timethese(1000000, { |
142 | 'direct' => sub { | |
143 | my $x = $ref->{ref}->{_myscore} . $ref->{ref}->{_yourscore} ; | |
144 | }, | |
145 | 'dereference' => sub { | |
146 | my $ref = $ref->{ref}; | |
147 | my $myscore = $ref->{_myscore}; | |
148 | my $yourscore = $ref->{_yourscore}; | |
149 | my $x = $myscore . $yourscore; | |
150 | }, | |
151 | }); | |
da096611 RGS |
152 | |
153 | It's essential to run any timing measurements a sufficient number of times so | |
154 | the numbers settle on a numerical average, otherwise each run will naturally | |
155 | fluctuate due to variations in the environment, to reduce the effect of | |
156 | contention for C<CPU> resources and network bandwidth for instance. Running | |
157 | the above code for one million iterations, we can take a look at the report | |
158 | output by the C<Benchmark> module, to see which approach is the most effective. | |
159 | ||
f185f654 | 160 | $> perl dereference |
da096611 | 161 | |
f185f654 KW |
162 | Benchmark: timing 1000000 iterations of dereference, direct... |
163 | dereference: 2 wallclock secs ( 1.59 usr + 0.00 sys = 1.59 CPU) @ 628930.82/s (n=1000000) | |
164 | direct: 1 wallclock secs ( 1.20 usr + 0.00 sys = 1.20 CPU) @ 833333.33/s (n=1000000) | |
da096611 RGS |
165 | |
166 | The difference is clear to see and the dereferencing approach is slower. While | |
167 | it managed to execute an average of 628,930 times a second during our test, the | |
168 | direct approach managed to run an additional 204,403 times, unfortunately. | |
169 | Unfortunately, because there are many examples of code written using the | |
170 | multiple layer direct variable access, and it's usually horrible. It is, | |
e1020413 | 171 | however, minusculy faster. The question remains whether the minute gain is |
da096611 RGS |
172 | actually worth the eyestrain, or the loss of maintainability. |
173 | ||
174 | =head2 Search and replace or tr | |
175 | ||
176 | If we have a string which needs to be modified, while a regex will almost | |
177 | always be much more flexible, C<tr>, an oft underused tool, can still be a | |
178 | useful. One scenario might be replace all vowels with another character. The | |
179 | regex solution might look like this: | |
180 | ||
f185f654 | 181 | $str =~ s/[aeiou]/x/g |
da096611 RGS |
182 | |
183 | The C<tr> alternative might look like this: | |
184 | ||
f185f654 | 185 | $str =~ tr/aeiou/xxxxx/ |
da096611 RGS |
186 | |
187 | We can put that into a test file which we can run to check which approach is | |
188 | the fastest, using a global C<$STR> variable to assign to the C<my $str> | |
189 | variable so as to avoid perl trying to optimize any of the work away by | |
190 | noticing it's assigned only the once. | |
191 | ||
192 | # regex-transliterate | |
193 | ||
f185f654 | 194 | #!/usr/bin/perl |
da096611 | 195 | |
f185f654 KW |
196 | use strict; |
197 | use warnings; | |
da096611 | 198 | |
f185f654 | 199 | use Benchmark; |
da096611 | 200 | |
f185f654 | 201 | my $STR = "$$-this and that"; |
da096611 | 202 | |
f185f654 KW |
203 | timethese( 1000000, { |
204 | 'sr' => sub { my $str = $STR; $str =~ s/[aeiou]/x/g; return $str; }, | |
205 | 'tr' => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; }, | |
206 | }); | |
da096611 RGS |
207 | |
208 | Running the code gives us our results: | |
209 | ||
f185f654 | 210 | $> perl regex-transliterate |
da096611 | 211 | |
f185f654 KW |
212 | Benchmark: timing 1000000 iterations of sr, tr... |
213 | sr: 2 wallclock secs ( 1.19 usr + 0.00 sys = 1.19 CPU) @ 840336.13/s (n=1000000) | |
214 | tr: 0 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 2040816.33/s (n=1000000) | |
da096611 RGS |
215 | |
216 | The C<tr> version is a clear winner. One solution is flexible, the other is | |
e1020413 | 217 | fast - and it's appropriately the programmer's choice which to use. |
da096611 RGS |
218 | |
219 | Check the C<Benchmark> docs for further useful techniques. | |
220 | ||
221 | =head1 PROFILING TOOLS | |
222 | ||
223 | A slightly larger piece of code will provide something on which a profiler can | |
224 | produce more extensive reporting statistics. This example uses the simplistic | |
225 | C<wordmatch> program which parses a given input file and spews out a short | |
226 | report on the contents. | |
227 | ||
228 | # wordmatch | |
229 | ||
f185f654 KW |
230 | #!/usr/bin/perl |
231 | ||
232 | use strict; | |
233 | use warnings; | |
234 | ||
235 | =head1 NAME | |
236 | ||
237 | filewords - word analysis of input file | |
238 | ||
239 | =head1 SYNOPSIS | |
240 | ||
241 | filewords -f inputfilename [-d] | |
da096611 | 242 | |
f185f654 | 243 | =head1 DESCRIPTION |
da096611 | 244 | |
f185f654 KW |
245 | This program parses the given filename, specified with C<-f>, and |
246 | displays a simple analysis of the words found therein. Use the C<-d> | |
247 | switch to enable debugging messages. | |
da096611 | 248 | |
f185f654 | 249 | =cut |
da096611 | 250 | |
f185f654 KW |
251 | use FileHandle; |
252 | use Getopt::Long; | |
da096611 | 253 | |
f185f654 KW |
254 | my $debug = 0; |
255 | my $file = ''; | |
da096611 | 256 | |
f185f654 KW |
257 | my $result = GetOptions ( |
258 | 'debug' => \$debug, | |
259 | 'file=s' => \$file, | |
260 | ); | |
261 | die("invalid args") unless $result; | |
da096611 | 262 | |
f185f654 KW |
263 | unless ( -f $file ) { |
264 | die("Usage: $0 -f filename [-d]"); | |
265 | } | |
266 | my $FH = FileHandle->new("< $file") | |
267 | or die("unable to open file($file): $!"); | |
da096611 | 268 | |
f185f654 KW |
269 | my $i_LINES = 0; |
270 | my $i_WORDS = 0; | |
271 | my %count = (); | |
da096611 | 272 | |
f185f654 KW |
273 | my @lines = <$FH>; |
274 | foreach my $line ( @lines ) { | |
275 | $i_LINES++; | |
276 | $line =~ s/\n//; | |
277 | my @words = split(/ +/, $line); | |
278 | my $i_words = scalar(@words); | |
279 | $i_WORDS = $i_WORDS + $i_words; | |
280 | debug("line: $i_LINES supplying $i_words words: @words"); | |
281 | my $i_word = 0; | |
282 | foreach my $word ( @words ) { | |
283 | $i_word++; | |
e46aa1dd KW |
284 | $count{$i_LINES}{spec} += matches($i_word, $word, |
285 | '[^a-zA-Z0-9]'); | |
286 | $count{$i_LINES}{only} += matches($i_word, $word, | |
287 | '^[^a-zA-Z0-9]+$'); | |
288 | $count{$i_LINES}{cons} += matches($i_word, $word, | |
289 | '^[(?i:bcdfghjklmnpqrstvwxyz)]+$'); | |
290 | $count{$i_LINES}{vows} += matches($i_word, $word, | |
291 | '^[(?i:aeiou)]+$'); | |
292 | $count{$i_LINES}{caps} += matches($i_word, $word, | |
293 | '^[(A-Z)]+$'); | |
f185f654 KW |
294 | } |
295 | } | |
da096611 | 296 | |
f185f654 | 297 | print report( %count ); |
da096611 | 298 | |
f185f654 KW |
299 | sub matches { |
300 | my $i_wd = shift; | |
301 | my $word = shift; | |
302 | my $regex = shift; | |
303 | my $has = 0; | |
da096611 | 304 | |
f185f654 KW |
305 | if ( $word =~ /($regex)/ ) { |
306 | $has++ if $1; | |
307 | } | |
da096611 | 308 | |
e46aa1dd KW |
309 | debug( "word: $i_wd " |
310 | . ($has ? 'matches' : 'does not match') | |
311 | . " chars: /$regex/"); | |
da096611 | 312 | |
f185f654 KW |
313 | return $has; |
314 | } | |
da096611 | 315 | |
f185f654 KW |
316 | sub report { |
317 | my %report = @_; | |
318 | my %rep; | |
da096611 | 319 | |
f185f654 KW |
320 | foreach my $line ( keys %report ) { |
321 | foreach my $key ( keys %{ $report{$line} } ) { | |
322 | $rep{$key} += $report{$line}{$key}; | |
323 | } | |
324 | } | |
da096611 | 325 | |
f185f654 KW |
326 | my $report = qq| |
327 | $0 report for $file: | |
328 | lines in file: $i_LINES | |
329 | words in file: $i_WORDS | |
330 | words with special (non-word) characters: $i_spec | |
331 | words with only special (non-word) characters: $i_only | |
332 | words with only consonants: $i_cons | |
333 | words with only capital letters: $i_caps | |
334 | words with only vowels: $i_vows | |
335 | |; | |
da096611 | 336 | |
f185f654 KW |
337 | return $report; |
338 | } | |
339 | ||
340 | sub debug { | |
341 | my $message = shift; | |
342 | ||
343 | if ( $debug ) { | |
344 | print STDERR "DBG: $message\n"; | |
345 | } | |
346 | } | |
347 | ||
348 | exit 0; | |
da096611 RGS |
349 | |
350 | =head2 Devel::DProf | |
351 | ||
352 | This venerable module has been the de-facto standard for Perl code profiling | |
353 | for more than a decade, but has been replaced by a number of other modules | |
354 | which have brought us back to the 21st century. Although you're recommended to | |
355 | evaluate your tool from the several mentioned here and from the CPAN list at | |
356 | the base of this document, (and currently L<Devel::NYTProf> seems to be the | |
357 | weapon of choice - see below), we'll take a quick look at the output from | |
358 | L<Devel::DProf> first, to set a baseline for Perl profiling tools. Run the | |
359 | above program under the control of C<Devel::DProf> by using the C<-d> switch on | |
360 | the command-line. | |
361 | ||
f185f654 | 362 | $> perl -d:DProf wordmatch -f perl5db.pl |
da096611 | 363 | |
f185f654 | 364 | <...multiple lines snipped...> |
da096611 | 365 | |
f185f654 KW |
366 | wordmatch report for perl5db.pl: |
367 | lines in file: 9428 | |
368 | words in file: 50243 | |
369 | words with special (non-word) characters: 20480 | |
370 | words with only special (non-word) characters: 7790 | |
371 | words with only consonants: 4801 | |
372 | words with only capital letters: 1316 | |
373 | words with only vowels: 1701 | |
da096611 RGS |
374 | |
375 | C<Devel::DProf> produces a special file, called F<tmon.out> by default, and | |
376 | this file is read by the C<dprofpp> program, which is already installed as part | |
377 | of the C<Devel::DProf> distribution. If you call C<dprofpp> with no options, | |
378 | it will read the F<tmon.out> file in the current directory and produce a human | |
379 | readable statistics report of the run of your program. Note that this may take | |
380 | a little time. | |
381 | ||
f185f654 KW |
382 | $> dprofpp |
383 | ||
384 | Total Elapsed Time = 2.951677 Seconds | |
385 | User+System Time = 2.871677 Seconds | |
386 | Exclusive Times | |
387 | %Time ExclSec CumulS #Calls sec/call Csec/c Name | |
388 | 102. 2.945 3.003 251215 0.0000 0.0000 main::matches | |
389 | 2.40 0.069 0.069 260643 0.0000 0.0000 main::debug | |
390 | 1.74 0.050 0.050 1 0.0500 0.0500 main::report | |
391 | 1.04 0.030 0.049 4 0.0075 0.0123 main::BEGIN | |
392 | 0.35 0.010 0.010 3 0.0033 0.0033 Exporter::as_heavy | |
393 | 0.35 0.010 0.010 7 0.0014 0.0014 IO::File::BEGIN | |
394 | 0.00 - -0.000 1 - - Getopt::Long::FindOption | |
395 | 0.00 - -0.000 1 - - Symbol::BEGIN | |
396 | 0.00 - -0.000 1 - - Fcntl::BEGIN | |
397 | 0.00 - -0.000 1 - - Fcntl::bootstrap | |
398 | 0.00 - -0.000 1 - - warnings::BEGIN | |
399 | 0.00 - -0.000 1 - - IO::bootstrap | |
400 | 0.00 - -0.000 1 - - Getopt::Long::ConfigDefaults | |
401 | 0.00 - -0.000 1 - - Getopt::Long::Configure | |
402 | 0.00 - -0.000 1 - - Symbol::gensym | |
da096611 RGS |
403 | |
404 | C<dprofpp> will produce some quite detailed reporting on the activity of the | |
405 | C<wordmatch> program. The wallclock, user and system, times are at the top of | |
406 | the analysis, and after this are the main columns defining which define the | |
407 | report. Check the C<dprofpp> docs for details of the many options it supports. | |
408 | ||
4b05bc8e | 409 | See also C<L<Apache::DProf>> which hooks C<Devel::DProf> into C<mod_perl>. |
da096611 RGS |
410 | |
411 | =head2 Devel::Profiler | |
412 | ||
413 | Let's take a look at the same program using a different profiler: | |
414 | C<Devel::Profiler>, a drop-in Perl-only replacement for C<Devel::DProf>. The | |
415 | usage is very slightly different in that instead of using the special C<-d:> | |
416 | flag, you pull C<Devel::Profiler> in directly as a module using C<-M>. | |
417 | ||
f185f654 | 418 | $> perl -MDevel::Profiler wordmatch -f perl5db.pl |
da096611 | 419 | |
f185f654 | 420 | <...multiple lines snipped...> |
da096611 | 421 | |
f185f654 KW |
422 | wordmatch report for perl5db.pl: |
423 | lines in file: 9428 | |
424 | words in file: 50243 | |
425 | words with special (non-word) characters: 20480 | |
426 | words with only special (non-word) characters: 7790 | |
427 | words with only consonants: 4801 | |
428 | words with only capital letters: 1316 | |
429 | words with only vowels: 1701 | |
da096611 RGS |
430 | |
431 | ||
432 | C<Devel::Profiler> generates a tmon.out file which is compatible with the | |
433 | C<dprofpp> program, thus saving the construction of a dedicated statistics | |
434 | reader program. C<dprofpp> usage is therefore identical to the above example. | |
435 | ||
f185f654 KW |
436 | $> dprofpp |
437 | ||
438 | Total Elapsed Time = 20.984 Seconds | |
439 | User+System Time = 19.981 Seconds | |
440 | Exclusive Times | |
441 | %Time ExclSec CumulS #Calls sec/call Csec/c Name | |
442 | 49.0 9.792 14.509 251215 0.0000 0.0001 main::matches | |
443 | 24.4 4.887 4.887 260643 0.0000 0.0000 main::debug | |
444 | 0.25 0.049 0.049 1 0.0490 0.0490 main::report | |
445 | 0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::GetOptions | |
446 | 0.00 0.000 0.000 2 0.0000 0.0000 Getopt::Long::ParseOptionSpec | |
447 | 0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::FindOption | |
448 | 0.00 0.000 0.000 1 0.0000 0.0000 IO::File::new | |
449 | 0.00 0.000 0.000 1 0.0000 0.0000 IO::Handle::new | |
450 | 0.00 0.000 0.000 1 0.0000 0.0000 Symbol::gensym | |
451 | 0.00 0.000 0.000 1 0.0000 0.0000 IO::File::open | |
da096611 RGS |
452 | |
453 | Interestingly we get slightly different results, which is mostly because the | |
454 | algorithm which generates the report is different, even though the output file | |
455 | format was allegedly identical. The elapsed, user and system times are clearly | |
e1020413 | 456 | showing the time it took for C<Devel::Profiler> to execute its own run, but |
da096611 RGS |
457 | the column listings feel more accurate somehow than the ones we had earlier |
458 | from C<Devel::DProf>. The 102% figure has disappeared, for example. This is | |
459 | where we have to use the tools at our disposal, and recognise their pros and | |
460 | cons, before using them. Interestingly, the numbers of calls for each | |
461 | subroutine are identical in the two reports, it's the percentages which differ. | |
462 | As the author of C<Devel::Proviler> writes: | |
463 | ||
f185f654 KW |
464 | ...running HTML::Template's test suite under Devel::DProf shows |
465 | output() taking NO time but Devel::Profiler shows around 10% of the | |
466 | time is in output(). I don't know which to trust but my gut tells me | |
467 | something is wrong with Devel::DProf. HTML::Template::output() is a | |
468 | big routine that's called for every test. Either way, something needs | |
469 | fixing. | |
da096611 RGS |
470 | |
471 | YMMV. | |
472 | ||
4b05bc8e KW |
473 | See also C<L<Devel::Apache::Profiler>> which hooks C<Devel::Profiler> |
474 | into C<mod_perl>. | |
da096611 RGS |
475 | |
476 | =head2 Devel::SmallProf | |
477 | ||
478 | The C<Devel::SmallProf> profiler examines the runtime of your Perl program and | |
479 | produces a line-by-line listing to show how many times each line was called, | |
480 | and how long each line took to execute. It is called by supplying the familiar | |
481 | C<-d> flag to Perl at runtime. | |
482 | ||
f185f654 | 483 | $> perl -d:SmallProf wordmatch -f perl5db.pl |
da096611 | 484 | |
f185f654 | 485 | <...multiple lines snipped...> |
da096611 | 486 | |
f185f654 KW |
487 | wordmatch report for perl5db.pl: |
488 | lines in file: 9428 | |
489 | words in file: 50243 | |
490 | words with special (non-word) characters: 20480 | |
491 | words with only special (non-word) characters: 7790 | |
492 | words with only consonants: 4801 | |
493 | words with only capital letters: 1316 | |
494 | words with only vowels: 1701 | |
da096611 RGS |
495 | |
496 | C<Devel::SmallProf> writes it's output into a file called F<smallprof.out>, by | |
497 | default. The format of the file looks like this: | |
498 | ||
f185f654 | 499 | <num> <time> <ctime> <line>:<text> |
da096611 RGS |
500 | |
501 | When the program has terminated, the output may be examined and sorted using | |
502 | any standard text filtering utilities. Something like the following may be | |
503 | sufficient: | |
504 | ||
f185f654 KW |
505 | $> cat smallprof.out | grep \d*: | sort -k3 | tac | head -n20 |
506 | ||
507 | 251215 1.65674 7.68000 75: if ( $word =~ /($regex)/ ) { | |
508 | 251215 0.03264 4.40000 79: debug("word: $i_wd ".($has ? 'matches' : | |
509 | 251215 0.02693 4.10000 81: return $has; | |
510 | 260643 0.02841 4.07000 128: if ( $debug ) { | |
511 | 260643 0.02601 4.04000 126: my $message = shift; | |
512 | 251215 0.02641 3.91000 73: my $has = 0; | |
513 | 251215 0.03311 3.71000 70: my $i_wd = shift; | |
514 | 251215 0.02699 3.69000 72: my $regex = shift; | |
515 | 251215 0.02766 3.68000 71: my $word = shift; | |
516 | 50243 0.59726 1.00000 59: $count{$i_LINES}{cons} = | |
517 | 50243 0.48175 0.92000 61: $count{$i_LINES}{spec} = | |
518 | 50243 0.00644 0.89000 56: my $i_cons = matches($i_word, $word, | |
519 | 50243 0.48837 0.88000 63: $count{$i_LINES}{caps} = | |
520 | 50243 0.00516 0.88000 58: my $i_caps = matches($i_word, $word, '^[(A- | |
521 | 50243 0.00631 0.81000 54: my $i_spec = matches($i_word, $word, '[^a- | |
522 | 50243 0.00496 0.80000 57: my $i_vows = matches($i_word, $word, | |
523 | 50243 0.00688 0.80000 53: $i_word++; | |
524 | 50243 0.48469 0.79000 62: $count{$i_LINES}{only} = | |
525 | 50243 0.48928 0.77000 60: $count{$i_LINES}{vows} = | |
526 | 50243 0.00683 0.75000 55: my $i_only = matches($i_word, $word, '^[^a- | |
da096611 RGS |
527 | |
528 | You can immediately see a slightly different focus to the subroutine profiling | |
529 | modules, and we start to see exactly which line of code is taking the most | |
530 | time. That regex line is looking a bit suspicious, for example. Remember that | |
531 | these tools are supposed to be used together, there is no single best way to | |
532 | profile your code, you need to use the best tools for the job. | |
533 | ||
4b05bc8e KW |
534 | See also C<L<Apache::SmallProf>> which hooks C<Devel::SmallProf> into |
535 | C<mod_perl>. | |
da096611 RGS |
536 | |
537 | =head2 Devel::FastProf | |
538 | ||
539 | C<Devel::FastProf> is another Perl line profiler. This was written with a view | |
540 | to getting a faster line profiler, than is possible with for example | |
541 | C<Devel::SmallProf>, because it's written in C<C>. To use C<Devel::FastProf>, | |
542 | supply the C<-d> argument to Perl: | |
543 | ||
f185f654 | 544 | $> perl -d:FastProf wordmatch -f perl5db.pl |
da096611 | 545 | |
f185f654 | 546 | <...multiple lines snipped...> |
da096611 | 547 | |
f185f654 KW |
548 | wordmatch report for perl5db.pl: |
549 | lines in file: 9428 | |
550 | words in file: 50243 | |
551 | words with special (non-word) characters: 20480 | |
552 | words with only special (non-word) characters: 7790 | |
553 | words with only consonants: 4801 | |
554 | words with only capital letters: 1316 | |
555 | words with only vowels: 1701 | |
da096611 RGS |
556 | |
557 | C<Devel::FastProf> writes statistics to the file F<fastprof.out> in the current | |
558 | directory. The output file, which can be specified, can be interpreted by using | |
559 | the C<fprofpp> command-line program. | |
560 | ||
f185f654 | 561 | $> fprofpp | head -n20 |
da096611 | 562 | |
f185f654 KW |
563 | # fprofpp output format is: |
564 | # filename:line time count: source | |
565 | wordmatch:75 3.93338 251215: if ( $word =~ /($regex)/ ) { | |
566 | wordmatch:79 1.77774 251215: debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/"); | |
567 | wordmatch:81 1.47604 251215: return $has; | |
568 | wordmatch:126 1.43441 260643: my $message = shift; | |
569 | wordmatch:128 1.42156 260643: if ( $debug ) { | |
570 | wordmatch:70 1.36824 251215: my $i_wd = shift; | |
571 | wordmatch:71 1.36739 251215: my $word = shift; | |
572 | wordmatch:72 1.35939 251215: my $regex = shift; | |
da096611 RGS |
573 | |
574 | Straightaway we can see that the number of times each line has been called is | |
575 | identical to the C<Devel::SmallProf> output, and the sequence is only very | |
576 | slightly different based on the ordering of the amount of time each line took | |
577 | to execute, C<if ( $debug ) { > and C<my $message = shift;>, for example. The | |
578 | differences in the actual times recorded might be in the algorithm used | |
579 | internally, or it could be due to system resource limitations or contention. | |
580 | ||
96090e4f | 581 | See also the L<DBIx::Profile> which will profile database queries running |
da096611 RGS |
582 | under the C<DBIx::*> namespace. |
583 | ||
584 | =head2 Devel::NYTProf | |
585 | ||
586 | C<Devel::NYTProf> is the B<next generation> of Perl code profiler, fixing many | |
587 | shortcomings in other tools and implementing many cool features. First of all it | |
588 | can be used as either a I<line> profiler, a I<block> or a I<subroutine> | |
589 | profiler, all at once. It can also use sub-microsecond (100ns) resolution on | |
590 | systems which provide C<clock_gettime()>. It can be started and stopped even | |
591 | by the program being profiled. It's a one-line entry to profile C<mod_perl> | |
592 | applications. It's written in C<c> and is probably the fastest profiler | |
593 | available for Perl. The list of coolness just goes on. Enough of that, let's | |
594 | see how to it works - just use the familiar C<-d> switch to plug it in and run | |
595 | the code. | |
596 | ||
f185f654 | 597 | $> perl -d:NYTProf wordmatch -f perl5db.pl |
da096611 | 598 | |
f185f654 KW |
599 | wordmatch report for perl5db.pl: |
600 | lines in file: 9427 | |
601 | words in file: 50243 | |
602 | words with special (non-word) characters: 20480 | |
603 | words with only special (non-word) characters: 7790 | |
604 | words with only consonants: 4801 | |
605 | words with only capital letters: 1316 | |
606 | words with only vowels: 1701 | |
da096611 RGS |
607 | |
608 | C<NYTProf> will generate a report database into the file F<nytprof.out> by | |
609 | default. Human readable reports can be generated from here by using the | |
610 | supplied C<nytprofhtml> (HTML output) and C<nytprofcsv> (CSV output) programs. | |
1cecf2c0 | 611 | We've used the Unix system C<html2text> utility to convert the |
da096611 RGS |
612 | F<nytprof/index.html> file for convenience here. |
613 | ||
f185f654 KW |
614 | $> html2text nytprof/index.html |
615 | ||
616 | Performance Profile Index | |
617 | For wordmatch | |
618 | Run on Fri Sep 26 13:46:39 2008 | |
619 | Reported on Fri Sep 26 13:47:23 2008 | |
620 | ||
621 | Top 15 Subroutines -- ordered by exclusive time | |
622 | |Calls |P |F |Inclusive|Exclusive|Subroutine | | |
623 | | | | |Time |Time | | | |
624 | |251215|5 |1 |13.09263 |10.47692 |main:: |matches | | |
625 | |260642|2 |1 |2.71199 |2.71199 |main:: |debug | | |
626 | |1 |1 |1 |0.21404 |0.21404 |main:: |report | | |
627 | |2 |2 |2 |0.00511 |0.00511 |XSLoader:: |load (xsub) | | |
628 | |14 |14|7 |0.00304 |0.00298 |Exporter:: |import | | |
629 | |3 |1 |1 |0.00265 |0.00254 |Exporter:: |as_heavy | | |
630 | |10 |10|4 |0.00140 |0.00140 |vars:: |import | | |
631 | |13 |13|1 |0.00129 |0.00109 |constant:: |import | | |
632 | |1 |1 |1 |0.00360 |0.00096 |FileHandle:: |import | | |
633 | |3 |3 |3 |0.00086 |0.00074 |warnings::register::|import | | |
634 | |9 |3 |1 |0.00036 |0.00036 |strict:: |bits | | |
635 | |13 |13|13|0.00032 |0.00029 |strict:: |import | | |
636 | |2 |2 |2 |0.00020 |0.00020 |warnings:: |import | | |
637 | |2 |1 |1 |0.00020 |0.00020 |Getopt::Long:: |ParseOptionSpec| | |
638 | |7 |7 |6 |0.00043 |0.00020 |strict:: |unimport | | |
639 | ||
640 | For more information see the full list of 189 subroutines. | |
da096611 RGS |
641 | |
642 | The first part of the report already shows the critical information regarding | |
643 | which subroutines are using the most time. The next gives some statistics | |
644 | about the source files profiled. | |
645 | ||
f185f654 KW |
646 | Source Code Files -- ordered by exclusive time then name |
647 | |Stmts |Exclusive|Avg. |Reports |Source File | | |
648 | | |Time | | | | | |
649 | |2699761|15.66654 |6e-06 |line . block . sub|wordmatch | | |
650 | |35 |0.02187 |0.00062|line . block . sub|IO/Handle.pm | | |
651 | |274 |0.01525 |0.00006|line . block . sub|Getopt/Long.pm | | |
652 | |20 |0.00585 |0.00029|line . block . sub|Fcntl.pm | | |
653 | |128 |0.00340 |0.00003|line . block . sub|Exporter/Heavy.pm | | |
654 | |42 |0.00332 |0.00008|line . block . sub|IO/File.pm | | |
655 | |261 |0.00308 |0.00001|line . block . sub|Exporter.pm | | |
656 | |323 |0.00248 |8e-06 |line . block . sub|constant.pm | | |
657 | |12 |0.00246 |0.00021|line . block . sub|File/Spec/Unix.pm | | |
658 | |191 |0.00240 |0.00001|line . block . sub|vars.pm | | |
659 | |77 |0.00201 |0.00003|line . block . sub|FileHandle.pm | | |
660 | |12 |0.00198 |0.00016|line . block . sub|Carp.pm | | |
661 | |14 |0.00175 |0.00013|line . block . sub|Symbol.pm | | |
662 | |15 |0.00130 |0.00009|line . block . sub|IO.pm | | |
663 | |22 |0.00120 |0.00005|line . block . sub|IO/Seekable.pm | | |
664 | |198 |0.00085 |4e-06 |line . block . sub|warnings/register.pm| | |
665 | |114 |0.00080 |7e-06 |line . block . sub|strict.pm | | |
666 | |47 |0.00068 |0.00001|line . block . sub|warnings.pm | | |
667 | |27 |0.00054 |0.00002|line . block . sub|overload.pm | | |
668 | |9 |0.00047 |0.00005|line . block . sub|SelectSaver.pm | | |
669 | |13 |0.00045 |0.00003|line . block . sub|File/Spec.pm | | |
670 | |2701595|15.73869 | |Total | | |
671 | |128647 |0.74946 | |Average | | |
672 | | |0.00201 |0.00003|Median | | |
673 | | |0.00121 |0.00003|Deviation | | |
674 | ||
675 | Report produced by the NYTProf 2.03 Perl profiler, developed by Tim Bunce and | |
676 | Adam Kaplan. | |
da096611 RGS |
677 | |
678 | At this point, if you're using the I<html> report, you can click through the | |
679 | various links to bore down into each subroutine and each line of code. Because | |
680 | we're using the text reporting here, and there's a whole directory full of | |
681 | reports built for each source file, we'll just display a part of the | |
682 | corresponding F<wordmatch-line.html> file, sufficient to give an idea of the | |
683 | sort of output you can expect from this cool tool. | |
684 | ||
f185f654 KW |
685 | $> html2text nytprof/wordmatch-line.html |
686 | ||
687 | Performance Profile -- -block view-.-line view-.-sub view- | |
688 | For wordmatch | |
689 | Run on Fri Sep 26 13:46:39 2008 | |
690 | Reported on Fri Sep 26 13:47:22 2008 | |
691 | ||
692 | File wordmatch | |
693 | ||
694 | Subroutines -- ordered by exclusive time | |
695 | |Calls |P|F|Inclusive|Exclusive|Subroutine | | |
696 | | | | |Time |Time | | | |
697 | |251215|5|1|13.09263 |10.47692 |main::|matches| | |
698 | |260642|2|1|2.71199 |2.71199 |main::|debug | | |
699 | |1 |1|1|0.21404 |0.21404 |main::|report | | |
700 | |0 |0|0|0 |0 |main::|BEGIN | | |
701 | ||
702 | ||
703 | |Line|Stmts.|Exclusive|Avg. |Code | | |
704 | | | |Time | | | | |
705 | |1 | | | |#!/usr/bin/perl | | |
706 | |2 | | | | | | |
707 | | | | | |use strict; | | |
708 | |3 |3 |0.00086 |0.00029|# spent 0.00003s making 1 calls to strict:: | | |
709 | | | | | |import | | |
710 | | | | | |use warnings; | | |
711 | |4 |3 |0.01563 |0.00521|# spent 0.00012s making 1 calls to warnings:: | | |
712 | | | | | |import | | |
713 | |5 | | | | | | |
714 | |6 | | | |=head1 NAME | | |
715 | |7 | | | | | | |
716 | |8 | | | |filewords - word analysis of input file | | |
717 | <...snip...> | |
718 | |62 |1 |0.00445 |0.00445|print report( %count ); | | |
719 | | | | | |# spent 0.21404s making 1 calls to main::report| | |
720 | |63 | | | | | | |
721 | | | | | |# spent 23.56955s (10.47692+2.61571) within | | |
722 | | | | | |main::matches which was called 251215 times, | | |
723 | | | | | |avg 0.00005s/call: # 50243 times | | |
724 | | | | | |(2.12134+0.51939s) at line 57 of wordmatch, avg| | |
725 | | | | | |0.00005s/call # 50243 times (2.17735+0.54550s) | | |
726 | |64 | | | |at line 56 of wordmatch, avg 0.00005s/call # | | |
727 | | | | | |50243 times (2.10992+0.51797s) at line 58 of | | |
728 | | | | | |wordmatch, avg 0.00005s/call # 50243 times | | |
729 | | | | | |(2.12696+0.51598s) at line 55 of wordmatch, avg| | |
730 | | | | | |0.00005s/call # 50243 times (1.94134+0.51687s) | | |
731 | | | | | |at line 54 of wordmatch, avg 0.00005s/call | | |
732 | | | | | |sub matches { | | |
733 | <...snip...> | |
734 | |102 | | | | | | |
735 | | | | | |# spent 2.71199s within main::debug which was | | |
736 | | | | | |called 260642 times, avg 0.00001s/call: # | | |
737 | | | | | |251215 times (2.61571+0s) by main::matches at | | |
738 | |103 | | | |line 74 of wordmatch, avg 0.00001s/call # 9427 | | |
739 | | | | | |times (0.09628+0s) at line 50 of wordmatch, avg| | |
740 | | | | | |0.00001s/call | | |
741 | | | | | |sub debug { | | |
742 | |104 |260642|0.58496 |2e-06 |my $message = shift; | | |
743 | |105 | | | | | | |
744 | |106 |260642|1.09917 |4e-06 |if ( $debug ) { | | |
745 | |107 | | | |print STDERR "DBG: $message\n"; | | |
746 | |108 | | | |} | | |
747 | |109 | | | |} | | |
748 | |110 | | | | | | |
749 | |111 |1 |0.01501 |0.01501|exit 0; | | |
750 | |112 | | | | | | |
da096611 RGS |
751 | |
752 | Oodles of very useful information in there - this seems to be the way forward. | |
753 | ||
4b05bc8e KW |
754 | See also C<L<Devel::NYTProf::Apache>> which hooks C<Devel::NYTProf> into |
755 | C<mod_perl>. | |
da096611 RGS |
756 | |
757 | =head1 SORTING | |
758 | ||
759 | Perl modules are not the only tools a performance analyst has at their | |
760 | disposal, system tools like C<time> should not be overlooked as the next | |
761 | example shows, where we take a quick look at sorting. Many books, theses and | |
762 | articles, have been written about efficient sorting algorithms, and this is not | |
763 | the place to repeat such work, there's several good sorting modules which | |
764 | deserve taking a look at too: C<Sort::Maker>, C<Sort::Key> spring to mind. | |
765 | However, it's still possible to make some observations on certain Perl specific | |
766 | interpretations on issues relating to sorting data sets and give an example or | |
767 | two with regard to how sorting large data volumes can effect performance. | |
768 | Firstly, an often overlooked point when sorting large amounts of data, one can | |
769 | attempt to reduce the data set to be dealt with and in many cases C<grep()> can | |
770 | be quite useful as a simple filter: | |
771 | ||
f185f654 | 772 | @data = sort grep { /$filter/ } @incoming |
da096611 RGS |
773 | |
774 | A command such as this can vastly reduce the volume of material to actually | |
775 | sort through in the first place, and should not be too lightly disregarded | |
e1020413 | 776 | purely on the basis of its simplicity. The C<KISS> principle is too often |
da096611 RGS |
777 | overlooked - the next example uses the simple system C<time> utility to |
778 | demonstrate. Let's take a look at an actual example of sorting the contents of | |
779 | a large file, an apache logfile would do. This one has over a quarter of a | |
780 | million lines, is 50M in size, and a snippet of it looks like this: | |
781 | ||
782 | # logfile | |
783 | ||
f185f654 KW |
784 | 188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" |
785 | 188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" | |
786 | 151.56.71.198 - - [08/Feb/2007:12:57:41 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" | |
787 | 151.56.71.198 - - [08/Feb/2007:12:57:42 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" | |
788 | 151.56.71.198 - - [08/Feb/2007:12:57:43 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" | |
789 | 217.113.68.60 - - [08/Feb/2007:13:02:15 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" | |
790 | 217.113.68.60 - - [08/Feb/2007:13:02:16 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" | |
791 | debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)" | |
792 | debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)" | |
793 | debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)" | |
794 | 195.24.196.99 - - [08/Feb/2007:13:26:48 +0000] "GET / HTTP/1.0" 200 3309 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9" | |
795 | 195.24.196.99 - - [08/Feb/2007:13:26:58 +0000] "GET /data/css HTTP/1.0" 404 206 "http://www.rfi.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9" | |
796 | 195.24.196.99 - - [08/Feb/2007:13:26:59 +0000] "GET /favicon.ico HTTP/1.0" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9" | |
797 | crawl1.cosmixcorp.com - - [08/Feb/2007:13:27:57 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "voyager/1.0" | |
798 | crawl1.cosmixcorp.com - - [08/Feb/2007:13:28:25 +0000] "GET /links.html HTTP/1.0" 200 3413 "-" "voyager/1.0" | |
799 | fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:32 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" | |
800 | fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:34 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" | |
801 | 80.247.140.134 - - [08/Feb/2007:13:57:35 +0000] "GET / HTTP/1.1" 200 3309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" | |
802 | 80.247.140.134 - - [08/Feb/2007:13:57:37 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" | |
803 | pop.compuscan.co.za - - [08/Feb/2007:14:10:43 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net" | |
804 | livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" | |
805 | livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /html/oracle.html HTTP/1.0" 404 214 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" | |
806 | dslb-088-064-005-154.pools.arcor-ip.net - - [08/Feb/2007:14:12:15 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net" | |
807 | 196.201.92.41 - - [08/Feb/2007:14:15:01 +0000] "GET / HTTP/1.1" 200 3309 "-" "MOT-L7/08.B7.DCR MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1" | |
da096611 RGS |
808 | |
809 | The specific task here is to sort the 286,525 lines of this file by Response | |
810 | Code, Query, Browser, Referring Url, and lastly Date. One solution might be to | |
811 | use the following code, which iterates over the files given on the | |
812 | command-line. | |
813 | ||
814 | # sort-apache-log | |
815 | ||
f185f654 KW |
816 | #!/usr/bin/perl -n |
817 | ||
818 | use strict; | |
819 | use warnings; | |
820 | ||
821 | my @data; | |
822 | ||
823 | LINE: | |
824 | while ( <> ) { | |
825 | my $line = $_; | |
826 | if ( | |
827 | $line =~ m/^( | |
828 | ([\w\.\-]+) # client | |
829 | \s*-\s*-\s*\[ | |
830 | ([^]]+) # date | |
831 | \]\s*"\w+\s* | |
832 | (\S+) # query | |
833 | [^"]+"\s* | |
834 | (\d+) # status | |
835 | \s+\S+\s+"[^"]*"\s+" | |
836 | ([^"]*) # browser | |
837 | " | |
838 | .* | |
839 | )$/x | |
840 | ) { | |
841 | my @chunks = split(/ +/, $line); | |
842 | my $ip = $1; | |
843 | my $date = $2; | |
844 | my $query = $3; | |
845 | my $status = $4; | |
846 | my $browser = $5; | |
847 | ||
848 | push(@data, [$ip, $date, $query, $status, $browser, $line]); | |
849 | } | |
850 | } | |
851 | ||
852 | my @sorted = sort { | |
853 | $a->[3] cmp $b->[3] | |
854 | || | |
855 | $a->[2] cmp $b->[2] | |
856 | || | |
857 | $a->[0] cmp $b->[0] | |
858 | || | |
859 | $a->[1] cmp $b->[1] | |
860 | || | |
861 | $a->[4] cmp $b->[4] | |
862 | } @data; | |
863 | ||
864 | foreach my $data ( @sorted ) { | |
865 | print $data->[5]; | |
866 | } | |
867 | ||
868 | exit 0; | |
da096611 RGS |
869 | |
870 | When running this program, redirect C<STDOUT> so it is possible to check the | |
871 | output is correct from following test runs and use the system C<time> utility | |
872 | to check the overall runtime. | |
873 | ||
f185f654 | 874 | $> time ./sort-apache-log logfile > out-sort |
da096611 | 875 | |
f185f654 KW |
876 | real 0m17.371s |
877 | user 0m15.757s | |
878 | sys 0m0.592s | |
da096611 RGS |
879 | |
880 | The program took just over 17 wallclock seconds to run. Note the different | |
881 | values C<time> outputs, it's important to always use the same one, and to not | |
882 | confuse what each one means. | |
883 | ||
884 | =over 4 | |
885 | ||
886 | =item Elapsed Real Time | |
887 | ||
888 | The overall, or wallclock, time between when C<time> was called, and when it | |
889 | terminates. The elapsed time includes both user and system times, and time | |
890 | spent waiting for other users and processes on the system. Inevitably, this is | |
891 | the most approximate of the measurements given. | |
892 | ||
893 | =item User CPU Time | |
894 | ||
895 | The user time is the amount of time the entire process spent on behalf of the | |
896 | user on this system executing this program. | |
897 | ||
898 | =item System CPU Time | |
899 | ||
900 | The system time is the amount of time the kernel itself spent executing | |
901 | routines, or system calls, on behalf of this process user. | |
902 | ||
903 | =back | |
904 | ||
905 | Running this same process as a C<Schwarzian Transform> it is possible to | |
906 | eliminate the input and output arrays for storing all the data, and work on the | |
907 | input directly as it arrives too. Otherwise, the code looks fairly similar: | |
908 | ||
909 | # sort-apache-log-schwarzian | |
910 | ||
f185f654 KW |
911 | #!/usr/bin/perl -n |
912 | ||
913 | use strict; | |
914 | use warnings; | |
915 | ||
916 | ||
917 | ||
918 | map $_->[0] => | |
919 | ||
920 | sort { | |
921 | $a->[4] cmp $b->[4] | |
922 | || | |
923 | $a->[3] cmp $b->[3] | |
924 | || | |
925 | $a->[1] cmp $b->[1] | |
926 | || | |
927 | $a->[2] cmp $b->[2] | |
928 | || | |
929 | $a->[5] cmp $b->[5] | |
930 | } | |
931 | map [ $_, m/^( | |
932 | ([\w\.\-]+) # client | |
933 | \s*-\s*-\s*\[ | |
934 | ([^]]+) # date | |
935 | \]\s*"\w+\s* | |
936 | (\S+) # query | |
937 | [^"]+"\s* | |
938 | (\d+) # status | |
939 | \s+\S+\s+"[^"]*"\s+" | |
940 | ([^"]*) # browser | |
941 | " | |
942 | .* | |
943 | )$/xo ] | |
944 | ||
945 | => <>; | |
946 | ||
947 | exit 0; | |
da096611 RGS |
948 | |
949 | Run the new code against the same logfile, as above, to check the new time. | |
950 | ||
f185f654 | 951 | $> time ./sort-apache-log-schwarzian logfile > out-schwarz |
da096611 | 952 | |
f185f654 KW |
953 | real 0m9.664s |
954 | user 0m8.873s | |
955 | sys 0m0.704s | |
da096611 RGS |
956 | |
957 | The time has been cut in half, which is a respectable speed improvement by any | |
958 | standard. Naturally, it is important to check the output is consistent with | |
e1020413 | 959 | the first program run, this is where the Unix system C<cksum> utility comes in. |
da096611 | 960 | |
f185f654 KW |
961 | $> cksum out-sort out-schwarz |
962 | 3044173777 52029194 out-sort | |
963 | 3044173777 52029194 out-schwarz | |
da096611 RGS |
964 | |
965 | BTW. Beware too of pressure from managers who see you speed a program up by 50% | |
966 | of the runtime once, only to get a request one month later to do the same again | |
b1fc79bc | 967 | (true story) - you'll just have to point out you're only human, even if you are a |
da096611 RGS |
968 | Perl programmer, and you'll see what you can do... |
969 | ||
970 | =head1 LOGGING | |
971 | ||
972 | An essential part of any good development process is appropriate error handling | |
973 | with appropriately informative messages, however there exists a school of | |
974 | thought which suggests that log files should be I<chatty>, as if the chain of | |
975 | unbroken output somehow ensures the survival of the program. If speed is in | |
976 | any way an issue, this approach is wrong. | |
977 | ||
978 | A common sight is code which looks something like this: | |
979 | ||
e46aa1dd KW |
980 | logger->debug( "A logging message via process-id: $$ INC: " |
981 | . Dumper(\%INC) ) | |
da096611 RGS |
982 | |
983 | The problem is that this code will always be parsed and executed, even when the | |
984 | debug level set in the logging configuration file is zero. Once the debug() | |
985 | subroutine has been entered, and the internal C<$debug> variable confirmed to | |
986 | be zero, for example, the message which has been sent in will be discarded and | |
b1fc79bc | 987 | the program will continue. In the example given though, the C<\%INC> hash will |
da096611 RGS |
988 | already have been dumped, and the message string constructed, all of which work |
989 | could be bypassed by a debug variable at the statement level, like this: | |
990 | ||
e46aa1dd KW |
991 | logger->debug( "A logging message via process-id: $$ INC: " |
992 | . Dumper(\%INC) ) if $DEBUG; | |
da096611 RGS |
993 | |
994 | This effect can be demonstrated by setting up a test script with both forms, | |
995 | including a C<debug()> subroutine to emulate typical C<logger()> functionality. | |
996 | ||
997 | # ifdebug | |
998 | ||
f185f654 | 999 | #!/usr/bin/perl |
da096611 | 1000 | |
f185f654 KW |
1001 | use strict; |
1002 | use warnings; | |
da096611 | 1003 | |
f185f654 KW |
1004 | use Benchmark; |
1005 | use Data::Dumper; | |
1006 | my $DEBUG = 0; | |
da096611 | 1007 | |
f185f654 KW |
1008 | sub debug { |
1009 | my $msg = shift; | |
da096611 | 1010 | |
f185f654 KW |
1011 | if ( $DEBUG ) { |
1012 | print "DEBUG: $msg\n"; | |
1013 | } | |
1014 | }; | |
da096611 | 1015 | |
f185f654 KW |
1016 | timethese(100000, { |
1017 | 'debug' => sub { | |
1018 | debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) | |
1019 | }, | |
1020 | 'ifdebug' => sub { | |
1021 | debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if $DEBUG | |
1022 | }, | |
1023 | }); | |
da096611 RGS |
1024 | |
1025 | Let's see what C<Benchmark> makes of this: | |
1026 | ||
f185f654 KW |
1027 | $> perl ifdebug |
1028 | Benchmark: timing 100000 iterations of constant, sub... | |
1029 | ifdebug: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) @ 10000000.00/s (n=100000) | |
1030 | (warning: too few iterations for a reliable count) | |
1031 | debug: 14 wallclock secs (13.18 usr + 0.04 sys = 13.22 CPU) @ 7564.30/s (n=100000) | |
da096611 RGS |
1032 | |
1033 | In the one case the code, which does exactly the same thing as far as | |
1034 | outputting any debugging information is concerned, in other words nothing, | |
1035 | takes 14 seconds, and in the other case the code takes one hundredth of a | |
1036 | second. Looks fairly definitive. Use a C<$DEBUG> variable BEFORE you call the | |
1037 | subroutine, rather than relying on the smart functionality inside it. | |
1038 | ||
1039 | =head2 Logging if DEBUG (constant) | |
1040 | ||
1041 | It's possible to take the previous idea a little further, by using a compile | |
1042 | time C<DEBUG> constant. | |
1043 | ||
1044 | # ifdebug-constant | |
1045 | ||
f185f654 KW |
1046 | #!/usr/bin/perl |
1047 | ||
1048 | use strict; | |
1049 | use warnings; | |
1050 | ||
1051 | use Benchmark; | |
1052 | use Data::Dumper; | |
1053 | use constant | |
1054 | DEBUG => 0 | |
1055 | ; | |
1056 | ||
1057 | sub debug { | |
1058 | if ( DEBUG ) { | |
1059 | my $msg = shift; | |
1060 | print "DEBUG: $msg\n"; | |
1061 | } | |
1062 | }; | |
1063 | ||
1064 | timethese(100000, { | |
1065 | 'debug' => sub { | |
1066 | debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) | |
1067 | }, | |
1068 | 'constant' => sub { | |
1069 | debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if DEBUG | |
1070 | }, | |
1071 | }); | |
da096611 RGS |
1072 | |
1073 | Running this program produces the following output: | |
1074 | ||
f185f654 KW |
1075 | $> perl ifdebug-constant |
1076 | Benchmark: timing 100000 iterations of constant, sub... | |
1077 | constant: 0 wallclock secs (-0.00 usr + 0.00 sys = -0.00 CPU) @ -7205759403792793600000.00/s (n=100000) | |
1078 | (warning: too few iterations for a reliable count) | |
1079 | sub: 14 wallclock secs (13.09 usr + 0.00 sys = 13.09 CPU) @ 7639.42/s (n=100000) | |
da096611 RGS |
1080 | |
1081 | The C<DEBUG> constant wipes the floor with even the C<$debug> variable, | |
1082 | clocking in at minus zero seconds, and generates a "warning: too few iterations | |
1083 | for a reliable count" message into the bargain. To see what is really going | |
1084 | on, and why we had too few iterations when we thought we asked for 100000, we | |
1085 | can use the very useful C<B::Deparse> to inspect the new code: | |
1086 | ||
f185f654 KW |
1087 | $> perl -MO=Deparse ifdebug-constant |
1088 | ||
1089 | use Benchmark; | |
1090 | use Data::Dumper; | |
1091 | use constant ('DEBUG', 0); | |
1092 | sub debug { | |
1093 | use warnings; | |
1094 | use strict 'refs'; | |
1095 | 0; | |
1096 | } | |
1097 | use warnings; | |
1098 | use strict 'refs'; | |
1099 | timethese(100000, {'sub', sub { | |
1100 | debug "A $0 logging message via process-id: $$" . Dumper(\%INC); | |
1101 | } | |
1102 | , 'constant', sub { | |
1103 | 0; | |
1104 | } | |
1105 | }); | |
1106 | ifdebug-constant syntax OK | |
da096611 RGS |
1107 | |
1108 | The output shows the constant() subroutine we're testing being replaced with | |
1109 | the value of the C<DEBUG> constant: zero. The line to be tested has been | |
1110 | completely optimized away, and you can't get much more efficient than that. | |
1111 | ||
1112 | =head1 POSTSCRIPT | |
1113 | ||
1114 | This document has provided several way to go about identifying hot-spots, and | |
1115 | checking whether any modifications have improved the runtime of the code. | |
1116 | ||
1117 | As a final thought, remember that it's not (at the time of writing) possible to | |
1118 | produce a useful program which will run in zero or negative time and this basic | |
1119 | principle can be written as: I<useful programs are slow> by their very | |
1120 | definition. It is of course possible to write a nearly instantaneous program, | |
1121 | but it's not going to do very much, here's a very efficient one: | |
1122 | ||
f185f654 | 1123 | $> perl -e 0 |
da096611 RGS |
1124 | |
1125 | Optimizing that any further is a job for C<p5p>. | |
1126 | ||
1127 | =head1 SEE ALSO | |
1128 | ||
1129 | Further reading can be found using the modules and links below. | |
1130 | ||
1131 | =head2 PERLDOCS | |
1132 | ||
ea8b8ad2 | 1133 | For example: C<perldoc -f sort>. |
da096611 | 1134 | |
ea8b8ad2 VP |
1135 | L<perlfaq4>. |
1136 | ||
1137 | L<perlfork>, L<perlfunc>, L<perlretut>, L<perlthrtut>. | |
1138 | ||
1139 | L<threads>. | |
da096611 RGS |
1140 | |
1141 | =head2 MAN PAGES | |
1142 | ||
ea8b8ad2 | 1143 | C<time>. |
da096611 RGS |
1144 | |
1145 | =head2 MODULES | |
1146 | ||
1147 | It's not possible to individually showcase all the performance related code for | |
1148 | Perl here, naturally, but here's a short list of modules from the CPAN which | |
1149 | deserve further attention. | |
1150 | ||
f185f654 KW |
1151 | Apache::DProf |
1152 | Apache::SmallProf | |
1153 | Benchmark | |
1154 | DBIx::Profile | |
1155 | Devel::AutoProfiler | |
1156 | Devel::DProf | |
1157 | Devel::DProfLB | |
1158 | Devel::FastProf | |
1159 | Devel::GraphVizProf | |
1160 | Devel::NYTProf | |
1161 | Devel::NYTProf::Apache | |
1162 | Devel::Profiler | |
1163 | Devel::Profile | |
1164 | Devel::Profit | |
1165 | Devel::SmallProf | |
1166 | Devel::WxProf | |
1167 | POE::Devel::Profiler | |
1168 | Sort::Key | |
1169 | Sort::Maker | |
da096611 RGS |
1170 | |
1171 | =head2 URLS | |
1172 | ||
1173 | Very useful online reference material: | |
1174 | ||
f185f654 | 1175 | http://www.ccl4.org/~nick/P/Fast_Enough/ |
da096611 | 1176 | |
f185f654 | 1177 | http://www-128.ibm.com/developerworks/library/l-optperl.html |
da096611 | 1178 | |
f185f654 | 1179 | http://perlbuzz.com/2007/11/bind-output-variables-in-dbi-for-speed-and-safety.html |
da096611 | 1180 | |
f185f654 | 1181 | http://en.wikipedia.org/wiki/Performance_analysis |
da096611 | 1182 | |
f185f654 | 1183 | http://apache.perl.org/docs/1.0/guide/performance.html |
da096611 | 1184 | |
f185f654 | 1185 | http://perlgolf.sourceforge.net/ |
da096611 | 1186 | |
f185f654 | 1187 | http://www.sysarch.com/Perl/sort_paper.html |
da096611 | 1188 | |
da096611 RGS |
1189 | =head1 AUTHOR |
1190 | ||
1191 | Richard Foley <richard.foley@rfi.net> Copyright (c) 2008 | |
1192 | ||
1193 | =cut |