perl5.git.perl.org Git - perl5.git/blame_incremental

... / ...

Commit	Line	Data
	1	=head1 NAME
	2
	3	perlperf - Perl Performance and Optimization Techniques
	4
	5	=head1 DESCRIPTION
	6
	7	This is an introduction to the use of performance and optimization techniques
	8	which can be used with particular reference to perl programs. While many perl
	9	developers have come from other languages, and can use their prior knowledge
	10	where appropriate, there are many other people who might benefit from a few
	11	perl specific pointers. If you want the condensed version, perhaps the best
	12	advice comes from the renowned Japanese Samurai, Miyamoto Musashi, who said:
	13
	14	"Do Not Engage in Useless Activity"
	15
	16	in 1645.
	17
	18	=head1 OVERVIEW
	19
	20	Perhaps the most common mistake programmers make is to attempt to optimize
	21	their code before a program actually does anything useful - this is a bad idea.
	22	There's no point in having an extremely fast program that doesn't work. The
	23	first job is to get a program to I<correctly> do something B<useful>, (not to
	24	mention ensuring the test suite is fully functional), and only then to consider
	25	optimizing it. Having decided to optimize existing working code, there are
	26	several simple but essential steps to consider which are intrinsic to any
	27	optimization process.
	28
	29	=head2 ONE STEP SIDEWAYS
	30
	31	Firstly, you need to establish a baseline time for the existing code, which
	32	timing needs to be reliable and repeatable. You'll probably want to use the
	33	C<Benchmark> or C<Devel::NYTProf> modules, or something similar, for this step,
	34	or perhaps the Unix system C<time> utility, whichever is appropriate. See the
	35	base of this document for a longer list of benchmarking and profiling modules,
	36	and recommended further reading.
	37
	38	=head2 ONE STEP FORWARD
	39
	40	Next, having examined the program for I<hot spots>, (places where the code
	41	seems to run slowly), change the code with the intention of making it run
	42	faster. Using version control software, like C<subversion>, will ensure no
	43	changes are irreversible. It's too easy to fiddle here and fiddle there -
	44	don't change too much at any one time or you might not discover which piece of
	45	code B<really> was the slow bit.
	46
	47	=head2 ANOTHER STEP SIDEWAYS
	48
	49	It's not enough to say: "that will make it run faster", you have to check it.
	50	Rerun the code under control of the benchmarking or profiling modules, from the
	51	first step above, and check that the new code executed the B<same task> in
	52	I<less time>. Save your work and repeat...
	53
	54	=head1 GENERAL GUIDELINES
	55
	56	The critical thing when considering performance is to remember there is no such
	57	thing as a C<Golden Bullet>, which is why there are no rules, only guidelines.
	58
	59	It is clear that inline code is going to be faster than subroutine or method
	60	calls, because there is less overhead, but this approach has the disadvantage
	61	of being less maintainable and comes at the cost of greater memory usage -
	62	there is no such thing as a free lunch. If you are searching for an element in
	63	a list, it can be more efficient to store the data in a hash structure, and
	64	then simply look to see whether the key is defined, rather than to loop through
	65	the entire array using grep() for instance. substr() may be (a lot) faster
	66	than grep() but not as flexible, so you have another trade-off to access. Your
	67	code may contain a line which takes 0.01 of a second to execute which if you
	68	call it 1,000 times, quite likely in a program parsing even medium sized files
	69	for instance, you already have a 10 second delay, in just one single code
	70	location, and if you call that line 100,000 times, your entire program will
	71	slow down to an unbearable crawl.
	72
	73	Using a subroutine as part of your sort is a powerful way to get exactly what
	74	you want, but will usually be slower than the built-in I<alphabetic> C<cmp> and
	75	I<numeric> C<E<lt>=E<gt>> sort operators. It is possible to make multiple
	76	passes over your data, building indices to make the upcoming sort more
	77	efficient, and to use what is known as the C<OM> (Orcish Maneuver) to cache the
	78	sort keys in advance. The cache lookup, while a good idea, can itself be a
	79	source of slowdown by enforcing a double pass over the data - once to setup the
	80	cache, and once to sort the data. Using C<pack()> to extract the required sort
	81	key into a consistent string can be an efficient way to build a single string
	82	to compare, instead of using multiple sort keys, which makes it possible to use
	83	the standard, written in C<c> and fast, perl C<sort()> function on the output,
	84	and is the basis of the C<GRT> (Guttman Rossler Transform). Some string
	85	combinations can slow the C<GRT> down, by just being too plain complex for its
	86	own good.
	87
	88	For applications using database backends, the standard C<DBIx> namespace has
	89	tries to help with keeping things nippy, not least because it tries to I<not>
	90	query the database until the latest possible moment, but always read the docs
	91	which come with your choice of libraries. Among the many issues facing
	92	developers dealing with databases should remain aware of is to always use
	93	C<SQL> placeholders and to consider pre-fetching data sets when this might
	94	prove advantageous. Splitting up a large file by assigning multiple processes
	95	to parsing a single file, using say C<POE>, C<threads> or C<fork> can also be a
	96	useful way of optimizing your usage of the available C<CPU> resources, though
	97	this technique is fraught with concurrency issues and demands high attention to
	98	detail.
	99
	100	Every case has a specific application and one or more exceptions, and there is
	101	no replacement for running a few tests and finding out which method works best
	102	for your particular environment, this is why writing optimal code is not an
	103	exact science, and why we love using Perl so much - TMTOWTDI.
	104
	105	=head1 BENCHMARKS
	106
	107	Here are a few examples to demonstrate usage of Perl's benchmarking tools.
	108
	109	=head2 Assigning and Dereferencing Variables.
	110
	111	I'm sure most of us have seen code which looks like, (or worse than), this:
	112
	113	if ( $obj->{_ref}->{_myscore} >= $obj->{_ref}->{_yourscore} ) {
	114	...
	115
	116	This sort of code can be a real eyesore to read, as well as being very
	117	sensitive to typos, and it's much clearer to dereference the variable
	118	explicitly. We're side-stepping the issue of working with object-oriented
	119	programming techniques to encapsulate variable access via methods, only
	120	accessible through an object. Here we're just discussing the technical
	121	implementation of choice, and whether this has an effect on performance. We
	122	can see whether this dereferencing operation, has any overhead by putting
	123	comparative code in a file and running a C<Benchmark> test.
	124
	125	# dereference
	126
	127	#!/usr/bin/perl
	128
	129	use strict;
	130	use warnings;
	131
	132	use Benchmark;
	133
	134	my $ref = {
	135	'ref' => {
	136	_myscore => '100 + 1',
	137	_yourscore => '102 - 1',
	138	},
	139	};
	140
	141	timethese(1000000, {
	142	'direct' => sub {
	143	my $x = $ref->{ref}->{_myscore} . $ref->{ref}->{_yourscore} ;
	144	},
	145	'dereference' => sub {
	146	my $ref = $ref->{ref};
	147	my $myscore = $ref->{_myscore};
	148	my $yourscore = $ref->{_yourscore};
	149	my $x = $myscore . $yourscore;
	150	},
	151	});
	152
	153	It's essential to run any timing measurements a sufficient number of times so
	154	the numbers settle on a numerical average, otherwise each run will naturally
	155	fluctuate due to variations in the environment, to reduce the effect of
	156	contention for C<CPU> resources and network bandwidth for instance. Running
	157	the above code for one million iterations, we can take a look at the report
	158	output by the C<Benchmark> module, to see which approach is the most effective.
	159
	160	$> perl dereference
	161
	162	Benchmark: timing 1000000 iterations of dereference, direct...
	163	dereference: 2 wallclock secs ( 1.59 usr + 0.00 sys = 1.59 CPU) @ 628930.82/s (n=1000000)
	164	direct: 1 wallclock secs ( 1.20 usr + 0.00 sys = 1.20 CPU) @ 833333.33/s (n=1000000)
	165
	166	The difference is clear to see and the dereferencing approach is slower. While
	167	it managed to execute an average of 628,930 times a second during our test, the
	168	direct approach managed to run an additional 204,403 times, unfortunately.
	169	Unfortunately, because there are many examples of code written using the
	170	multiple layer direct variable access, and it's usually horrible. It is,
	171	however, minusculy faster. The question remains whether the minute gain is
	172	actually worth the eyestrain, or the loss of maintainability.
	173
	174	=head2 Search and replace or tr
	175
	176	If we have a string which needs to be modified, while a regex will almost
	177	always be much more flexible, C<tr>, an oft underused tool, can still be a
	178	useful. One scenario might be replace all vowels with another character. The
	179	regex solution might look like this:
	180
	181	$str =~ s/[aeiou]/x/g
	182
	183	The C<tr> alternative might look like this:
	184
	185	$str =~ tr/aeiou/xxxxx/
	186
	187	We can put that into a test file which we can run to check which approach is
	188	the fastest, using a global C<$STR> variable to assign to the C<my $str>
	189	variable so as to avoid perl trying to optimize any of the work away by
	190	noticing it's assigned only the once.
	191
	192	# regex-transliterate
	193
	194	#!/usr/bin/perl
	195
	196	use strict;
	197	use warnings;
	198
	199	use Benchmark;
	200
	201	my $STR = "$$-this and that";
	202
	203	timethese( 1000000, {
	204	'sr' => sub { my $str = $STR; $str =~ s/[aeiou]/x/g; return $str; },
	205	'tr' => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; },
	206	});
	207
	208	Running the code gives us our results:
	209
	210	$> perl regex-transliterate
	211
	212	Benchmark: timing 1000000 iterations of sr, tr...
	213	sr: 2 wallclock secs ( 1.19 usr + 0.00 sys = 1.19 CPU) @ 840336.13/s (n=1000000)
	214	tr: 0 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 2040816.33/s (n=1000000)
	215
	216	The C<tr> version is a clear winner. One solution is flexible, the other is
	217	fast - and it's appropriately the programmer's choice which to use.
	218
	219	Check the C<Benchmark> docs for further useful techniques.
	220
	221	=head1 PROFILING TOOLS
	222
	223	A slightly larger piece of code will provide something on which a profiler can
	224	produce more extensive reporting statistics. This example uses the simplistic
	225	C<wordmatch> program which parses a given input file and spews out a short
	226	report on the contents.
	227
	228	# wordmatch
	229
	230	#!/usr/bin/perl
	231
	232	use strict;
	233	use warnings;
	234
	235	=head1 NAME
	236
	237	filewords - word analysis of input file
	238
	239	=head1 SYNOPSIS
	240
	241	filewords -f inputfilename [-d]
	242
	243	=head1 DESCRIPTION
	244
	245	This program parses the given filename, specified with C<-f>, and
	246	displays a simple analysis of the words found therein. Use the C<-d>
	247	switch to enable debugging messages.
	248
	249	=cut
	250
	251	use FileHandle;
	252	use Getopt::Long;
	253
	254	my $debug = 0;
	255	my $file = '';
	256
	257	my $result = GetOptions (
	258	'debug' => \$debug,
	259	'file=s' => \$file,
	260	);
	261	die("invalid args") unless $result;
	262
	263	unless ( -f $file ) {
	264	die("Usage: $0 -f filename [-d]");
	265	}
	266	my $FH = FileHandle->new("< $file")
	267	or die("unable to open file($file): $!");
	268
	269	my $i_LINES = 0;
	270	my $i_WORDS = 0;
	271	my %count = ();
	272
	273	my @lines = <$FH>;
	274	foreach my $line ( @lines ) {
	275	$i_LINES++;
	276	$line =~ s/\n//;
	277	my @words = split(/ +/, $line);
	278	my $i_words = scalar(@words);
	279	$i_WORDS = $i_WORDS + $i_words;
	280	debug("line: $i_LINES supplying $i_words words: @words");
	281	my $i_word = 0;
	282	foreach my $word ( @words ) {
	283	$i_word++;
	284	$count{$i_LINES}{spec} += matches($i_word, $word, '[^a-zA-Z0-9]');
	285	$count{$i_LINES}{only} += matches($i_word, $word, '^[^a-zA-Z0-9]+$');
	286	$count{$i_LINES}{cons} += matches($i_word, $word, '^[(?i:bcdfghjklmnpqrstvwxyz)]+$');
	287	$count{$i_LINES}{vows} += matches($i_word, $word, '^[(?i:aeiou)]+$');
	288	$count{$i_LINES}{caps} += matches($i_word, $word, '^[(A-Z)]+$');
	289	}
	290	}
	291
	292	print report( %count );
	293
	294	sub matches {
	295	my $i_wd = shift;
	296	my $word = shift;
	297	my $regex = shift;
	298	my $has = 0;
	299
	300	if ( $word =~ /($regex)/ ) {
	301	$has++ if $1;
	302	}
	303
	304	debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
	305
	306	return $has;
	307	}
	308
	309	sub report {
	310	my %report = @_;
	311	my %rep;
	312
	313	foreach my $line ( keys %report ) {
	314	foreach my $key ( keys %{ $report{$line} } ) {
	315	$rep{$key} += $report{$line}{$key};
	316	}
	317	}
	318
	319	my $report = qq\|
	320	$0 report for $file:
	321	lines in file: $i_LINES
	322	words in file: $i_WORDS
	323	words with special (non-word) characters: $i_spec
	324	words with only special (non-word) characters: $i_only
	325	words with only consonants: $i_cons
	326	words with only capital letters: $i_caps
	327	words with only vowels: $i_vows
	328	\|;
	329
	330	return $report;
	331	}
	332
	333	sub debug {
	334	my $message = shift;
	335
	336	if ( $debug ) {
	337	print STDERR "DBG: $message\n";
	338	}
	339	}
	340
	341	exit 0;
	342
	343	=head2 Devel::DProf
	344
	345	This venerable module has been the de-facto standard for Perl code profiling
	346	for more than a decade, but has been replaced by a number of other modules
	347	which have brought us back to the 21st century. Although you're recommended to
	348	evaluate your tool from the several mentioned here and from the CPAN list at
	349	the base of this document, (and currently L<Devel::NYTProf> seems to be the
	350	weapon of choice - see below), we'll take a quick look at the output from
	351	L<Devel::DProf> first, to set a baseline for Perl profiling tools. Run the
	352	above program under the control of C<Devel::DProf> by using the C<-d> switch on
	353	the command-line.
	354
	355	$> perl -d:DProf wordmatch -f perl5db.pl
	356
	357	<...multiple lines snipped...>
	358
	359	wordmatch report for perl5db.pl:
	360	lines in file: 9428
	361	words in file: 50243
	362	words with special (non-word) characters: 20480
	363	words with only special (non-word) characters: 7790
	364	words with only consonants: 4801
	365	words with only capital letters: 1316
	366	words with only vowels: 1701
	367
	368	C<Devel::DProf> produces a special file, called F<tmon.out> by default, and
	369	this file is read by the C<dprofpp> program, which is already installed as part
	370	of the C<Devel::DProf> distribution. If you call C<dprofpp> with no options,
	371	it will read the F<tmon.out> file in the current directory and produce a human
	372	readable statistics report of the run of your program. Note that this may take
	373	a little time.
	374
	375	$> dprofpp
	376
	377	Total Elapsed Time = 2.951677 Seconds
	378	User+System Time = 2.871677 Seconds
	379	Exclusive Times
	380	%Time ExclSec CumulS #Calls sec/call Csec/c Name
	381	102. 2.945 3.003 251215 0.0000 0.0000 main::matches
	382	2.40 0.069 0.069 260643 0.0000 0.0000 main::debug
	383	1.74 0.050 0.050 1 0.0500 0.0500 main::report
	384	1.04 0.030 0.049 4 0.0075 0.0123 main::BEGIN
	385	0.35 0.010 0.010 3 0.0033 0.0033 Exporter::as_heavy
	386	0.35 0.010 0.010 7 0.0014 0.0014 IO::File::BEGIN
	387	0.00 - -0.000 1 - - Getopt::Long::FindOption
	388	0.00 - -0.000 1 - - Symbol::BEGIN
	389	0.00 - -0.000 1 - - Fcntl::BEGIN
	390	0.00 - -0.000 1 - - Fcntl::bootstrap
	391	0.00 - -0.000 1 - - warnings::BEGIN
	392	0.00 - -0.000 1 - - IO::bootstrap
	393	0.00 - -0.000 1 - - Getopt::Long::ConfigDefaults
	394	0.00 - -0.000 1 - - Getopt::Long::Configure
	395	0.00 - -0.000 1 - - Symbol::gensym
	396
	397	C<dprofpp> will produce some quite detailed reporting on the activity of the
	398	C<wordmatch> program. The wallclock, user and system, times are at the top of
	399	the analysis, and after this are the main columns defining which define the
	400	report. Check the C<dprofpp> docs for details of the many options it supports.
	401
	402	See also C<Apache::DProf> which hooks C<Devel::DProf> into C<mod_perl>.
	403
	404	=head2 Devel::Profiler
	405
	406	Let's take a look at the same program using a different profiler:
	407	C<Devel::Profiler>, a drop-in Perl-only replacement for C<Devel::DProf>. The
	408	usage is very slightly different in that instead of using the special C<-d:>
	409	flag, you pull C<Devel::Profiler> in directly as a module using C<-M>.
	410
	411	$> perl -MDevel::Profiler wordmatch -f perl5db.pl
	412
	413	<...multiple lines snipped...>
	414
	415	wordmatch report for perl5db.pl:
	416	lines in file: 9428
	417	words in file: 50243
	418	words with special (non-word) characters: 20480
	419	words with only special (non-word) characters: 7790
	420	words with only consonants: 4801
	421	words with only capital letters: 1316
	422	words with only vowels: 1701
	423
	424
	425	C<Devel::Profiler> generates a tmon.out file which is compatible with the
	426	C<dprofpp> program, thus saving the construction of a dedicated statistics
	427	reader program. C<dprofpp> usage is therefore identical to the above example.
	428
	429	$> dprofpp
	430
	431	Total Elapsed Time = 20.984 Seconds
	432	User+System Time = 19.981 Seconds
	433	Exclusive Times
	434	%Time ExclSec CumulS #Calls sec/call Csec/c Name
	435	49.0 9.792 14.509 251215 0.0000 0.0001 main::matches
	436	24.4 4.887 4.887 260643 0.0000 0.0000 main::debug
	437	0.25 0.049 0.049 1 0.0490 0.0490 main::report
	438	0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::GetOptions
	439	0.00 0.000 0.000 2 0.0000 0.0000 Getopt::Long::ParseOptionSpec
	440	0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::FindOption
	441	0.00 0.000 0.000 1 0.0000 0.0000 IO::File::new
	442	0.00 0.000 0.000 1 0.0000 0.0000 IO::Handle::new
	443	0.00 0.000 0.000 1 0.0000 0.0000 Symbol::gensym
	444	0.00 0.000 0.000 1 0.0000 0.0000 IO::File::open
	445
	446	Interestingly we get slightly different results, which is mostly because the
	447	algorithm which generates the report is different, even though the output file
	448	format was allegedly identical. The elapsed, user and system times are clearly
	449	showing the time it took for C<Devel::Profiler> to execute its own run, but
	450	the column listings feel more accurate somehow than the ones we had earlier
	451	from C<Devel::DProf>. The 102% figure has disappeared, for example. This is
	452	where we have to use the tools at our disposal, and recognise their pros and
	453	cons, before using them. Interestingly, the numbers of calls for each
	454	subroutine are identical in the two reports, it's the percentages which differ.
	455	As the author of C<Devel::Proviler> writes:
	456
	457	...running HTML::Template's test suite under Devel::DProf shows
	458	output() taking NO time but Devel::Profiler shows around 10% of the
	459	time is in output(). I don't know which to trust but my gut tells me
	460	something is wrong with Devel::DProf. HTML::Template::output() is a
	461	big routine that's called for every test. Either way, something needs
	462	fixing.
	463
	464	YMMV.
	465
	466	See also C<Devel::Apache::Profiler> which hooks C<Devel::Profiler> into C<mod_perl>.
	467
	468	=head2 Devel::SmallProf
	469
	470	The C<Devel::SmallProf> profiler examines the runtime of your Perl program and
	471	produces a line-by-line listing to show how many times each line was called,
	472	and how long each line took to execute. It is called by supplying the familiar
	473	C<-d> flag to Perl at runtime.
	474
	475	$> perl -d:SmallProf wordmatch -f perl5db.pl
	476
	477	<...multiple lines snipped...>
	478
	479	wordmatch report for perl5db.pl:
	480	lines in file: 9428
	481	words in file: 50243
	482	words with special (non-word) characters: 20480
	483	words with only special (non-word) characters: 7790
	484	words with only consonants: 4801
	485	words with only capital letters: 1316
	486	words with only vowels: 1701
	487
	488	C<Devel::SmallProf> writes it's output into a file called F<smallprof.out>, by
	489	default. The format of the file looks like this:
	490
	491	<num> <time> <ctime> <line>:<text>
	492
	493	When the program has terminated, the output may be examined and sorted using
	494	any standard text filtering utilities. Something like the following may be
	495	sufficient:
	496
	497	$> cat smallprof.out \| grep \d*: \| sort -k3 \| tac \| head -n20
	498
	499	251215 1.65674 7.68000 75: if ( $word =~ /($regex)/ ) {
	500	251215 0.03264 4.40000 79: debug("word: $i_wd ".($has ? 'matches' :
	501	251215 0.02693 4.10000 81: return $has;
	502	260643 0.02841 4.07000 128: if ( $debug ) {
	503	260643 0.02601 4.04000 126: my $message = shift;
	504	251215 0.02641 3.91000 73: my $has = 0;
	505	251215 0.03311 3.71000 70: my $i_wd = shift;
	506	251215 0.02699 3.69000 72: my $regex = shift;
	507	251215 0.02766 3.68000 71: my $word = shift;
	508	50243 0.59726 1.00000 59: $count{$i_LINES}{cons} =
	509	50243 0.48175 0.92000 61: $count{$i_LINES}{spec} =
	510	50243 0.00644 0.89000 56: my $i_cons = matches($i_word, $word,
	511	50243 0.48837 0.88000 63: $count{$i_LINES}{caps} =
	512	50243 0.00516 0.88000 58: my $i_caps = matches($i_word, $word, '^[(A-
	513	50243 0.00631 0.81000 54: my $i_spec = matches($i_word, $word, '[^a-
	514	50243 0.00496 0.80000 57: my $i_vows = matches($i_word, $word,
	515	50243 0.00688 0.80000 53: $i_word++;
	516	50243 0.48469 0.79000 62: $count{$i_LINES}{only} =
	517	50243 0.48928 0.77000 60: $count{$i_LINES}{vows} =
	518	50243 0.00683 0.75000 55: my $i_only = matches($i_word, $word, '^[^a-
	519
	520	You can immediately see a slightly different focus to the subroutine profiling
	521	modules, and we start to see exactly which line of code is taking the most
	522	time. That regex line is looking a bit suspicious, for example. Remember that
	523	these tools are supposed to be used together, there is no single best way to
	524	profile your code, you need to use the best tools for the job.
	525
	526	See also C<Apache::SmallProf> which hooks C<Devel::SmallProf> into C<mod_perl>.
	527
	528	=head2 Devel::FastProf
	529
	530	C<Devel::FastProf> is another Perl line profiler. This was written with a view
	531	to getting a faster line profiler, than is possible with for example
	532	C<Devel::SmallProf>, because it's written in C<C>. To use C<Devel::FastProf>,
	533	supply the C<-d> argument to Perl:
	534
	535	$> perl -d:FastProf wordmatch -f perl5db.pl
	536
	537	<...multiple lines snipped...>
	538
	539	wordmatch report for perl5db.pl:
	540	lines in file: 9428
	541	words in file: 50243
	542	words with special (non-word) characters: 20480
	543	words with only special (non-word) characters: 7790
	544	words with only consonants: 4801
	545	words with only capital letters: 1316
	546	words with only vowels: 1701
	547
	548	C<Devel::FastProf> writes statistics to the file F<fastprof.out> in the current
	549	directory. The output file, which can be specified, can be interpreted by using
	550	the C<fprofpp> command-line program.
	551
	552	$> fprofpp \| head -n20
	553
	554	# fprofpp output format is:
	555	# filename:line time count: source
	556	wordmatch:75 3.93338 251215: if ( $word =~ /($regex)/ ) {
	557	wordmatch:79 1.77774 251215: debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
	558	wordmatch:81 1.47604 251215: return $has;
	559	wordmatch:126 1.43441 260643: my $message = shift;
	560	wordmatch:128 1.42156 260643: if ( $debug ) {
	561	wordmatch:70 1.36824 251215: my $i_wd = shift;
	562	wordmatch:71 1.36739 251215: my $word = shift;
	563	wordmatch:72 1.35939 251215: my $regex = shift;
	564
	565	Straightaway we can see that the number of times each line has been called is
	566	identical to the C<Devel::SmallProf> output, and the sequence is only very
	567	slightly different based on the ordering of the amount of time each line took
	568	to execute, C<if ( $debug ) { > and C<my $message = shift;>, for example. The
	569	differences in the actual times recorded might be in the algorithm used
	570	internally, or it could be due to system resource limitations or contention.
	571
	572	See also the L<DBIx::Profile> which will profile database queries running
	573	under the C<DBIx::*> namespace.
	574
	575	=head2 Devel::NYTProf
	576
	577	C<Devel::NYTProf> is the B<next generation> of Perl code profiler, fixing many
	578	shortcomings in other tools and implementing many cool features. First of all it
	579	can be used as either a I<line> profiler, a I<block> or a I<subroutine>
	580	profiler, all at once. It can also use sub-microsecond (100ns) resolution on
	581	systems which provide C<clock_gettime()>. It can be started and stopped even
	582	by the program being profiled. It's a one-line entry to profile C<mod_perl>
	583	applications. It's written in C<c> and is probably the fastest profiler
	584	available for Perl. The list of coolness just goes on. Enough of that, let's
	585	see how to it works - just use the familiar C<-d> switch to plug it in and run
	586	the code.
	587
	588	$> perl -d:NYTProf wordmatch -f perl5db.pl
	589
	590	wordmatch report for perl5db.pl:
	591	lines in file: 9427
	592	words in file: 50243
	593	words with special (non-word) characters: 20480
	594	words with only special (non-word) characters: 7790
	595	words with only consonants: 4801
	596	words with only capital letters: 1316
	597	words with only vowels: 1701
	598
	599	C<NYTProf> will generate a report database into the file F<nytprof.out> by
	600	default. Human readable reports can be generated from here by using the
	601	supplied C<nytprofhtml> (HTML output) and C<nytprofcsv> (CSV output) programs.
	602	We've used the Unix system C<html2text> utility to convert the
	603	F<nytprof/index.html> file for convenience here.
	604
	605	$> html2text nytprof/index.html
	606
	607	Performance Profile Index
	608	For wordmatch
	609	Run on Fri Sep 26 13:46:39 2008
	610	Reported on Fri Sep 26 13:47:23 2008
	611
	612	Top 15 Subroutines -- ordered by exclusive time
	613	\|Calls \|P \|F \|Inclusive\|Exclusive\|Subroutine \|
	614	\| \| \| \|Time \|Time \| \|
	615	\|251215\|5 \|1 \|13.09263 \|10.47692 \|main:: \|matches \|
	616	\|260642\|2 \|1 \|2.71199 \|2.71199 \|main:: \|debug \|
	617	\|1 \|1 \|1 \|0.21404 \|0.21404 \|main:: \|report \|
	618	\|2 \|2 \|2 \|0.00511 \|0.00511 \|XSLoader:: \|load (xsub) \|
	619	\|14 \|14\|7 \|0.00304 \|0.00298 \|Exporter:: \|import \|
	620	\|3 \|1 \|1 \|0.00265 \|0.00254 \|Exporter:: \|as_heavy \|
	621	\|10 \|10\|4 \|0.00140 \|0.00140 \|vars:: \|import \|
	622	\|13 \|13\|1 \|0.00129 \|0.00109 \|constant:: \|import \|
	623	\|1 \|1 \|1 \|0.00360 \|0.00096 \|FileHandle:: \|import \|
	624	\|3 \|3 \|3 \|0.00086 \|0.00074 \|warnings::register::\|import \|
	625	\|9 \|3 \|1 \|0.00036 \|0.00036 \|strict:: \|bits \|
	626	\|13 \|13\|13\|0.00032 \|0.00029 \|strict:: \|import \|
	627	\|2 \|2 \|2 \|0.00020 \|0.00020 \|warnings:: \|import \|
	628	\|2 \|1 \|1 \|0.00020 \|0.00020 \|Getopt::Long:: \|ParseOptionSpec\|
	629	\|7 \|7 \|6 \|0.00043 \|0.00020 \|strict:: \|unimport \|
	630
	631	For more information see the full list of 189 subroutines.
	632
	633	The first part of the report already shows the critical information regarding
	634	which subroutines are using the most time. The next gives some statistics
	635	about the source files profiled.
	636
	637	Source Code Files -- ordered by exclusive time then name
	638	\|Stmts \|Exclusive\|Avg. \|Reports \|Source File \|
	639	\| \|Time \| \| \| \|
	640	\|2699761\|15.66654 \|6e-06 \|line . block . sub\|wordmatch \|
	641	\|35 \|0.02187 \|0.00062\|line . block . sub\|IO/Handle.pm \|
	642	\|274 \|0.01525 \|0.00006\|line . block . sub\|Getopt/Long.pm \|
	643	\|20 \|0.00585 \|0.00029\|line . block . sub\|Fcntl.pm \|
	644	\|128 \|0.00340 \|0.00003\|line . block . sub\|Exporter/Heavy.pm \|
	645	\|42 \|0.00332 \|0.00008\|line . block . sub\|IO/File.pm \|
	646	\|261 \|0.00308 \|0.00001\|line . block . sub\|Exporter.pm \|
	647	\|323 \|0.00248 \|8e-06 \|line . block . sub\|constant.pm \|
	648	\|12 \|0.00246 \|0.00021\|line . block . sub\|File/Spec/Unix.pm \|
	649	\|191 \|0.00240 \|0.00001\|line . block . sub\|vars.pm \|
	650	\|77 \|0.00201 \|0.00003\|line . block . sub\|FileHandle.pm \|
	651	\|12 \|0.00198 \|0.00016\|line . block . sub\|Carp.pm \|
	652	\|14 \|0.00175 \|0.00013\|line . block . sub\|Symbol.pm \|
	653	\|15 \|0.00130 \|0.00009\|line . block . sub\|IO.pm \|
	654	\|22 \|0.00120 \|0.00005\|line . block . sub\|IO/Seekable.pm \|
	655	\|198 \|0.00085 \|4e-06 \|line . block . sub\|warnings/register.pm\|
	656	\|114 \|0.00080 \|7e-06 \|line . block . sub\|strict.pm \|
	657	\|47 \|0.00068 \|0.00001\|line . block . sub\|warnings.pm \|
	658	\|27 \|0.00054 \|0.00002\|line . block . sub\|overload.pm \|
	659	\|9 \|0.00047 \|0.00005\|line . block . sub\|SelectSaver.pm \|
	660	\|13 \|0.00045 \|0.00003\|line . block . sub\|File/Spec.pm \|
	661	\|2701595\|15.73869 \| \|Total \|
	662	\|128647 \|0.74946 \| \|Average \|
	663	\| \|0.00201 \|0.00003\|Median \|
	664	\| \|0.00121 \|0.00003\|Deviation \|
	665
	666	Report produced by the NYTProf 2.03 Perl profiler, developed by Tim Bunce and
	667	Adam Kaplan.
	668
	669	At this point, if you're using the I<html> report, you can click through the
	670	various links to bore down into each subroutine and each line of code. Because
	671	we're using the text reporting here, and there's a whole directory full of
	672	reports built for each source file, we'll just display a part of the
	673	corresponding F<wordmatch-line.html> file, sufficient to give an idea of the
	674	sort of output you can expect from this cool tool.
	675
	676	$> html2text nytprof/wordmatch-line.html
	677
	678	Performance Profile -- -block view-.-line view-.-sub view-
	679	For wordmatch
	680	Run on Fri Sep 26 13:46:39 2008
	681	Reported on Fri Sep 26 13:47:22 2008
	682
	683	File wordmatch
	684
	685	Subroutines -- ordered by exclusive time
	686	\|Calls \|P\|F\|Inclusive\|Exclusive\|Subroutine \|
	687	\| \| \| \|Time \|Time \| \|
	688	\|251215\|5\|1\|13.09263 \|10.47692 \|main::\|matches\|
	689	\|260642\|2\|1\|2.71199 \|2.71199 \|main::\|debug \|
	690	\|1 \|1\|1\|0.21404 \|0.21404 \|main::\|report \|
	691	\|0 \|0\|0\|0 \|0 \|main::\|BEGIN \|
	692
	693
	694	\|Line\|Stmts.\|Exclusive\|Avg. \|Code \|
	695	\| \| \|Time \| \| \|
	696	\|1 \| \| \| \|#!/usr/bin/perl \|
	697	\|2 \| \| \| \| \|
	698	\| \| \| \| \|use strict; \|
	699	\|3 \|3 \|0.00086 \|0.00029\|# spent 0.00003s making 1 calls to strict:: \|
	700	\| \| \| \| \|import \|
	701	\| \| \| \| \|use warnings; \|
	702	\|4 \|3 \|0.01563 \|0.00521\|# spent 0.00012s making 1 calls to warnings:: \|
	703	\| \| \| \| \|import \|
	704	\|5 \| \| \| \| \|
	705	\|6 \| \| \| \|=head1 NAME \|
	706	\|7 \| \| \| \| \|
	707	\|8 \| \| \| \|filewords - word analysis of input file \|
	708	<...snip...>
	709	\|62 \|1 \|0.00445 \|0.00445\|print report( %count ); \|
	710	\| \| \| \| \|# spent 0.21404s making 1 calls to main::report\|
	711	\|63 \| \| \| \| \|
	712	\| \| \| \| \|# spent 23.56955s (10.47692+2.61571) within \|
	713	\| \| \| \| \|main::matches which was called 251215 times, \|
	714	\| \| \| \| \|avg 0.00005s/call: # 50243 times \|
	715	\| \| \| \| \|(2.12134+0.51939s) at line 57 of wordmatch, avg\|
	716	\| \| \| \| \|0.00005s/call # 50243 times (2.17735+0.54550s) \|
	717	\|64 \| \| \| \|at line 56 of wordmatch, avg 0.00005s/call # \|
	718	\| \| \| \| \|50243 times (2.10992+0.51797s) at line 58 of \|
	719	\| \| \| \| \|wordmatch, avg 0.00005s/call # 50243 times \|
	720	\| \| \| \| \|(2.12696+0.51598s) at line 55 of wordmatch, avg\|
	721	\| \| \| \| \|0.00005s/call # 50243 times (1.94134+0.51687s) \|
	722	\| \| \| \| \|at line 54 of wordmatch, avg 0.00005s/call \|
	723	\| \| \| \| \|sub matches { \|
	724	<...snip...>
	725	\|102 \| \| \| \| \|
	726	\| \| \| \| \|# spent 2.71199s within main::debug which was \|
	727	\| \| \| \| \|called 260642 times, avg 0.00001s/call: # \|
	728	\| \| \| \| \|251215 times (2.61571+0s) by main::matches at \|
	729	\|103 \| \| \| \|line 74 of wordmatch, avg 0.00001s/call # 9427 \|
	730	\| \| \| \| \|times (0.09628+0s) at line 50 of wordmatch, avg\|
	731	\| \| \| \| \|0.00001s/call \|
	732	\| \| \| \| \|sub debug { \|
	733	\|104 \|260642\|0.58496 \|2e-06 \|my $message = shift; \|
	734	\|105 \| \| \| \| \|
	735	\|106 \|260642\|1.09917 \|4e-06 \|if ( $debug ) { \|
	736	\|107 \| \| \| \|print STDERR "DBG: $message\n"; \|
	737	\|108 \| \| \| \|} \|
	738	\|109 \| \| \| \|} \|
	739	\|110 \| \| \| \| \|
	740	\|111 \|1 \|0.01501 \|0.01501\|exit 0; \|
	741	\|112 \| \| \| \| \|
	742
	743	Oodles of very useful information in there - this seems to be the way forward.
	744
	745	See also C<Devel::NYTProf::Apache> which hooks C<Devel::NYTProf> into C<mod_perl>.
	746
	747	=head1 SORTING
	748
	749	Perl modules are not the only tools a performance analyst has at their
	750	disposal, system tools like C<time> should not be overlooked as the next
	751	example shows, where we take a quick look at sorting. Many books, theses and
	752	articles, have been written about efficient sorting algorithms, and this is not
	753	the place to repeat such work, there's several good sorting modules which
	754	deserve taking a look at too: C<Sort::Maker>, C<Sort::Key> spring to mind.
	755	However, it's still possible to make some observations on certain Perl specific
	756	interpretations on issues relating to sorting data sets and give an example or
	757	two with regard to how sorting large data volumes can effect performance.
	758	Firstly, an often overlooked point when sorting large amounts of data, one can
	759	attempt to reduce the data set to be dealt with and in many cases C<grep()> can
	760	be quite useful as a simple filter:
	761
	762	@data = sort grep { /$filter/ } @incoming
	763
	764	A command such as this can vastly reduce the volume of material to actually
	765	sort through in the first place, and should not be too lightly disregarded
	766	purely on the basis of its simplicity. The C<KISS> principle is too often
	767	overlooked - the next example uses the simple system C<time> utility to
	768	demonstrate. Let's take a look at an actual example of sorting the contents of
	769	a large file, an apache logfile would do. This one has over a quarter of a
	770	million lines, is 50M in size, and a snippet of it looks like this:
	771
	772	# logfile
	773
	774	188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
	775	188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
	776	151.56.71.198 - - [08/Feb/2007:12:57:41 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
	777	151.56.71.198 - - [08/Feb/2007:12:57:42 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
	778	151.56.71.198 - - [08/Feb/2007:12:57:43 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
	779	217.113.68.60 - - [08/Feb/2007:13:02:15 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
	780	217.113.68.60 - - [08/Feb/2007:13:02:16 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
	781	debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
	782	debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
	783	debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
	784	195.24.196.99 - - [08/Feb/2007:13:26:48 +0000] "GET / HTTP/1.0" 200 3309 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
	785	195.24.196.99 - - [08/Feb/2007:13:26:58 +0000] "GET /data/css HTTP/1.0" 404 206 "http://www.rfi.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
	786	195.24.196.99 - - [08/Feb/2007:13:26:59 +0000] "GET /favicon.ico HTTP/1.0" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
	787	crawl1.cosmixcorp.com - - [08/Feb/2007:13:27:57 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "voyager/1.0"
	788	crawl1.cosmixcorp.com - - [08/Feb/2007:13:28:25 +0000] "GET /links.html HTTP/1.0" 200 3413 "-" "voyager/1.0"
	789	fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:32 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
	790	fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:34 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
	791	80.247.140.134 - - [08/Feb/2007:13:57:35 +0000] "GET / HTTP/1.1" 200 3309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
	792	80.247.140.134 - - [08/Feb/2007:13:57:37 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
	793	pop.compuscan.co.za - - [08/Feb/2007:14:10:43 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
	794	livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
	795	livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /html/oracle.html HTTP/1.0" 404 214 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
	796	dslb-088-064-005-154.pools.arcor-ip.net - - [08/Feb/2007:14:12:15 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
	797	196.201.92.41 - - [08/Feb/2007:14:15:01 +0000] "GET / HTTP/1.1" 200 3309 "-" "MOT-L7/08.B7.DCR MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1"
	798
	799	The specific task here is to sort the 286,525 lines of this file by Response
	800	Code, Query, Browser, Referring Url, and lastly Date. One solution might be to
	801	use the following code, which iterates over the files given on the
	802	command-line.
	803
	804	# sort-apache-log
	805
	806	#!/usr/bin/perl -n
	807
	808	use strict;
	809	use warnings;
	810
	811	my @data;
	812
	813	LINE:
	814	while ( <> ) {
	815	my $line = $_;
	816	if (
	817	$line =~ m/^(
	818	([\w\.\-]+) # client
	819	\s-\s-\s*\[
	820	([^]]+) # date
	821	\]\s"\w+\s
	822	(\S+) # query
	823	[^"]+"\s*
	824	(\d+) # status
	825	\s+\S+\s+"[^"]*"\s+"
	826	([^"]*) # browser
	827	"
	828	.*
	829	)$/x
	830	) {
	831	my @chunks = split(/ +/, $line);
	832	my $ip = $1;
	833	my $date = $2;
	834	my $query = $3;
	835	my $status = $4;
	836	my $browser = $5;
	837
	838	push(@data, [$ip, $date, $query, $status, $browser, $line]);
	839	}
	840	}
	841
	842	my @sorted = sort {
	843	$a->[3] cmp $b->[3]
	844	\|\|
	845	$a->[2] cmp $b->[2]
	846	\|\|
	847	$a->[0] cmp $b->[0]
	848	\|\|
	849	$a->[1] cmp $b->[1]
	850	\|\|
	851	$a->[4] cmp $b->[4]
	852	} @data;
	853
	854	foreach my $data ( @sorted ) {
	855	print $data->[5];
	856	}
	857
	858	exit 0;
	859
	860	When running this program, redirect C<STDOUT> so it is possible to check the
	861	output is correct from following test runs and use the system C<time> utility
	862	to check the overall runtime.
	863
	864	$> time ./sort-apache-log logfile > out-sort
	865
	866	real 0m17.371s
	867	user 0m15.757s
	868	sys 0m0.592s
	869
	870	The program took just over 17 wallclock seconds to run. Note the different
	871	values C<time> outputs, it's important to always use the same one, and to not
	872	confuse what each one means.
	873
	874	=over 4
	875
	876	=item Elapsed Real Time
	877
	878	The overall, or wallclock, time between when C<time> was called, and when it
	879	terminates. The elapsed time includes both user and system times, and time
	880	spent waiting for other users and processes on the system. Inevitably, this is
	881	the most approximate of the measurements given.
	882
	883	=item User CPU Time
	884
	885	The user time is the amount of time the entire process spent on behalf of the
	886	user on this system executing this program.
	887
	888	=item System CPU Time
	889
	890	The system time is the amount of time the kernel itself spent executing
	891	routines, or system calls, on behalf of this process user.
	892
	893	=back
	894
	895	Running this same process as a C<Schwarzian Transform> it is possible to
	896	eliminate the input and output arrays for storing all the data, and work on the
	897	input directly as it arrives too. Otherwise, the code looks fairly similar:
	898
	899	# sort-apache-log-schwarzian
	900
	901	#!/usr/bin/perl -n
	902
	903	use strict;
	904	use warnings;
	905
	906	print
	907
	908	map $_->[0] =>
	909
	910	sort {
	911	$a->[4] cmp $b->[4]
	912	\|\|
	913	$a->[3] cmp $b->[3]
	914	\|\|
	915	$a->[1] cmp $b->[1]
	916	\|\|
	917	$a->[2] cmp $b->[2]
	918	\|\|
	919	$a->[5] cmp $b->[5]
	920	}
	921	map [ $_, m/^(
	922	([\w\.\-]+) # client
	923	\s-\s-\s*\[
	924	([^]]+) # date
	925	\]\s"\w+\s
	926	(\S+) # query
	927	[^"]+"\s*
	928	(\d+) # status
	929	\s+\S+\s+"[^"]*"\s+"
	930	([^"]*) # browser
	931	"
	932	.*
	933	)$/xo ]
	934
	935	=> <>;
	936
	937	exit 0;
	938
	939	Run the new code against the same logfile, as above, to check the new time.
	940
	941	$> time ./sort-apache-log-schwarzian logfile > out-schwarz
	942
	943	real 0m9.664s
	944	user 0m8.873s
	945	sys 0m0.704s
	946
	947	The time has been cut in half, which is a respectable speed improvement by any
	948	standard. Naturally, it is important to check the output is consistent with
	949	the first program run, this is where the Unix system C<cksum> utility comes in.
	950
	951	$> cksum out-sort out-schwarz
	952	3044173777 52029194 out-sort
	953	3044173777 52029194 out-schwarz
	954
	955	BTW. Beware too of pressure from managers who see you speed a program up by 50%
	956	of the runtime once, only to get a request one month later to do the same again
	957	(true story) - you'll just have to point out you're only human, even if you are a
	958	Perl programmer, and you'll see what you can do...
	959
	960	=head1 LOGGING
	961
	962	An essential part of any good development process is appropriate error handling
	963	with appropriately informative messages, however there exists a school of
	964	thought which suggests that log files should be I<chatty>, as if the chain of
	965	unbroken output somehow ensures the survival of the program. If speed is in
	966	any way an issue, this approach is wrong.
	967
	968	A common sight is code which looks something like this:
	969
	970	logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) )
	971
	972	The problem is that this code will always be parsed and executed, even when the
	973	debug level set in the logging configuration file is zero. Once the debug()
	974	subroutine has been entered, and the internal C<$debug> variable confirmed to
	975	be zero, for example, the message which has been sent in will be discarded and
	976	the program will continue. In the example given though, the C<\%INC> hash will
	977	already have been dumped, and the message string constructed, all of which work
	978	could be bypassed by a debug variable at the statement level, like this:
	979
	980	logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) ) if $DEBUG;
	981
	982	This effect can be demonstrated by setting up a test script with both forms,
	983	including a C<debug()> subroutine to emulate typical C<logger()> functionality.
	984
	985	# ifdebug
	986
	987	#!/usr/bin/perl
	988
	989	use strict;
	990	use warnings;
	991
	992	use Benchmark;
	993	use Data::Dumper;
	994	my $DEBUG = 0;
	995
	996	sub debug {
	997	my $msg = shift;
	998
	999	if ( $DEBUG ) {
	1000	print "DEBUG: $msg\n";
	1001	}
	1002	};
	1003
	1004	timethese(100000, {
	1005	'debug' => sub {
	1006	debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
	1007	},
	1008	'ifdebug' => sub {
	1009	debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if $DEBUG
	1010	},
	1011	});
	1012
	1013	Let's see what C<Benchmark> makes of this:
	1014
	1015	$> perl ifdebug
	1016	Benchmark: timing 100000 iterations of constant, sub...
	1017	ifdebug: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) @ 10000000.00/s (n=100000)
	1018	(warning: too few iterations for a reliable count)
	1019	debug: 14 wallclock secs (13.18 usr + 0.04 sys = 13.22 CPU) @ 7564.30/s (n=100000)
	1020
	1021	In the one case the code, which does exactly the same thing as far as
	1022	outputting any debugging information is concerned, in other words nothing,
	1023	takes 14 seconds, and in the other case the code takes one hundredth of a
	1024	second. Looks fairly definitive. Use a C<$DEBUG> variable BEFORE you call the
	1025	subroutine, rather than relying on the smart functionality inside it.
	1026
	1027	=head2 Logging if DEBUG (constant)
	1028
	1029	It's possible to take the previous idea a little further, by using a compile
	1030	time C<DEBUG> constant.
	1031
	1032	# ifdebug-constant
	1033
	1034	#!/usr/bin/perl
	1035
	1036	use strict;
	1037	use warnings;
	1038
	1039	use Benchmark;
	1040	use Data::Dumper;
	1041	use constant
	1042	DEBUG => 0
	1043	;
	1044
	1045	sub debug {
	1046	if ( DEBUG ) {
	1047	my $msg = shift;
	1048	print "DEBUG: $msg\n";
	1049	}
	1050	};
	1051
	1052	timethese(100000, {
	1053	'debug' => sub {
	1054	debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
	1055	},
	1056	'constant' => sub {
	1057	debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if DEBUG
	1058	},
	1059	});
	1060
	1061	Running this program produces the following output:
	1062
	1063	$> perl ifdebug-constant
	1064	Benchmark: timing 100000 iterations of constant, sub...
	1065	constant: 0 wallclock secs (-0.00 usr + 0.00 sys = -0.00 CPU) @ -7205759403792793600000.00/s (n=100000)
	1066	(warning: too few iterations for a reliable count)
	1067	sub: 14 wallclock secs (13.09 usr + 0.00 sys = 13.09 CPU) @ 7639.42/s (n=100000)
	1068
	1069	The C<DEBUG> constant wipes the floor with even the C<$debug> variable,
	1070	clocking in at minus zero seconds, and generates a "warning: too few iterations
	1071	for a reliable count" message into the bargain. To see what is really going
	1072	on, and why we had too few iterations when we thought we asked for 100000, we
	1073	can use the very useful C<B::Deparse> to inspect the new code:
	1074
	1075	$> perl -MO=Deparse ifdebug-constant
	1076
	1077	use Benchmark;
	1078	use Data::Dumper;
	1079	use constant ('DEBUG', 0);
	1080	sub debug {
	1081	use warnings;
	1082	use strict 'refs';
	1083	0;
	1084	}
	1085	use warnings;
	1086	use strict 'refs';
	1087	timethese(100000, {'sub', sub {
	1088	debug "A $0 logging message via process-id: $$" . Dumper(\%INC);
	1089	}
	1090	, 'constant', sub {
	1091	0;
	1092	}
	1093	});
	1094	ifdebug-constant syntax OK
	1095
	1096	The output shows the constant() subroutine we're testing being replaced with
	1097	the value of the C<DEBUG> constant: zero. The line to be tested has been
	1098	completely optimized away, and you can't get much more efficient than that.
	1099
	1100	=head1 POSTSCRIPT
	1101
	1102	This document has provided several way to go about identifying hot-spots, and
	1103	checking whether any modifications have improved the runtime of the code.
	1104
	1105	As a final thought, remember that it's not (at the time of writing) possible to
	1106	produce a useful program which will run in zero or negative time and this basic
	1107	principle can be written as: I<useful programs are slow> by their very
	1108	definition. It is of course possible to write a nearly instantaneous program,
	1109	but it's not going to do very much, here's a very efficient one:
	1110
	1111	$> perl -e 0
	1112
	1113	Optimizing that any further is a job for C<p5p>.
	1114
	1115	=head1 SEE ALSO
	1116
	1117	Further reading can be found using the modules and links below.
	1118
	1119	=head2 PERLDOCS
	1120
	1121	For example: C<perldoc -f sort>.
	1122
	1123	L<perlfaq4>.
	1124
	1125	L<perlfork>, L<perlfunc>, L<perlretut>, L<perlthrtut>.
	1126
	1127	L<threads>.
	1128
	1129	=head2 MAN PAGES
	1130
	1131	C<time>.
	1132
	1133	=head2 MODULES
	1134
	1135	It's not possible to individually showcase all the performance related code for
	1136	Perl here, naturally, but here's a short list of modules from the CPAN which
	1137	deserve further attention.
	1138
	1139	Apache::DProf
	1140	Apache::SmallProf
	1141	Benchmark
	1142	DBIx::Profile
	1143	Devel::AutoProfiler
	1144	Devel::DProf
	1145	Devel::DProfLB
	1146	Devel::FastProf
	1147	Devel::GraphVizProf
	1148	Devel::NYTProf
	1149	Devel::NYTProf::Apache
	1150	Devel::Profiler
	1151	Devel::Profile
	1152	Devel::Profit
	1153	Devel::SmallProf
	1154	Devel::WxProf
	1155	POE::Devel::Profiler
	1156	Sort::Key
	1157	Sort::Maker
	1158
	1159	=head2 URLS
	1160
	1161	Very useful online reference material:
	1162
	1163	http://www.ccl4.org/~nick/P/Fast_Enough/
	1164
	1165	http://www-128.ibm.com/developerworks/library/l-optperl.html
	1166
	1167	http://perlbuzz.com/2007/11/bind-output-variables-in-dbi-for-speed-and-safety.html
	1168
	1169	http://en.wikipedia.org/wiki/Performance_analysis
	1170
	1171	http://apache.perl.org/docs/1.0/guide/performance.html
	1172
	1173	http://perlgolf.sourceforge.net/
	1174
	1175	http://www.sysarch.com/Perl/sort_paper.html
	1176
	1177	=head1 AUTHOR
	1178
	1179	Richard Foley <richard.foley@rfi.net> Copyright (c) 2008
	1180
	1181	=cut