[ Team LiB ] |
9.5 Code Profiling TechniquesThe profiling process helps you to determine which subroutines (or just snippets of code) take longest to execute and which subroutines are called most often. You will probably want to optimize these. When do you need to profile your code? You do that when you suspect that some part of your code is called very often and maybe there is a need to optimize it, which could significantly improve the overall performance. 9.5.1 Profiling with Devel::DProfDevel::DProf collects information on the execution time of a Perl script and of the subroutines in that script. Let's take for example the diagnostics pragma and measure the impact of its usage on the program compilation and execution speed. This pragma extends the terse diagnostics normally emitted by both the Perl compiler and the Perl interpreter, augmenting them with the more verbose and endearing descriptions found in the perldiag manpage. We have claimed that this pragma should not be used on a production server. We are going to use Devel::DProf to explain our claim. We will run a benchmark, once with diagnostics enabled and once disabled, on a subroutine called test_code( ). The code inside the subroutine does either a lexical or a numeric comparison of two strings. It assigns one string to another if the condition tests true, but the condition is always false. To demonstrate the diagnostics pragma overhead, the comparison operator that we use in Example 9-7 is intentionally wrong. It should be a string comparison (eq), and we use a numeric one (= =). Example 9-7. bench_diagnostics.pluse Benchmark; use diagnostics; use strict; my $count = 50000; disable diagnostics; my $t1 = timeit($count,\&test_code); enable diagnostics; my $t2 = timeit($count,\&test_code); print "Off: ",timestr($t1),"\n"; print "On : ",timestr($t2),"\n"; sub test_code { my ($a, $b) = qw(foo bar); my $c; if ($a = = $b) { $c = $a; } } For only a few lines of code we get: Off: 1 wallclock secs ( 0.81 usr + 0.00 sys = 0.81 CPU) On : 13 wallclock secs (12.54 usr + 0.01 sys = 12.55 CPU) With diagnostics enabled, the subroutine test_code( ) is 16 times slower (12.55/0.81: remember that we're talking in CPU time, not wallclock seconds) than with diagnostics disabled! Now let's fix the comparison the way it should be, by replacing = = with eq, so we get: my ($a, $b) = qw(foo bar); my $c; if ($a eq $b) { $c = $a; } and run the same benchmark again: Off: 1 wallclock secs ( 0.57 usr + 0.00 sys = 0.57 CPU) On : 1 wallclock secs ( 0.56 usr + 0.00 sys = 0.56 CPU) Now there is no overhead at all. The diagnostics pragma slows things down only when warnings are generated. After we have verified that using the diagnostics pragma might add a big overhead to execution runtime, let's use code profiling to understand why this happens. We use Devel::DProf to profile the code shown in Example 9-8. Example 9-8. diagnostics.pluse diagnostics; test_code( ); sub test_code { my($a, $b) = qw(foo bar); my $c; if ($a = = $b) { $c = $a; } } Run it with the profiler enabled, and then create the profiling statistics with the help of dprofpp: panic% perl -d:DProf diagnostics.pl panic% dprofpp Total Elapsed Time = 0.342236 Seconds User+System Time = 0.335420 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 92.1 0.309 0.358 1 0.3089 0.3578 main::BEGIN 14.9 0.050 0.039 3161 0.0000 0.0000 diagnostics::unescape 2.98 0.010 0.010 2 0.0050 0.0050 diagnostics::BEGIN 0.00 0.000 -0.000 2 0.0000 - Exporter::import 0.00 0.000 -0.000 2 0.0000 - Exporter::export 0.00 0.000 -0.000 1 0.0000 - Config::BEGIN 0.00 0.000 -0.000 1 0.0000 - Config::TIEHASH 0.00 0.000 -0.000 2 0.0000 - Config::FETCH 0.00 0.000 -0.000 1 0.0000 - diagnostics::import 0.00 0.000 -0.000 1 0.0000 - main::test_code 0.00 0.000 -0.000 2 0.0000 - diagnostics::warn_trap 0.00 0.000 -0.000 2 0.0000 - diagnostics::splainthis 0.00 0.000 -0.000 2 0.0000 - diagnostics::transmo 0.00 0.000 -0.000 2 0.0000 - diagnostics::shorten 0.00 0.000 -0.000 2 0.0000 - diagnostics::autodescribe It's not easy to see what is responsible for this enormous overhead, even if main::BEGIN seems to be running most of the time. To get the full picture we must see the OPs tree, which shows us who calls whom, so we run: panic% dprofpp -T The output is: main::BEGIN diagnostics::BEGIN Exporter::import Exporter::export diagnostics::BEGIN Config::BEGIN Config::TIEHASH Exporter::import Exporter::export Config::FETCH Config::FETCH diagnostics::unescape ..................... 3159 times [diagnostics::unescape] snipped ..................... diagnostics::unescape diagnostics::import diagnostics::warn_trap diagnostics::splainthis diagnostics::transmo diagnostics::shorten diagnostics::autodescribe main::test_code diagnostics::warn_trap diagnostics::splainthis diagnostics::transmo diagnostics::shorten diagnostics::autodescribe diagnostics::warn_trap diagnostics::splainthis diagnostics::transmo diagnostics::shorten diagnostics::autodescribe So we see that 2 executions of diagnostics::BEGIN and 3,161 of diagnostics::unescape are responsible for most of the running overhead. If we comment out the diagnostics module, we get: Total Elapsed Time = 0.079974 Seconds User+System Time = 0.059974 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 0.00 0.000 -0.000 1 0.0000 - main::test_code It is possible to profile code running under mod_perl with the Devel::DProf module, available on CPAN. However, you must have PerlChildExitHandler enabled during the mod_perl build process. When the server is started, Devel::DProf installs an END block to write the tmon.out file. This block will be called at server shutdown. Here is how to start and stop a server with the profiler enabled: panic% setenv PERL5OPT -d:DProf panic% httpd -X -d `pwd` & ... make some requests to the server here ... panic% kill `cat logs/httpd.pid` panic% unsetenv PERL5OPT panic% dprofpp The Devel::DProf package is a Perl code profiler. It will collect information on the execution time of a Perl script and of the subroutines in that script (remember that print( ) and map( ) are just like any other subroutines you write, but they come bundled with Perl!). Another approach is to use Apache::DProf, which hooks Devel::DProf into mod_perl. The Apache::DProf module will run a Devel::DProf profiler inside the process and write the tmon.out file in the directory $ServerRoot/logs/dprof/$$ (make sure that it's writable by the server!) when the process is shut down (where $$ is the PID of the process). All it takes to activate this module is to modify httpd.conf. You can test for a command-line switch in httpd.conf. For example, to test if the server was started with -DPERLDPROF, use: <Location /perl> SetHandler perl-script PerlHandler Apache::Registry <IfDefine PERLDPROF> PerlModule Apache::DProf </IfDefine> </Location> And to activate profiling, use: panic% httpd -X -DPERLDPROF & Remember that any PerlHandler that was pulled in before Apache::DProf in the httpd.conf or startup.pl file will not have code-debugging information inserted. To run dprofpp, chdir to $ServerRoot/logs/dprof/$$[3] and run:
panic% dprofpp Use the command-line options for dropfpp(1) if a nondefault output is desired, as explained in the dropfpp manpage. You might especially want to look at the -r switch to display wallclock times (more relevant in a web-serving environment) and the -l switch to sort by number of subroutine calls. If you are running Perl 5.6.0 or higher, take a look at the new module Devel::Profiler (Version 0.04 as of this writing), which is supposed to be a drop-in replacement for Apache::DProf, with improved functionality and stability. 9.5.2 Profiling with Devel::SmallProfThe Devel::SmallProf profiler is focused on the time taken for a program run on a line-by-line basis. It is called "small" because it's supposed to impose very little extra load on the machine (speed- and memory-wise) while profiling the code. Let's take a look at the simple example shown in Example 9-9. Example 9-9. table_gen.plfor (1..1000) { my @rows = ( ); push @rows, Tr( map { td($_) } 'a'..'d' ); push @rows, Tr( map { td($_) } 'e'..'h' ); my $var = table(@rows); } sub table { my @rows = @_; return "<table>\n@rows</table>\n";} sub Tr { my @cells = @_; return "<tr>@cells</tr>\n"; } sub td { my $cell = shift; return "<td>$cell</td>"; } It creates the same HTML table in $var, with the cells of the table filled with single letters. The functions table( ), Tr( ), and td( ) insert the data into appropriate HTML tags. Notice that we have used Tr( ) and not tr( ), since the latter is a built-in Perl function, and we have used the same function name as in CGI.pm that does the same thing. If we print $var we will see something like this: <table> <tr><td>a</td> <td>b</td> <td>c</td> <td>d</td></tr> <tr><td>e</td> <td>f</td> <td>g</td> <td>h</td></tr> </table> We have looped a thousand times through the same code in order to get a more precise speed measurement. If the code runs very quickly we won't be able to get any meaningful results from just one loop. If we run this code with Devel::SmallProf: panic% perl -d:SmallProf table_gen.pl we get the following output in the autogenerated smallprof.out file: count wall tm cpu time line 1001 0.003855 0.030000 1: for (1..1000) { 1000 0.004823 0.040000 2: my @rows = ( ); 5000 0.272651 0.410000 3: push @rows, Tr( map { td($_) } 5000 0.267107 0.360000 4: push @rows, Tr( map { td($_) } 1000 0.067115 0.120000 5: my $var = table(@rows); 0 0.000000 0.000000 6: } 3000 0.033798 0.080000 7: sub table { my @rows = @_; return 6000 0.078491 0.120000 8: sub Tr { my @cells = @_; return 24000 0.267353 0.490000 9: sub td { my $cell = shift; return 0 0.000000 0.000000 10: We can see in the CPU time column that Perl spends most of its time in the td( ) function; it's also the code that's visited by Perl the most times. In this example we could find this out ourselves without much effort, but if the code is longer it will be harder to find the lines of code where Perl spends most of its time. So we sort the output by the third column as a numerical value, in descending order: panic% sort -k 3nr,3 smallprof.out | less 24000 0.267353 0.490000 9: sub td { my $cell = shift; return 5000 0.272651 0.410000 3: push @rows, Tr( map { td($_) } 5000 0.267107 0.360000 4: push @rows, Tr( map { td($_) } 1000 0.067115 0.120000 5: my $var = table(@rows); 6000 0.078491 0.120000 8: sub Tr { my @cells = @_; return 3000 0.033798 0.080000 7: sub table { my @rows = @_; return 1000 0.004823 0.040000 2: my @rows = ( ); 1001 0.003855 0.030000 1: for (1..1000) { According to the Devel::SmallProf manpage, the wallclock's measurements are fairly accurate (we suppose that they're correct on an unloaded machine), but CPU clock time is always more accurate. That's because if it takes more than one CPU time slice for a directive to complete, the time that some other process uses CPU is counted in the wallclock counts. Since the load on the same machine may vary greatly from moment to moment, it's possible that if we rerun the same test a few times we will get inconsistent results. Let's try to improve the td( ) function and at the same time the Tr( ) and table( ) functions. We will not copy the passed arguments, but we will use them directly in all three functions. Example 9-10 shows the new version of our script. Example 9-10. table_gen2.plfor (1..1000) { my @rows = ( ); push @rows, Tr( map { td($_) } 'a'..'d' ); push @rows, Tr( map { td($_) } 'e'..'h' ); my $var = table(@rows); } sub table { return "<table>\n@_</table>\n";} sub Tr { return "<tr>@_</tr>\n"; } sub td { return "<td>@_</td>"; } Now let's rerun the code with the profiler: panic% perl -d:SmallProf table_gen2.pl The results are much better now—only 0.34 CPU clocks are spent in td( ), versus 0.49 in the earlier run: panic% sort -k 3nr,3 smallprof.out | less 5000 0.279138 0.400000 4: push @rows, Tr( map { td($_) } 16000 0.241350 0.340000 9: sub td { return "<td>@_</td>"; } 5000 0.269940 0.320000 3: push @rows, Tr( map { td($_) } 4000 0.050050 0.130000 8: sub Tr { return "<tr>@_</tr>\n"; } 1000 0.065324 0.080000 5: my $var = table(@rows); 1000 0.006650 0.010000 2: my @rows = ( ); 2000 0.020314 0.030000 7: sub table{ return "<table>\n@_</table>\n";} 1001 0.006165 0.030000 1: for (1..1000) { You will also notice that Devel::SmallProf reports that the functions were executed different numbers of times in the two runs. That's because in our original example all three functions had two statements on each line, but in the improved version they each had only one. Devel::SmallProf looks at the code after it's been parsed and optimized by Perl—thus, if some optimizations took place, it might not be exactly the same as the code that you wrote. In most cases you will probably find Devel::DProf more useful than Devel::SmallProf, as it allows you to analyze the code by subroutine and not by line. Just as there is the Apache::DProf equivalent for Devel::DProf, there is the Apache::SmallProf equivalent for Devel::SmallProf. It uses a configuration similar to Apache::DProf—i.e., it is registered as a PerlFixupHandler—but it also requires Apache::DB. Therefore, to use it you should add the following configuration to httpd.conf: <Perl> if (Apache->define('PERLSMALLPROF')) { require Apache::DB; Apache::DB->init; } </Perl> <Location /perl> SetHandler perl-script PerlHandler Apache::Registry <IfDefine PERLSMALLPROF> PerlFixupHandler Apache::SmallProf </IfDefine> </Location> Now start the server: panic% httpd -X -DPERLSMALLPROF & This will activate Apache::SmallProf::handler during the request. As a result, the profile files will be written to the $ServerRoot/logs/smallprof/ directory. Unlike with Devel::SmallProf, the profile is split into several files based on package name. For example, if CGI.pm was used, one of the generated profile files will be called CGI.pm.prof. |
[ Team LiB ] |