[ Team LiB ] Previous Section Next Section

11.6 Reducing the Number of stat( ) Calls Made by Apache

If (using truss, strace, or another tool available for your OS) you watch the system calls that your mod_perl server makes while processing a request, you will notice that a few stat( ) calls are made, and these are quite expensive. For example, if you have your DocumentRoot set to /home/httpd/docs and you fetch http://localhost/perl-status, you will see:

[snip]
stat("/home/httpd/docs/perl-status", 0xbffff8cc) = -1 
                    ENOENT (No such file or directory)
stat("/home/httpd/docs", {st_mode=S_IFDIR|0755, 
                               st_size=1024, ...}) = 0
[snip]

If you have some dynamic content and your virtual relative URI is looks like /news/perl/mod_perl/summary (i.e., there is no such directory on the web server—the path components are used only for requesting a specific report), this will generate five stat( ) calls before the DocumentRoot is reached and the search is stopped. You will see something like this:

stat("/home/httpd/docs/news/perl/mod_perl/summary", 0xbffff744) = -1 
                    ENOENT (No such file or directory)
stat("/home/httpd/docs/news/perl/mod_perl",         0xbffff744) = -1
                    ENOENT (No such file or directory)
stat("/home/httpd/docs/news/perl",                  0xbffff744) = -1
                    ENOENT (No such file or directory)
stat("/home/httpd/docs/news",                       0xbffff744) = -1
                    ENOENT (No such file or directory)
stat("/home/httpd/docs", 
                    {st_mode=S_IFDIR|0755, st_size=1024, ...})  =  0

How expensive are these calls? Let's use the Time::HiRes module to find out.

The script in Example 11-3, which you should run on the command line, takes a time sample at the beginning, then does a million stat( ) calls to a nonexistent file, samples the time at the end, and prints the average time it took to make a single stat( ) call.

Example 11-3. stat_call_sample.pl
use Time::HiRes qw(gettimeofday tv_interval);
my $calls = 1_000_000;

my $start_time = [ gettimeofday ];

stat "/foo" for 1..$calls;

my $end_time = [ gettimeofday ];
my $avg = tv_interval($start_time,$end_time) / $calls;
print "The average execution time: $avg seconds\n";

Before we actually run the script we should distinguish between two different scenarios. When the server is idle, the time between the first and the last system call will be much shorter than the same time measured on a loaded system. This is because on an idle system, a process can use the CPU very often, whereas on a loaded system, lots of processes compete for CPU time and each process has to wait longer to get the same amount of CPU time.

So first we run the above code on an unloaded system:

panic% perl stat_call_sample.pl
The average execution time: 4.209645e-06 seconds

Here it takes about four microseconds to execute a stat( ) call. Now we'll start a CPU-intensive process in one console (make sure to kill the process afterward!). The following code keeps the CPU busy all the time:

panic% perl -e '1 while 1'

And now we run the stat_call_sample.pl script in another console:

panic% perl stat_call_sample.pl
The average execution time: 8.777301e-06 seconds

You can see that the average time has doubled (about eight microseconds). This is intuitive, since there were two processes competing for CPU resources. Now if we run four occurrences of the above code:

panic% perl -e '1**1 while 1' &
panic% perl -e '1**1 while 1' &
panic% perl -e '1**1 while 1' &
panic% perl -e '1**1 while 1' &

and run our script in parallel with these processes, we get:

panic% perl stat_call_sample.pl
2.0853558e-05 seconds

So the average stat( ) system call is five times longer now (about 20 microseconds). Now if you have 50 mod_perl processes that keep the CPU busy all the time, the stat( ) call will be 50 times slower and it'll take 0.2 milliseconds to complete a series of calls. If you have five redundant calls, as in the strace example above, they add up to one millisecond. If you have more processes constantly consuming CPU resources, this time adds up. Now multiply this time by the number of processes that you have and you get a few seconds lost. For some services this loss is insignificant, while for others it could be very significant.

So why does Apache do all these redundant stat( ) calls? The reason is the default installed TransHandler. One solution would be to supply our own, which would be smart enough not to look for this virtual path and would immediately return OK. In cases where you have a virtual host that serves only dynamically generated documents, you can override the default PerlTransHandler with the following one:

PerlModule Apache::Constants
<VirtualHost 10.10.10.10:80>
    ...
    PerlTransHandler Apache::Constants::OK
    ...
</VirtualHost>

The Apache::Constants::OK constant (which is actually a subroutine) is used here as a handler that does nothing but finish the translation phase by returning OK. By skipping the default translation handler, which tries to find a filesystem component that matches the given URI, you save the redundant stat( ) calls!

As you see, it affects only this specific virtual host. Remember that PerlTransHandler cannot appear inside a specific <Location> or similar section, because the request has not yet been associated with a particular file or directory.

As we will show next, Apache's default TransHandler may perform several stat( ) calls when the request is served by a virtual resource that doesn't reside on the filesystem. Things get worse when Apache is configured to look for .htaccess files, adding many redundant open( ) calls.

Let's start with the following simple configuration and try to reduce the number of redundant system calls to a minimum:

DocumentRoot "/home/httpd/docs"
<Directory />
    AllowOverride All
</Directory>
<Location /foo/test>
    SetHandler perl-script
    PerlHandler Apache::Foo
</Location>

The above configuration causes the Perl handler( ) defined in Apache::Foo to be executed when we make a request to /foo/test. Notice that in the test setup there is no real file to be executed and no .htaccess file.

Using the above configuration, the system calls trace may look as follows:

stat("/home/httpd/docs/foo/test", 0xbffff8fc) = -1 ENOENT
  (No such file or directory)
stat("/home/httpd/docs/foo",      0xbffff8fc) = -1 ENOENT
  (No such file or directory)
stat("/home/httpd/docs",
  {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
open("/.htaccess", O_RDONLY)                 = -1 ENOENT
  (No such file or directory)
open("/home/.htaccess", O_RDONLY)            = -1 ENOENT
  (No such file or directory)
open("/home/httpd/.htaccess", O_RDONLY)      = -1 ENOENT
  (No such file or directory)
open("/home/httpd/docs/.htaccess", O_RDONLY) = -1 ENOENT
  (No such file or directory)
stat("/home/httpd/docs/test", 0xbffff774)    = -1 ENOENT
  (No such file or directory)
stat("/home/httpd/docs", 
  {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0

Now we modify the <Directory> entry and add AllowOverride None, which, among other things, tells Apache not to search for .htaccess files:

<Directory />
    AllowOverride None
</Directory>

After restarting the server and issuing a request to /foo/test, we see that the four open( ) calls for .htaccess have gone. The remaining system calls are:

stat("/home/httpd/docs/foo/test", 0xbffff8fc) = -1 ENOENT 
  (No such file or directory)
stat("/home/httpd/docs/foo",      0xbffff8fc) = -1 ENOENT 
  (No such file or directory)
stat("/home/httpd/docs", 
  {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
stat("/home/httpd/docs/test", 0xbffff774)    = -1 ENOENT 
  (No such file or directory)
stat("/home/httpd/docs", 
  {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0

Next, let's try to shortcut the foo location with:

Alias /foo/ /

which makes Apache look for the file in the / directory and not under /home/httpd/docs/foo. Let's restart the server and try again:

stat("/test", 0xbffff8fc) = -1 ENOENT (No such file or directory)

Now we've got only one stat( ) call left!

Let's replace the Alias setting we have just added with:

PerlModule Apache::Constants
PerlTransHandler Apache::Constants::OK

as explained earlier. When we issue the request, we see no remaining stat( ) calls. This technique works if you serve content using only mod_perl handlers, since CGI scripts and other files won't be looked for on the filesystem now. Also, since the default translation handler is now skipped, $r->filename now won't be set.

If you want to serve both mod_perl handlers and real files, you will have to write your own PerlTransHandler to handle requests as desired. For example, the following PerlTransHandler will not look up the file on the filesystem if the URI starts with /foo—the handler will return DECLINED and the default PerlTransHandler will be used:

PerlTransHandler 'sub { return shift->uri( ) =~ m|^/foo| \
                      ? Apache::Constants::OK \
                      : Apache::Constants::DECLINED; }'

Let's see the same configuration using a <Perl> section and a dedicated package (see Example 11-4).

Example 11-4. perl_section.conf
<Perl>
  package Book::Trans;
  use Apache::Constants qw(:common);
  sub handler {
      my $r = shift;
      return OK if $r->uri( ) =~ m|^/foo|;
      return DECLINED;
  }

  package Apache::ReadConfig;  
  $PerlTransHandler = "Book::Trans";
</Perl>

Here we have defined the Book::Trans package and implemented the handler( ) function. Then we have assigned this handler to the PerlTransHandler.

You can move the code in the module into an external file (e.g., Book/Trans.pm) and configure the PerlTransHandler with:

PerlTransHandler Book::Trans

in the normal way (no <Perl> section required).

Now we'll run some benchmarks to test the solutions described above, both individually and in groups. To make the difference in the number of stat( ) calls more prominent, we will use a very light handler that just prints something out.

The module that we have used is shown in Example 11-5.

Example 11-5. Book/News.pm
package Book::News;
use Apache::Constants qw(:common);
sub handler {
    my $r = shift;
    my $uri = $r->uri;
    my @sections = split "/", $uri;
    # in a real handler you'd do some DB lookup and return the story:
    # my $story = get_story(@sections);
    $r->send_http_header('text/plain');
    print "Story matching @sections\n";
    return OK;
}
1;

This is the URI we have used for testing:

/news/perl/mod_perl/summary

Notice that the URI is long enough to generate many stat( ) calls with the default Apache configuration.

This is the main configuration:

<Location /news>
    SetHandler perl-script
    PerlHandler +Book::News
</Location>

Now we try different configurations and see how they influence performance. Each configuration is listed with a tag in parentheses that is used as a key in the table and explanation that follows.

  1. (default) Nothing was added:

    <Directory />
        AllowOverride All
    </Directory>
  2. (noht) Prevent .htaccess lookup:

    <Directory />
        AllowOverride None
    </Directory>
  3. (alias) Location alias shortcutting:

    Alias /news /
  4. (trans) Using a nondefault TransHandler:

    <Perl>
        package Book::Trans;
        use Apache::Constants qw(:common);
        sub handler {
            my $r = shift;
            return OK if $r->uri( ) =~ m|^/news|;
            return DECLINED;
        }
    
        package Apache::ReadConfig;
        $PerlTransHandler = "Book::Trans";
    </Perl>

The results, sorted by the requests per second (rps) rate, are:

Options    | avtime completed failed    rps
-----------|-------------------------------
noht+alias |     27      5000      0    996
noht+trans |     29      5000      0    988
trans      |     29      5000      0    975
alias      |     28      5000      0    974
noht       |     32      5000      0    885
default    |     34      5000      0    827

with static arguments:

Concurrency         : 30
Number of requests  : 5000

The concurrency and connections don't matter here; we are concerned with the relative rather than the absolute numbers.

Figure 11-2 depicts these results.

Figure 11-2. Results of the four solutions
figs/pmp_1102.gif

Preventing .htaccess lookup (noht) improved the performance by about 8% (885 versus 827). Using alias shortcutting (alias) or a nondefault TransHandler (trans) gave even more of a performance boost: since for a long URI like the one in our example, each directory generates a few stat( ) and open( ) system calls, the speedup was around 15% compared to the standard configuration (default). Grouping the prevention of .htaccess lookup (noht) plus one of the techniques that don't look for the nonexistent file in the filesystem (alias or trans) gave a performance boost of about 18% (996 versus 827).

As we have seen, the number of pseudo-subdirectories is in direct proportion to the number of stat( ) and open( ) system calls that are made. To prove this, let's use the standard configuration (default) and benchmark three URIs with a different number of sections (directories), without counting the first section (/news):

Sections  URI
---------------------------------------------
1         /news/perl
3         /news/perl/mod_perl/summary
5         /news/perl/mod_perl/summary/foo/bar

The results are what we expected:

Sections  | avtime completed failed    rps 
---------------------------------------------------
       1  |     33      5000      0    849
       3  |     34      5000      0    829
       5  |     35      5000      0    801
---------------------------------------------------

Each of the two sections add an extra millisecond to the average processing and connection time, which reduces performance by about 25 requests per second.

It's important to read the figures cautiously. Improving performance by 20% simply by adding a few configuration directives is not likely to be achieved in practice. In our test code we used a very light handler, which did nothing but send a few lines of text without doing any processing. When you use real code, whose runtime is not 30-40 milliseconds but 300-400 milliseconds, the improvement of 7 milliseconds on average (as we saw between the standard configuration (default), giving 34 ms, and the combination of noht and alias, giving 27 ms) might be insignificant. The tuning we've discussed here is important mostly for servers that serve millions of requests per day and where every millisecond counts.

But even if your server has a light load, you can still make it a little bit faster. Use a benchmark on the real code and see whether you win something or not.

    [ Team LiB ] Previous Section Next Section