
1.4 Apache 1.3 Request Processing Phases

To understand mod_perl, you should understand how request processing works within Apache. When Apache receives a request, it processes it in 11 phases. For every phase, a standard default handler is supplied by Apache. You can also write your own Perl handlers for each phase; they will override or extend the default behavior. The 11 phases (illustrated in Figure 1-4) are:

Figure 1-4. Apache 1.3 request processing phases
Post-read-request

This phase occurs when the server has read all the incoming request's data and parsed the HTTP header. Usually, this stage is used to perform something that should be done once per request, as early as possible. Module authors usually use this phase to initialize per-request data to be used in subsequent phases.

URI translation

In this phase, the requested URI is translated to the name of a physical file or the name of a virtual document that will be created on the fly. Apache performs the translation based on configuration directives such as ScriptAlias. This translation can be completely modified by modules such as mod_rewrite, which register themselves with Apache to be invoked in this phase of the request processing.

Header parsing

During this phase, you can examine and modify the request headers and take a special action if needed—e.g., blocking unwanted agents as early as possible.

Access control

This phase allows the server owner to restrict access to specific resources based on various rules, such as the client's IP address or the day of the week.

Authentication

Sometimes you want to make sure that users really are who they claim to be. To verify a user's identity, challenge the user with a question that only that user can answer. Generally, the challenge is a request for a login name and password, but it can be any other challenge that allows you to distinguish between users.

Authorization

The service might have various restricted areas, and you might want to allow only certain users to access each of them. Once a user has passed the authentication process, it is easy to check whether that user may access a specific location.

MIME type checking

Apache handles requests for different types of files in different ways. For static HTML files, the content is simply sent directly to the client from the filesystem. For CGI scripts, the processing is done by mod_cgi, while for mod_perl programs, the processing is done by mod_perl and the appropriate Perl handler. During this phase, Apache decides which method to use, basing its choice on various things such as configuration directives, the filename's extension, or an analysis of the file's content. When the choice has been made, Apache selects the appropriate content handler, which will be used in the response phase.

Fixup

This phase is provided to allow last-minute adjustments to the environment and the request record before the actual work in the content handler starts.

Response

This is the phase where most of the work happens. First, the handler that generates the response (a content handler) sends a set of HTTP headers to the client. These headers include the Content-type header, which is either picked by the MIME-type-checking phase or provided dynamically by a program. Then the actual content is generated and sent to the client. The content generation might entail reading a simple file (in the case of static files) or performing a complex database query and HTML-ifying the results (in the case of the dynamic content that mod_perl handlers provide).

This is where mod_cgi, Apache::Registry, and other content handlers run.

Logging

By default, a single line describing every request is logged into a flat file. Using the configuration directives, you can specify which bits of information should be logged and where. This phase lets you hook custom logging handlers—for example, logging into a relational database or sending log information to a dedicated master machine that collects the logs from many different hosts.

Cleanup

At the end of each request, the modules that participated in one or more previous phases are allowed to perform various cleanups, such as releasing resources that were locked but never freed (e.g., because the user pressed the Stop button and the request was aborted), deleting temporary files, and so on.

Each module registers its cleanup code, either in its source code or as a separate configuration entry.
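As an illustration of registering cleanup code from a module's own source, here is a minimal sketch of a mod_perl 1.0 content handler. The module name My::Scratch and the scratch file path are hypothetical; register_cleanup() is the request-object method mod_perl 1.0 provides for this purpose. The configuration-entry alternative is shown only as a comment.

```perl
package My::Scratch;    # hypothetical module name
use strict;
use warnings;
use Apache::Constants qw(OK SERVER_ERROR);

sub handler {
    my $r = shift;      # the Apache request object

    # Hypothetical per-request scratch file, named after the process ID.
    my $tmp = "/tmp/my-scratch-$$.tmp";

    # Register the cleanup first, so the file is removed in the cleanup
    # phase even if the request is aborted part way through.
    $r->register_cleanup(sub {
        unlink $tmp if -e $tmp;
        return OK;
    });

    open(my $fh, '>', $tmp) or do {
        $r->log_error("cannot create $tmp: $!");
        return SERVER_ERROR;
    };
    print {$fh} "intermediate data\n";
    close $fh;

    $r->content_type('text/plain');
    $r->send_http_header;
    $r->print("Scratch file written; it is deleted in the cleanup phase.\n");
    return OK;
}

1;

# The configuration-entry alternative (httpd.conf), assuming a
# hypothetical My::Cleanup module whose handler() does the work:
#
#   PerlCleanupHandler My::Cleanup
```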

At almost every phase, if there is an error and the request is aborted, Apache returns an error code to the client using the default error handler (or a custom one, if provided).
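For reference, the 11 phases above can be paired with the mod_perl 1.0 configuration directives that install a Perl handler for each of them. The following is a small plain-Perl sketch (it runs outside Apache) that simply records and prints that mapping; the helper name directive_for() is made up for illustration.

```perl
use strict;
use warnings;

# The request phases in the order Apache 1.3 runs them, each paired with
# the mod_perl 1.0 directive that installs a Perl handler for that phase.
my @PHASES = (
    [ 'Post-read-request'  => 'PerlPostReadRequestHandler' ],
    [ 'URI translation'    => 'PerlTransHandler'           ],
    [ 'Header parsing'     => 'PerlHeaderParserHandler'    ],
    [ 'Access control'     => 'PerlAccessHandler'          ],
    [ 'Authentication'     => 'PerlAuthenHandler'          ],
    [ 'Authorization'      => 'PerlAuthzHandler'           ],
    [ 'MIME type checking' => 'PerlTypeHandler'            ],
    [ 'Fixup'              => 'PerlFixupHandler'           ],
    [ 'Response'           => 'PerlHandler'                ],
    [ 'Logging'            => 'PerlLogHandler'             ],
    [ 'Cleanup'            => 'PerlCleanupHandler'         ],
);

# Hypothetical helper: look up the directive for a phase name.
# Returns undef (in scalar context) for an unknown phase.
sub directive_for {
    my ($phase) = @_;
    for my $pair (@PHASES) {
        return $pair->[1] if $pair->[0] eq $phase;
    }
    return;
}

# Print a numbered table of phases and directives.
printf "%2d. %-18s %s\n", $_ + 1, @{ $PHASES[$_] } for 0 .. $#PHASES;
```

In httpd.conf, each directive takes the name of the Perl module (or subroutine) that should handle the phase, for example PerlLogHandler followed by a module name.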

1.4.1 Apache 1.3 Modules and the mod_perl 1.0 API

The advantage of breaking up the request process into phases is that Apache gives a programmer the opportunity to "hook" into the process at any of those phases. Apache has been designed with modularity in mind. A small set of core functions handles the basic tasks of dealing with the HTTP protocol and managing child processes. Everything else is handled by modules. The core supplies an easy way to plug modules into Apache at build time or runtime and enable them at runtime.

Modules for the most common tasks, such as serving directory indexes or logging requests, are supplied and compiled in by default. mod_cgi is one such module. Other modules are bundled with the Apache distribution but are not compiled in by default: this is the case with more specialized modules such as mod_rewrite or mod_proxy. There are also a vast number of third-party modules, such as mod_perl, that can handle a wide variety of tasks. Many of these can be found in the Apache Module Registry (http://modules.apache.org/).

Modules take control of request processing at each of the phases through a set of well-defined hooks provided by Apache. The subroutine or function in charge of a particular request phase is called a handler. Some modules supply authentication handlers (mod_auth_dbi, for example), while others supply content handlers (mod_cgi, for example). Some modules, such as mod_rewrite, install handlers for more than one request phase.

Apache also provides modules with a comprehensive set of functions they can call to achieve common tasks, including file I/O, sending HTTP headers, or parsing URIs. These functions are collectively known as the Apache Application Programming Interface (API).

Apache is written in C and currently requires that modules be written in the same language. However, as we will see, mod_perl provides the full Apache API in Perl, so modules can be written in Perl as well, although mod_perl must be installed for them to run.

1.4.2 mod_perl 1.0 and the mod_perl API

Like other Apache modules, mod_perl is written in C, registers handlers for request phases, and uses the Apache API. However, mod_perl doesn't directly process requests. Rather, it allows you to write handlers in Perl. When the Apache core yields control to mod_perl through one of its registered handlers, mod_perl dispatches processing to one of the registered Perl handlers.

Since Perl handlers need to perform the same basic tasks as their C counterparts, mod_perl exposes the Apache API through a mod_perl API, which is a set of Perl functions and objects. When a Perl handler calls such a function or method, mod_perl translates it into the appropriate Apache C function.
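To make this concrete, here is a minimal sketch of a Perl content handler written against the mod_perl 1.0 API. The module name My::Hello and the /hello location are hypothetical; the methods called on the request object ($r) are part of the mod_perl API, and each one is carried out by mod_perl through the corresponding Apache C API.

```perl
package My::Hello;      # hypothetical module name
use strict;
use warnings;
use Apache::Constants qw(OK);

# mod_perl calls handler() with the request object as its first argument.
sub handler {
    my $r = shift;

    # Set the Content-type and send the HTTP response headers.
    $r->content_type('text/plain');
    $r->send_http_header;

    # Send the response body.
    $r->print("You asked for ", $r->uri, "\n");

    # Tell Apache this phase completed successfully.
    return OK;
}

1;

# Assumed httpd.conf entry that ties the handler to a URI:
#
#   <Location /hello>
#       SetHandler perl-script
#       PerlHandler My::Hello
#   </Location>
```

The return value is how a handler reports its outcome to Apache; OK is one of the status constants exported by Apache::Constants.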

Perl handlers extract the last drop of performance from the Apache server. Unlike mod_cgi and Apache::Registry, they are not restricted to the content generation phase and can be tied to any phase in the request loop. You can create your own custom authentication by writing a PerlAuthenHandler, or you can write specialized logging code in a PerlLogHandler.
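As a sketch of what a custom PerlAuthenHandler might look like under mod_perl 1.0, the following handler checks Basic authentication credentials against a table. The module name My::Authen, the /private location, and the hardcoded credential table are all assumptions made for illustration; a real site would consult a database or another credential store.

```perl
package My::Authen;     # hypothetical module name
use strict;
use warnings;
use Apache::Constants qw(OK AUTH_REQUIRED);

# Hypothetical credential table, for illustration only.
my %PASSWORD_FOR = ( alice => 'secret' );

sub handler {
    my $r = shift;

    # Returns a status code and the password the client sent.
    # A status other than OK means Apache should ask for credentials.
    my ($status, $sent_pw) = $r->get_basic_auth_pw;
    return $status unless $status == OK;

    my $user = $r->connection->user;

    unless (defined $user
            && defined $sent_pw
            && exists $PASSWORD_FOR{$user}
            && $PASSWORD_FOR{$user} eq $sent_pw) {
        # Prompt the client again and record why the request was refused.
        $r->note_basic_auth_failure;
        $r->log_reason("authentication failed", $r->uri);
        return AUTH_REQUIRED;
    }

    return OK;
}

1;

# Assumed httpd.conf entry:
#
#   <Location /private>
#       AuthName "Private Area"
#       AuthType Basic
#       PerlAuthenHandler My::Authen
#       require valid-user
#   </Location>
```

Because this handler runs only in the authentication phase, the content for /private can still be served by any content handler, including Apache's default one.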

Handlers are not compatible with the CGI specification. Instead, they use the mod_perl API directly for every aspect of request processing.

mod_perl provides access to the Apache API for Perl handlers via an extensive collection of methods and variables exported by the Apache core. This includes methods for dealing with the request (such as retrieving headers or posted content), methods for setting up the response (such as sending HTTP headers), methods for accessing configuration information derived from the server's configuration file, and a slew of other methods providing access to most of Apache's rich feature set.
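The following sketch shows two of these method families side by side in a mod_perl 1.0 access-control handler: header_in() reads a request header, and dir_config() reads a value set in the server configuration with PerlSetVar. The module name My::BlockAgent and the variable name BlockedAgent are hypothetical.

```perl
package My::BlockAgent;     # hypothetical module name
use strict;
use warnings;
use Apache::Constants qw(OK FORBIDDEN);

sub handler {
    my $r = shift;

    # Configuration access: the value of "PerlSetVar BlockedAgent ...".
    my $blocked = $r->dir_config('BlockedAgent');
    return OK unless defined $blocked && length $blocked;

    # Request access: read an incoming HTTP header.
    my $agent = $r->header_in('User-Agent') || '';

    # Case-insensitive substring match against the configured name.
    if (index(lc $agent, lc $blocked) >= 0) {
        $r->log_error("blocked agent '$agent' requesting " . $r->uri);
        return FORBIDDEN;
    }

    return OK;
}

1;

# Assumed httpd.conf entry:
#
#   <Location />
#       PerlSetVar BlockedAgent BadBot
#       PerlAccessHandler My::BlockAgent
#   </Location>
```

Keeping the blocked name in the configuration rather than in the code means it can be changed per location without touching the module.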

Using the mod_perl API is not limited to mod_perl handlers. Apache::Registry scripts can also call API methods, at the price of forgoing CGI compatibility.
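For example, a script run under Apache::Registry can fetch the current request object and call API methods on it. This is a minimal sketch; the script name and the paths in the commented configuration are hypothetical. Because it relies on the Apache class supplied by mod_perl, the same script would fail if run as a plain CGI program under mod_cgi.

```perl
# hello.pl: a hypothetical script run under Apache::Registry
use strict;
use warnings;

# Apache->request returns the request object for the current request.
my $r = Apache->request;

# From here on, the script uses the mod_perl API instead of printing
# CGI-style headers to STDOUT.
$r->content_type('text/plain');
$r->send_http_header;
$r->print("Remote host: ", $r->get_remote_host, "\n");

# Assumed httpd.conf entry (paths are illustrative):
#
#   Alias /perl/ /home/httpd/perl/
#   <Location /perl>
#       SetHandler perl-script
#       PerlHandler Apache::Registry
#       Options +ExecCGI
#   </Location>
```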

We suggest that you refer to the book Writing Apache Modules with Perl and C, by Lincoln Stein and Doug MacEachern (O'Reilly), if you want to learn more about API methods.
