16.4 HTTP Requests
Section 13.11 of the specification states that the only two cacheable
methods are GET and HEAD.
Responses to POST requests are not cacheable, as
you'll see in a moment.
16.4.1 GET Requests
Most mod_perl programs are written to service GET
requests. The server passes the request to the mod_perl code, which
composes and sends back the headers and the content body.
But there is one situation that needs a workaround to achieve
better cacheability: a "?" in the relative path part of the
requested URI. Section 13.9 specifies that:
... caches MUST NOT treat responses to such URIs as fresh unless the server provides
an explicit expiration time. This specifically means that responses from HTTP/1.0
servers for such URIs SHOULD NOT be taken from a cache.
It is tempting to assume that we are safe if we use HTTP/1.1 and
send an explicit expiration time, but unfortunately the reality is
somewhat different. For a long time it has been common to
misconfigure cache servers so that they treat all
GET requests containing a question mark as
uncacheable. People even used to mark anything that contained the
string "cgi-bin" as uncacheable.
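To make the misconfiguration concrete, here is a hypothetical Perl model of such a naive cache rule. The function name and the exact rule are illustrative assumptions, not the configuration of any real cache. Note that the rule looks only at the URL's characters and ignores any expiration headers the server sends.

```perl
use strict;
use warnings;

# Hypothetical model of a misconfigured cache rule: refuse to cache any
# URL containing "?" or "cgi-bin", whatever expiration headers are sent.
sub naive_is_cacheable {
    my ($url) = @_;
    return 0 if index($url, '?') >= 0;        # any query string
    return 0 if index($url, 'cgi-bin') >= 0;  # anything under cgi-bin
    return 1;
}
```

Under this rule, http://example.com/query?BGCOLOR=blue is never cached, while the same query written with semicolons instead of a question mark passes, which is what the workaround below relies on.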
To work around this misconfiguration, we have
stopped calling CGI directories cgi-bin, and we
have written the following handler, which lets us work with CGI-like
query strings without rewriting the software (e.g.,
Apache::Request and CGI.pm)
that deals with them:
use strict;
use Apache::Constants qw(DECLINED);

sub handler {
    my $r   = shift;
    my $uri = $r->uri;

    if ( my ($path, $query) = $uri =~ / ^ ([^?]+?) ; ([^?]*) $ /x ) {
        # the first ";" plays the role of "?"; the rest is the query
        $r->uri($path);
        $r->args($query);
    }
    elsif ( $uri =~ m/^(.*?)%3[Bb](.*)$/ ) {
        # protect against old proxies that escape volens nolens
        # (see HTTP standard section 5.1.2)
        my ($prefix, $rest) = ($1, $2);
        $r->uri($prefix);
        $rest =~ s/%3[Bb]/;/g;   # ;
        $rest =~ s/%26/;/g;      # &
        $rest =~ s/%3[Dd]/=/g;   # =
        $r->args($rest);
    }
    return DECLINED;
}
This handler must be installed as a
PerlPostReadRequestHandler.
The handler takes any request that contains one or more semicolons
but no question mark and changes it so that the
first semicolon is interpreted as a question mark and everything
after that as the query string. So now we can replace the request:
http://example.com/query?BGCOLOR=blue;FGCOLOR=red
with:
http://example.com/query;BGCOLOR=blue;FGCOLOR=red
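Because the handler only runs inside Apache, here is a hypothetical plain-Perl function, split_semicolon_uri(), that applies the same two regular expressions to an ordinary string so the rewriting can be tried from the command line. It is a sketch for experimentation, not part of mod_perl; it returns the new path and query string, or an empty list when the URI would be left alone.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the handler's logic on a plain string.
# Returns (path, query), or an empty list if there is no semicolon query.
sub split_semicolon_uri {
    my ($uri) = @_;

    # literal ";" and no "?": split at the first semicolon
    if ( my ($path, $query) = $uri =~ / ^ ([^?]+?) ; ([^?]*) $ /x ) {
        return ($path, $query);
    }

    # semicolon escaped as %3B by an old proxy: undo the escaping
    if ( $uri =~ m/^(.*?)%3[Bb](.*)$/ ) {
        my ($path, $query) = ($1, $2);
        $query =~ s/%3[Bb]/;/g;   # ;
        $query =~ s/%26/;/g;      # &
        $query =~ s/%3[Dd]/=/g;   # =
        return ($path, $query);
    }

    return;   # nothing to rewrite
}

my ($path, $query) = split_semicolon_uri('/query;BGCOLOR=blue;FGCOLOR=red');
# $path is '/query', $query is 'BGCOLOR=blue;FGCOLOR=red'
```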
This allows the coexistence of queries from ordinary forms that are
being processed by a browser alongside predefined requests for the
same resource. It has one minor limitation: because the query now
lives in the path part of the URI, and Apache rejects percent-escaped
slashes (%2F) in the path, such a query string cannot contain escaped
slashes. So instead of:
http://example.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=%2Ffont%2Fpath
we must use:
http://example.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=/font/path
To unescape the escaped characters, use the following code:
s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
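As one way of putting that substitution to work, here is a hypothetical helper, parse_semicolon_query(), that splits a semicolon-separated query string into name/value pairs and unescapes each part with the same regular expression. It is a sketch under simple assumptions: each name appears only once, and "+" is not translated to a space. In practice Apache::Request or CGI.pm would do this parsing for you.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "a=1;b=2" into a hash reference,
# unescaping %XX sequences in both names and values.
sub parse_semicolon_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /;/, $query) {
        next unless length $pair;              # skip empty segments
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;     # "flag" with no "="
        for ($name, $value) {                  # $_ aliases each variable
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        }
        $param{$name} = $value;                # later duplicates win
    }
    return \%param;
}

my $p = parse_semicolon_query('BGCOLOR=blue;FGCOLOR=red;NAME=John%20Doe');
# $p->{NAME} is 'John Doe'
```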
16.4.2 Conditional GET Requests
A rather challenging request that may be received is the conditional
GET, which typically means a request with an
If-Modified-Since header. The HTTP specification
has this to say:
The semantics of the GET method change to a "conditional GET" if the request message
includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or
If-Range header field. A conditional GET method requests that the entity be transferred
only under the circumstances described by the conditional header field(s). The
conditional GET method is intended to reduce unnecessary network usage by allowing
cached entities to be refreshed without requiring multiple requests or transferring
data already held by the client.
So how can we reduce unnecessary network usage in such a case?
mod_perl makes it easy by providing access to
Apache's meets_conditions()
function (which lives in Apache::File). The
Last-Modified (and possibly
ETag) headers must be set up before calling this
method. If it returns anything other than
OK, the handler should simply return that value,
and Apache handles the rest for us. For example:
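use Apache::Constants qw(OK);
use Apache::File ();   # adds meets_conditions() to $r
# set Last-Modified (and ETag) first, e.g., $r->set_last_modified($mtime)
if ((my $result = $r->meets_conditions) != OK) {
    return $result;    # e.g., 304 Not Modified
}
#else ... go and send the response body ...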
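To show what the If-Modified-Since part of this check boils down to, here is a hypothetical, simplified model in plain Perl. It is not how meets_conditions() is implemented and it ignores the other conditional headers (If-None-Match, If-Match, If-Unmodified-Since, If-Range); times are epoch seconds, and parsing the HTTP date is left out. The rule that a date later than the server's current time is invalid and ignored comes from the If-Modified-Since definition in the HTTP/1.1 specification.

```perl
use strict;
use warnings;

# Hypothetical, simplified If-Modified-Since rule.
# $last_modified, $ims and $now are epoch seconds; $ims is undef
# when the request carries no If-Modified-Since header.
sub ims_status {
    my ($last_modified, $ims, $now) = @_;
    return 200 unless defined $ims;     # not a conditional GET
    return 200 if $ims > $now;          # future date: invalid, ignore it
    return $last_modified <= $ims
        ? 304                           # unchanged: no body needed
        : 200;                          # modified since: send the entity
}
```

A 304 answer lets the client reuse its cached copy, which is exactly the network saving the specification is after.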
If we have a Squid accelerator running, it will often handle the
conditionals for us, and we can enjoy its extremely fast responses
to such requests. To see how often this happens, read Squid's
access.log file: just grep for
TCP_IMS_HIT/304. However, there are circumstances
under which Squid may not be allowed to use its cache. That is why
the origin server (which is the server we are programming) needs to
handle conditional GETs as well, even if a Squid
accelerator is running.
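The same grep can be done from Perl, for instance as part of a monitoring script. Here is a minimal sketch with a hypothetical count_ims_hits() that reads from any open filehandle; the location of access.log depends on your Squid installation, so the path in the usage line is only a placeholder.

```perl
use strict;
use warnings;

# Hypothetical helper: count access.log lines where Squid answered a
# conditional request from its cache (TCP_IMS_HIT/304).
sub count_ims_hits {
    my ($fh) = @_;
    my $hits = 0;
    while (my $line = <$fh>) {
        $hits++ if index($line, 'TCP_IMS_HIT/304') >= 0;
    }
    return $hits;
}

# usage (placeholder path):
#   open my $log, '<', '/path/to/squid/access.log' or die "open: $!";
#   print count_ims_hits($log), "\n";
```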
16.4.3 HEAD Requests
Among the headers described thus far, the date-related ones
(Date, Last-Modified, and
Expires/Cache-Control) are
usually easy to produce and thus should be computed for
HEAD requests just the same as for
GET requests.
The Content-Type and
Content-Length headers should be exactly the same
as would be supplied to the corresponding GET
request. But since computing them may be expensive, they can
be omitted: nothing in the specification
requires them to be sent.
What is important is that the response to a HEAD
request must not contain a message-body. The
code in a mod_perl handler might look like this:
use Apache::Constants qw(OK);

# compute the headers that are easy to compute
# currently equivalent to $r->method eq "HEAD"
if ( $r->header_only ) {
    $r->send_http_header;    # headers only, no message body
    return OK;
}
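To illustrate the rule that a HEAD response carries the same headers as the matching GET response but never a body, here is a hypothetical plain-Perl build_response() that does not use the Apache API at all. In a real handler, $r->header_only and $r->send_http_header do this job; the point of the sketch is that one code path produces the headers for both methods and only the body is dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (\%headers, $body) for GET or HEAD.
# Headers are computed identically; HEAD just gets an empty body.
sub build_response {
    my ($method, $content) = @_;
    my %headers = (
        'Content-Type'   => 'text/plain',
        'Content-Length' => length($content),   # same value for GET and HEAD
    );
    my $body = $method eq 'HEAD' ? '' : $content;
    return (\%headers, $body);
}
```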
If a Squid accelerator is being used, it will be able to handle the
whole HEAD request by itself, but under some
circumstances it may not be allowed to do so.
16.4.4 POST Requests
The response to a POST request is
effectively not cacheable, due to an
underspecification in the HTTP standards. Section 13.4 does not
forbid caching of responses to POST requests, but
no other part of the HTTP standard explains how such caching
could be implemented, so we are in a
vacuum. No existing caching servers implement caching of responses to
POST requests (although some browsers with more
aggressive caching implement their own).
However, this may change if someone does the groundwork of defining
the semantics for cache operations on POST
requests.
If a Squid accelerator is being used, be aware
that it accelerates outgoing traffic but does not bundle incoming
traffic. Squid is of no benefit at all for POST
requests, which could be a problem if the site receives a lot of long
POST requests. Using GET
instead of POST means that responses can be cached,
so the possibility of using GETs should always be
considered. However, unlike POSTs, GETs are subject to
size limits (the whole query must fit into the URL) and visibility
issues (the parameters show up in the URL, and therefore in browser
histories and server logs), so they may not be suitable in every case.
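When weighing GET against POST, it can help to build the GET URL and check its length before committing. Here is a minimal sketch with hypothetical helpers that encode form parameters in the semicolon format from section 16.4.1. Slashes are left unescaped because of the %2F limitation described there, and the 2,000-character default is an assumed practical limit, not a number from the HTTP specification (which sets none, although servers, proxies, and browsers do). Strings are assumed to be bytes (already encoded).

```perl
use strict;
use warnings;

# Hypothetical helper: percent-escape everything except unreserved
# characters and "/" (escaped slashes are rejected in the path).
sub escape_keep_slash {
    my ($s) = @_;
    $s =~ s{([^A-Za-z0-9\-_.~/])}{sprintf '%%%02X', ord $1}ge;
    return $s;
}

# Hypothetical helper: build base;name=value;... or return undef when
# the URL would exceed $max_len, meaning POST is the safer choice.
sub params_to_get_url {
    my ($base, $params, $max_len) = @_;
    $max_len = 2000 unless defined $max_len;   # assumed limit
    my @pairs = map { escape_keep_slash($_) . '=' . escape_keep_slash($params->{$_}) }
                sort keys %$params;            # sorted for a stable URL
    my $url = join ';', $base, @pairs;
    return length($url) <= $max_len ? $url : undef;
}

my $url = params_to_get_url('http://example.com/query',
    { BGCOLOR => 'blue', FGCOLOR => 'red', FONT => '/font/path' });
# $url is 'http://example.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=/font/path'
```

Sorting the parameter names means the same form data always yields the same URL, which matters because a cache keys its entries on the exact URL.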