16.2 Content Headers
The following sections describe the HTTP headers that specify the
type and length of the content, and the version of the content being
sent. In this section we often use the term "message" to describe
the HTTP headers together with their associated content; the content
is the actual page, image, file, etc.
16.2.1 Content-Type Header
Most CGI programmers are familiar with
Content-Type. Sections 3.7, 7.2.1, and 14.17 of
the HTTP specification cover the details. mod_perl has a
content_type() method to deal with this header:
$r->content_type("image/png");
According to the standard, Content-Type should be
included in every response that carries a body, and
Apache will generate one if your code doesn't. Its value
will be whatever is specified by the relevant
DefaultType configuration directive, or
text/plain if none is active.
16.2.2 Content-Length Header
According to section 14.13 of the HTTP specification, the
Content-Length header gives the number of octets
(8-bit bytes) in the body of a message. If the length can be
determined prior to sending, it can be very useful to include it. The
most important reason is that KeepAlive requests
(when the same connection is used to fetch more than one object from
the web server) work only with responses that contain a
Content-Length header. In mod_perl we can write:
$r->header_out('Content-Length', $length);
When Apache::File is loaded, the Apache class gains an
additional set_content_length() method, which is slightly more
efficient than the above. In this case we can write:
$r->set_content_length($length);
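Whichever call sets the header, $length must count octets, not characters. In Perl the two differ as soon as a string holds characters outside the single-byte range, so the body should be encoded before it is measured. The following is a minimal sketch in plain Perl (no mod_perl needed), assuming the body is sent as UTF-8; the helper name content_length_for is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string for the wire and return
# both the encoded body and its length in octets.
sub content_length_for {
    my ($text, $charset) = @_;
    $charset ||= 'UTF-8';
    # encode() produces a byte string, so length() now counts 8-bit bytes
    # instead of characters.
    my $octets = encode($charset, $text);
    return ($octets, length $octets);
}

my $text = "caf\x{e9}";    # four characters; the last one is e-acute
my ($body, $length) = content_length_for($text);
print length($text), " characters, $length octets\n";    # 4 characters, 5 octets
```

The encoded $body is what should be printed, and $length is the value to pass to $r->header_out('Content-Length', $length) or $r->set_content_length($length); measuring the unencoded string would understate the length.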
The Content-Length header can have a significant
impact on caches by invalidating cache entries, as the following
extract from the specification explains:
The response to a HEAD request MAY be cacheable in the sense that the information
contained in the response MAY be used to update a previously cached entity from that
resource. If the new field values indicate that the cached entity differs from the
current entity (as would be indicated by a change in Content-Length, Content-MD5,
ETag or Last-Modified), then the cache MUST treat the cache entry as stale.
It is important not to send an erroneous
Content-Length header in a response to either a
GET or a HEAD request.
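One way to avoid a mismatch is to derive the HEAD response's headers from exactly the same body that a GET would produce, and then simply leave the body out. The sketch below is plain Perl with a hypothetical build_response sub that returns a hash of headers and a body; in a mod_perl handler the headers would be set on $r instead, and the body printed only when $r->header_only is false.

```perl
use strict;
use warnings;

# Hypothetical helper: build the headers from the full body first, then
# drop the body for HEAD. Because Content-Length is computed before the
# method is looked at, GET and HEAD always agree on it.
sub build_response {
    my ($method, $body) = @_;    # $body must already be a byte string
    my %headers = (
        'Content-Type'   => 'text/plain',
        'Content-Length' => length $body,
    );
    my $payload = $method eq 'HEAD' ? '' : $body;
    return (\%headers, $payload);
}

my ($get_h,  $get_body)  = build_response('GET',  "Hello, world\n");
my ($head_h, $head_body) = build_response('HEAD', "Hello, world\n");
print "GET:  $get_h->{'Content-Length'} octets, body of ", length $get_body, "\n";
print "HEAD: $head_h->{'Content-Length'} octets, body of ", length $head_body, "\n";
```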
16.2.3 Entity Tags
An entity tag (ETag) is a validator
that can be used instead of, or in addition
to, the Last-Modified header; it is a quoted
string that can be used to identify different versions of a
particular resource. An entity tag can be added to the response
headers like this:
$r->header_out("ETag","\"$VERSION\"");
mod_perl offers the $r->set_etag() method if
we have use()ed Apache::File.
However, we strongly recommend that you don't use
the set_etag() method! set_etag() is meant to be used in
conjunction with a static request for a file on disk that has been
stat()ed in the course of the current request. It is inappropriate
and dangerous to use it for dynamic content.
By sending an entity tag we are promising the recipient that we will
not send the same ETag for the same resource again
unless the content is "equal" to
what we are sending now.
The pros and cons of using entity tags are discussed in section 13.3
of the HTTP specification. For mod_perl programmers, that discussion
can be summed up as follows.
There are strong and weak validators. Strong validators change
whenever a single bit changes in the response; i.e., when anything
changes, even if the meaning is unchanged. Weak validators change
only when the meaning of the response changes. Strong validators are
needed for caches to allow for sub-range requests. Weak validators
allow more efficient caching of equivalent objects. Algorithms such
as MD5 or SHA are good strong validators, but what is usually
required when we want to take advantage of caching is a good weak
validator.
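To make the distinction concrete, the sketch below derives a strong tag from an MD5 digest of the exact bytes sent, so any change at all yields a new tag, and a weak tag from a version string that the author bumps only when the meaning changes. It uses Digest::MD5, which ships with Perl; the sub names and the version string are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Strong validator: changes whenever a single byte of the body changes.
sub strong_etag {
    my ($body) = @_;    # byte string, exactly as it will be sent
    return '"' . md5_hex($body) . '"';
}

# Weak validator: changes only when the author decides the meaning has
# changed, represented here by a hypothetical version string.
sub weak_etag {
    my ($version) = @_;
    return 'W/"' . $version . '"';
}

my $page = "<html><body>Today's news</body></html>\n";
print strong_etag($page), "\n";
print weak_etag('news-v42'), "\n";
```

Either value could then be sent with $r->header_out('ETag', ...), as shown earlier in this section.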
A web client can interrupt the connection before the data transfer
has finished. As a result, the client may have partial documents or
images loaded into its memory. If the page is revisited later, it is
useful to be able to ask the server to return just the missing
portion of the document, instead of retransferring the entire file.
A number of web applications also benefit from being able to ask the
server for a byte range of a document. For example, a PDF viewer
needs to be able to access individual pages by byte range; the table
that defines those ranges is located at the end of the PDF file.
In practice, most of the data on the Web is represented as a byte
stream and can be addressed with a byte range to retrieve a desired
portion of it.
For such an exchange to happen, the server needs to let the client
know that it can support byte ranges, which it does by sending the
Accept-Ranges header:
Accept-Ranges: bytes
The server will send this header only for documents for which it will
be able to satisfy byte-range requests: for example, PDF documents,
or images that are only partially cached and can be partially
reloaded if the user interrupts the page load.
The client requests a byte range using the Range
header:
Range: bytes=0-500,5000-
Because of the architecture of the byte-range request and response,
the client is not limited to attempting to use byte ranges only when
this header is present. If a server does not support the
Range header, it will simply ignore it and send
the entire document as a response.
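A server that does honor Range has to turn the header value into concrete byte offsets for the document it is about to send. The following plain-Perl sketch, built around a hypothetical parse_byte_ranges sub, follows the byte-range rules of section 14.35.1 of the HTTP/1.1 specification as we read them: first-last is inclusive, first- runs to the end of the document, and -N means the final N bytes. A malformed header is ignored (undef is returned), while an empty list means nothing requested is satisfiable, which a server would answer with 416. Escaped or unusual whitespace in the header is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret a Range header value for a document of
# $size octets. Returns undef if the header must be ignored (syntax error
# or a unit other than bytes), otherwise a reference to a list of
# [first, last] pairs with inclusive offsets.
sub parse_byte_ranges {
    my ($header, $size) = @_;
    return undef unless defined $header && $header =~ /^bytes=(.+)$/;
    my $set = $1;
    my @ranges;
    for my $spec (split /\s*,\s*/, $set) {
        if ($spec =~ /^(\d+)-(\d*)$/) {                      # "first-last" or "first-"
            my ($first, $last) = ($1, $2);
            return undef if $last ne '' && $last < $first;   # invalid spec
            next if $first >= $size;                         # starts past the end
            $last = $size - 1 if $last eq '' || $last >= $size;
            push @ranges, [$first, $last];
        }
        elsif ($spec =~ /^-(\d+)$/) {                        # "-N": the final N bytes
            my $n = $1;
            next if $n == 0 || $size == 0;
            push @ranges, [$n >= $size ? 0 : $size - $n, $size - 1];
        }
        else {
            return undef;                                    # malformed: ignore header
        }
    }
    return \@ranges;
}

# The Range example from the text, applied to a 6000-octet document.
my $size   = 6000;
my $ranges = parse_byte_ranges('bytes=0-500,5000-', $size);
if (!defined $ranges) {
    print "Ignore Range and send the whole document\n";
}
elsif (!@$ranges) {
    print "416 Requested Range Not Satisfiable\n";
}
else {
    printf "Content-Range: bytes %d-%d/%d\n", @$_, $size for @$ranges;
}
```

For a single range the Content-Range line goes into a 206 response's headers; with several ranges, as here, the response is a multipart/byteranges body in which each part carries its own Content-Range.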
A Last-Modified time, when used as a validator in
a request, can be strong or weak, depending on a couple of rules
described in section 13.3.3 of the HTTP standard. This is mostly
relevant for range requests, as this quote from section 14.27
explains:
If the client has no entity tag for an entity, but does have a Last-Modified date, it
MAY use that date in an If-Range header.
The use of Last-Modified as a validator is not limited to range
requests, however. As section 13.3.1 states, the value of the
Last-Modified header can also be used as a cache validator.
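One of the rules in section 13.3.3 lets a client or an intermediate cache treat a Last-Modified time as strong only if its cache entry carries a Date value and the Last-Modified time is at least 60 seconds earlier than that Date; otherwise the time is implicitly weak. A minimal sketch of just that rule, with times given as epoch seconds and a hypothetical sub name (the separate rule for origin servers is not modeled):

```perl
use strict;
use warnings;

# Hypothetical helper for one rule of section 13.3.3: a client or cache
# may treat Last-Modified as strong only when its cache entry has a Date
# and Last-Modified is at least 60 seconds before that Date.
sub last_modified_is_strong {
    my ($date, $last_modified) = @_;    # epoch seconds; $date may be undef
    return 0 unless defined $date;      # no Date: implicitly weak
    return $date - $last_modified >= 60 ? 1 : 0;
}

my $date = time;
print last_modified_is_strong($date, $date - 3600) ? "strong\n" : "weak\n";
print last_modified_is_strong($date, $date - 5)    ? "strong\n" : "weak\n";
```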
The fact that a Last-Modified date may be used as
a strong validator can be pretty disturbing if we are in fact
changing our output slightly without changing its semantics. To
prevent this kind of misunderstanding between us and the cache
servers in the response chain, we can send a weak validator in an
ETag header. This is possible because the
specification states:
If a client wishes to perform a sub-range retrieval on a value for which it has only
a Last-Modified time and no opaque validator, it MAY do this only if the Last-Modified
time is strong in the sense described here.
In other words, by sending an ETag that is marked
as weak, we prevent the cache server from using the
Last-Modified header as a strong validator.
An ETag value is marked as a weak validator by
prepending the string W/ to the quoted string;
otherwise, it is strong. In Perl this would mean something like this:
$r->header_out('ETag',"W/\"$VERSION\"");
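Section 13.3.3 also defines how two entity tags are compared. Under the strong comparison function, both tags must be strong and identical; under the weak comparison function, the W/ prefix is ignored and only the quoted strings must match. Sub-range retrievals depend on the strong function. A minimal Perl sketch of both, with hypothetical sub names and a simplified parser that does not handle escaped quotes:

```perl
use strict;
use warnings;

# Split an entity tag into (is_weak, quoted_tag). Returns an empty list
# when the value is not a quoted tag with an optional W/ prefix.
sub parse_etag {
    my ($tag) = @_;
    return () unless defined $tag && $tag =~ m{^(W/)?("[^"]*")$};
    return (defined $1 ? 1 : 0, $2);
}

# Strong comparison: both tags must be strong and identical.
sub etag_strong_eq {
    my ($weak1, $opaque1) = parse_etag($_[0]) or return 0;
    my ($weak2, $opaque2) = parse_etag($_[1]) or return 0;
    return !$weak1 && !$weak2 && $opaque1 eq $opaque2 ? 1 : 0;
}

# Weak comparison: the W/ prefix is ignored; the quoted strings must match.
sub etag_weak_eq {
    my (undef, $opaque1) = parse_etag($_[0]) or return 0;
    my (undef, $opaque2) = parse_etag($_[1]) or return 0;
    return $opaque1 eq $opaque2 ? 1 : 0;
}

for my $pair (['"v1"', '"v1"'], ['W/"v1"', '"v1"'], ['W/"v1"', 'W/"v2"']) {
    printf "%-8s %-8s strong=%d weak=%d\n",
        @$pair, etag_strong_eq(@$pair), etag_weak_eq(@$pair);
}
```

The second pair matches only under weak comparison, which is exactly why a weak ETag stops caches from using the response for sub-range retrievals.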
Consider carefully which string is chosen to act as a validator. We
are on our own with this decision:
... only the service author knows the semantics of a resource well enough to select
an appropriate cache validation mechanism, and the specification of any validator
comparison function more complex than byte-equality would open up a can of worms.
Thus, comparisons of any other headers (except Last-Modified, for compatibility with
HTTP/1.0) are never used for purposes of validating a cache entry.
If we are composing a message from multiple components, it may be
necessary to combine some kind of version information for all these
components into a single string.
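One way to do this, sketched below under the assumption that each component can report an ASCII version string (the component names and versions are hypothetical), is to sort the components by name, join the name=version pairs, and hash the result into a single tag. A change to any component then changes the tag, and sorting keeps the tag stable regardless of the order in which the components are assembled. We mark the tag weak here because component versions usually track meaning rather than exact bytes.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: fold the versions of all the components that make
# up a page into one weak entity tag. Sorting by name makes the result
# independent of the order in which components are passed.
sub combined_etag {
    my (%versions) = @_;    # component name => version string
    my $summary = join ';', map { "$_=$versions{$_}" } sort keys %versions;
    return 'W/"' . md5_hex($summary) . '"';
}

my $tag = combined_etag(
    header  => '1.4',
    article => '2031',
    sidebar => '7',
);
print "$tag\n";
```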
If we are producing relatively large documents, or content that does
not change frequently, then a strong entity tag will probably be
preferred, since this will give caches a chance to transfer the
document in chunks.