2.2 The HTTP Request/Response Phase

To better illustrate how the web server and servlet container work together to service clients, this section discusses the protocol for an HTTP request and response, from the time a client request is received until the server returns a response. Struts makes heavy use of the request and response objects, and a complete understanding of the round-trip process will help clarify some topics discussed later in the book.

Although the browser is not the only type of client that can be used with Struts, it certainly is the most common. More and more developers are starting to use Struts for wireless applications and even some interaction with web services, but the web browser remains the predominant client.

HTTP is based on a request/response model, so there are two types of HTTP messages: the request and the response. The browser opens a connection to a server and makes a request. The server processes the client's request and returns a response. Figure 2-3 illustrates this process.

Figure 2-3. The HTTP request/response model

Both types of messages consist of a start line, zero or more header fields, and an empty line that marks the end of the message headers. Either type of message may also contain a message body.
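To make that layout concrete, here is a minimal Java sketch that assembles a message from its parts. The class name `HttpMessageFormatter` is hypothetical and is not part of Struts or the Servlet API. The sketch ends each line with a carriage return and line feed (CRLF), which is the line terminator the HTTP specification requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles an HTTP message from its parts.
final class HttpMessageFormatter {
    private static final String CRLF = "\r\n";

    static String format(String startLine, Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append(startLine).append(CRLF);          // request line or status line
        for (Map.Entry<String, String> h : headers.entrySet()) {
            // Each header field: name, colon, value
            sb.append(h.getKey()).append(": ").append(h.getValue()).append(CRLF);
        }
        sb.append(CRLF);                            // empty line ends the headers
        if (body != null) {
            sb.append(body);                        // optional message body
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "www.somehost.com");
        System.out.print(format("GET /index.html HTTP/1.0", headers, null));
    }
}
```

The same routine serves for responses: pass a status line instead of a request line.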

The format and makeup of the request and response messages are very similar, but there are a few differences. We'll discuss each type of message separately.

2.2.1 The HTTP Request

The start line of an HTTP request is known as the request line. It's always the first line of the request message, and it contains three separate fields:

· An HTTP method

· A Uniform Resource Identifier (URI)

· An HTTP protocol version
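A request line can be split into these three fields on the single spaces that separate them. The following sketch uses a hypothetical `RequestLine` class and rejects anything that does not have exactly three fields. In practice the servlet container does this parsing for you and exposes the results through `HttpServletRequest` methods such as `getMethod()` and `getRequestURI()`, plus `getProtocol()`, inherited from `ServletRequest`.

```java
// Hypothetical value class holding the three fields of a request line.
final class RequestLine {
    final String method;
    final String uri;
    final String version;

    private RequestLine(String method, String uri, String version) {
        this.method = method;
        this.uri = uri;
        this.version = version;
    }

    // Splits "METHOD URI VERSION" on single spaces; anything else is rejected.
    static RequestLine parse(String line) {
        String[] parts = line.split(" ", -1);
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty()
                || !parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /index.html HTTP/1.0");
        System.out.println(r.method + " | " + r.uri + " | " + r.version);
    }
}
```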

Although HTTP defines several methods, the two used most often are GET and POST. The GET method asks the server for the resource identified by the request URI. If the URI points to a data-producing resource such as a servlet, the data it produces is returned within the response message. A GET request can also pass information to the server in the query string of the URI, but the POST method is designed for explicitly sending data to the server: the data travels in the body of the request message and is processed by the resource that the request URI identifies.
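The difference is easiest to see by building one request of each kind. This sketch uses the `java.net.http.HttpRequest` builder (Java 11 and later) only to construct the requests, without sending them. The servlet URL `http://localhost:8080/app/login`, the `user` parameter, and the `GetVersusPost` class are all hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Builds (but does not send) a GET and a POST carrying the same data.
final class GetVersusPost {
    // Encodes one name/value pair as application/x-www-form-urlencoded text.
    static String formData(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // GET: the data is appended to the request URI as a query string.
    static HttpRequest get(String base, String data) {
        return HttpRequest.newBuilder(URI.create(base + "?" + data)).GET().build();
    }

    // POST: the URI has no query string; the data goes in the message body.
    static HttpRequest post(String base, String data) {
        return HttpRequest.newBuilder(URI.create(base))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(data))
                .build();
    }

    public static void main(String[] args) {
        String data = formData("user", "jane doe");
        System.out.println(get("http://localhost:8080/app/login", data).uri());
        System.out.println(post("http://localhost:8080/app/login", data).uri());
    }
}
```

The GET request carries `user=jane+doe` in its URI, while the POST request's URI stays clean and the same data is supplied as the body, described by the `Content-Type` header.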

The URI identifies the resource that should process the request. For the purposes of this discussion, it can be either an absolute URI or a relative path. If the URI does not match any resource on the server, the server returns an error status code, typically 404 (Not Found).

The HTTP request protocol version identifies to the server which version of the HTTP specification the request conforms to. The following example illustrates the request line for a sample GET request:

GET /index.html HTTP/1.0

You can try this example by opening a Telnet session to a host that is running a web server. You must specify the hostname and the port number on which the web server is listening. For example:

telnet localhost 80

Then type the GET request line. You will need to press Enter twice after typing it: once to end the request line and again to send the empty line that tells the server the request is complete. Assuming there's a file called index.html in the web server's document root, the server returns its HTML in the response. (In fact, you will always get a response, although it may not be the one you expected.) We'll talk more about using Telnet to interact with a web server when we discuss redirects and forwards later in this chapter.
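The same exchange can be scripted in Java with a plain socket, which shows that the client sends nothing more than the request line followed by an empty line. This is a rough sketch: the class name `TelnetGet` is hypothetical, and `main` assumes a web server listening on port 80 of localhost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Rough Java equivalent of the Telnet session described above.
final class TelnetGet {
    // The request line plus the empty line that ends the request.
    static String buildRequest(String path) {
        return "GET " + path + " HTTP/1.0\r\n\r\n";
    }

    static void fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            // An HTTP/1.0 server closes the connection after responding,
            // so reading to end-of-stream captures the whole response.
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        fetch("localhost", 80, "/index.html");
    }
}
```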

As mentioned earlier, the HTTP request may contain zero or more header fields. Request header fields allow the client to pass to the server additional information about the request and the client itself. The format of a header field, for both the request and the response, is the name of the header field, followed by a colon (:) and the value. If multiple values are specified for a single header field, they must be separated with commas. Table 2-1 lists some of the more commonly used request headers.

Table 2-1. Common HTTP request header fields
Name	Purpose
`Accept`	Indicates the media types that are acceptable for the response. If no `Accept` header field is present, the server can safely assume that the client accepts all media types. An example of an `Accept` header value is "image/gif, image/jpeg".
`Accept-Charset`	Indicates what character sets are acceptable for the response. If the `Accept-Charset` header is not present in the request, the server can assume that any character set is acceptable. The ISO-8859-1 character set can be assumed to be acceptable by all user agents.
`Accept-Encoding`	Similar to the `Accept` header field, but restricts the content encodings (such as compression schemes) that are acceptable to the client in the response. An example of an `Accept-Encoding` header value is "compress, gzip".
`Accept-Language`	Indicates which languages the client would prefer the response to be in. An example of an `Accept-Language` header value is "en-us, de-li, es-us".
`Content-Encoding`	Indicates what encoding mechanism has been applied to the body of the message and, therefore, what decoding mechanism must be used to get the information. An example of a `Content-Encoding` header value is "gzip".
`Content-Type`	Indicates the media type of the body sent to the recipient. An example of a `Content-Type` header value is "text/html; charset=ISO-8859-1".
`Host`	Indicates the hostname and port number of the resource being requested, as obtained from the original URL. An example of a `Host` header value is "www.somehost.com".
`Referer`	Allows the client to specify the address (URI) of the resource from which the request URI was obtained. This header is used mainly for maintenance and tracking purposes.
`User-Agent`	Contains information about the client that originated the request. This header is used mainly for statistical purposes and tracing of protocol violations. An example of a `User-Agent` header value is "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)".
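Because every header field follows the same name, colon, value layout, a single routine can parse any of the fields in Table 2-1. The hypothetical `HeaderField` class below is a sketch that assumes the field fits on one line and that commas separate its values. That assumption holds for list-valued fields such as `Accept`, but not for every field: a quoted string or a date may itself contain a comma.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a single "Name: value1, value2" header line.
final class HeaderField {
    final String name;
    final List<String> values;

    private HeaderField(String name, List<String> values) {
        this.name = name;
        this.values = values;
    }

    static HeaderField parse(String line) {
        // The first colon separates the name from the value.
        int colon = line.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Not a header field: " + line);
        }
        String name = line.substring(0, colon).trim();
        List<String> values = new ArrayList<>();
        // Naive comma split: only safe for list-valued fields such as Accept.
        for (String v : line.substring(colon + 1).split(",")) {
            String trimmed = v.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return new HeaderField(name, values);
    }
}
```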

The message body of a request carries data associated with the request to the server, such as the form fields submitted with a POST. The data in the body differs from the header field values in both format and content. The header fields can be thought of as metadata about the message body.
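The empty line is what lets a receiver tell the header fields apart from the body, and a header such as `Content-Length` tells it how much of what follows is body. The sketch below illustrates that relationship. The `RawRequest` class is hypothetical, and for simplicity it assumes an ASCII body so that characters and bytes coincide (`Content-Length` counts bytes).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical parser that separates a raw request into its parts.
// Assumes an ASCII body, so one character equals one byte.
final class RawRequest {
    final String requestLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    RawRequest(String raw) {
        // The first empty line (CRLF CRLF) ends the header section.
        int end = raw.indexOf("\r\n\r\n");
        if (end < 0) {
            throw new IllegalArgumentException("Missing empty line after headers");
        }
        String[] lines = raw.substring(0, end).split("\r\n");
        requestLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Bad header field: " + lines[i]);
            }
            // Header names are case-insensitive, so store them in lower case.
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, lines[i].substring(colon + 1).trim());
        }
        // Content-Length (metadata) says how much of the remainder is body;
        // without it, this sketch treats the request as having no body.
        String rest = raw.substring(end + 4);
        String length = headers.get("content-length");
        int n = length == null ? 0 : Integer.parseInt(length);
        body = rest.substring(0, Math.min(n, rest.length()));
    }

    public static void main(String[] args) {
        RawRequest r = new RawRequest("POST /app/login HTTP/1.0\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: 13\r\n\r\n"
                + "user=jane+doe");
        System.out.println(r.requestLine + " carries body: " + r.body);
    }
}
```

The path `/app/login` and the form data are illustrative only.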

2.2.2 The HTTP Response

Once the server has received and processed the request, it must return an HTTP response message to the client. The response message consists of a status line and zero or more header fields, followed by an empty line. It may also contain a message body.

The first line of the HTTP response message is known as the status line. It consists of the HTTP protocol version that the response conforms to, followed by a numeric status code and its textual explanation. Each field is separated from the next by a space. An example response status line is shown here:

HTTP/1.1 200 OK
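Because the textual explanation (the reason phrase) can contain spaces, as in `404 Not Found`, a parser should split the status line into at most three fields. Here is a sketch using a hypothetical `StatusLine` class:

```java
// Hypothetical parser for a status line such as "HTTP/1.1 200 OK".
final class StatusLine {
    final String version;
    final int code;
    final String reason;

    private StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    static StatusLine parse(String line) {
        // Limit of 3 keeps a multiword reason phrase ("Not Found") intact.
        String[] parts = line.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")
                || !parts[1].matches("\\d{3}")) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], Integer.parseInt(parts[1]), reason);
    }
}
```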

The status code is a three-digit number that reports the result of the server's attempt to satisfy the request. The status code is intended for use by programs, while the text that accompanies it (the reason phrase) is intended for human readers. The first digit of the status code defines the category of the result. Table 2-2 lists the allowed first digits and the corresponding categories.

Table 2-2. Status code categories
Code range	Category	Meaning
100-199	Informational	The request was received and is being processed.
200-299	Success	The action was successfully received, understood, and accepted.
300-399	Redirection	Further action must be taken to complete the request.
400-499	Client Error	The request contains bad syntax or cannot be fulfilled.
500-599	Server Error	The server failed to fulfill an apparently valid request.
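Since the category depends only on the first digit, a client can classify any status code, including one it has never seen, with a single integer division. The enum below is a hypothetical sketch that mirrors Table 2-2:

```java
// Hypothetical enum mirroring Table 2-2; declaration order matters.
enum StatusCategory {
    INFORMATIONAL, SUCCESS, REDIRECTION, CLIENT_ERROR, SERVER_ERROR;

    static StatusCategory of(int code) {
        if (code < 100 || code > 599) {
            throw new IllegalArgumentException("Not a valid status code: " + code);
        }
        // First digit 1 maps to index 0, 2 to index 1, ..., 5 to index 4.
        return values()[code / 100 - 1];
    }
}
```

For example, an unfamiliar code such as 299 is still recognized as a success.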

Many status codes have been defined, and the set is extensible, so applications can define new codes to extend the behavior of the server. If a client application doesn't recognize a status code returned by the server, it can still determine the general meaning of the response from the first digit of the code. Table 2-3 lists some of the most common response status codes.

Table 2-3. Common HTTP response status codes
Code	Reason phrase	Meaning
200	OK	The request succeeded.
302	Moved Temporarily	The requested resource resides temporarily at a different URI, which the server gives in the `Location` header field of the response. This code typically is used when the client is being redirected. (HTTP/1.1 names this reason phrase Found.)
400	Bad Request	The server couldn't understand the request due to malformed syntax.
401	Unauthorized	The request requires user authentication.
403	Forbidden	The server understood the request but is refusing to fulfill it. The server may or may not reveal why it is refusing the request.
404	Not Found	The server has not found anything matching the request URI.
500	Internal Server Error	The server encountered an unexpected condition that prevented it from fulfilling the request.
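The 302 entry can be observed directly. This sketch, which assumes Java 11 or later, starts a throwaway local server with the JDK's built-in `com.sun.net.httpserver.HttpServer` and has it answer requests for the hypothetical path `/old` with a 302 status and a `Location` header pointing at the equally hypothetical `/new`. It then requests `/old` with an `HttpClient` configured not to follow redirects, so the 302 response itself can be inspected.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Demonstrates a 302 response and its Location header on a local server.
final class RedirectDemo {
    static HttpResponse<Void> requestOld() throws IOException, InterruptedException {
        // Port 0 asks the operating system for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/old", exchange -> {
            exchange.getResponseHeaders().add("Location", "/new");
            exchange.sendResponseHeaders(302, -1);   // -1 means no response body
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .followRedirects(HttpClient.Redirect.NEVER) // keep the 302 visible
                    .build();
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://127.0.0.1:" + port + "/old")).build();
            return client.send(request, HttpResponse.BodyHandlers.discarding());
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<Void> response = requestOld();
        System.out.println(response.statusCode() + " -> "
                + response.headers().firstValue("Location").orElse("(none)"));
    }
}
```

A browser, by contrast, would read the `Location` value and issue a new GET request for it automatically.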

The header fields in the response are similar in format to those found in the request message. They allow the server to pass to the client additional information that cannot be placed in the status line. These fields give information about the server and about further access to the URI contained within the request. After the last response header, which is followed by an empty line, the server can insert the response message body. In many cases, the response message body is HTML output. Figure 2-4 illustrates an example response to the following request:

GET /hello.html HTTP/1.0

Figure 2-4. An example HTTP response message

2.2.3 HTTP Versus HTTPS

You've probably noticed that the request and response messages shown in the previous examples are all plain, readable text. This is fine when you don't need to protect the data; however, you would never want to send confidential data in the clear. When you need to ensure the integrity and privacy of information sent over a network, especially an open one like the Internet, one option is to use the HTTPS protocol rather than standard HTTP.

HTTPS is normal HTTP layered over the Secure Sockets Layer (SSL). SSL is a protocol that ensures privacy when communicating with other SSL-enabled applications. It runs on top of TCP/IP, beneath HTTP. SSL uses digital certificates and public-key cryptography to authenticate the server and to agree on a session key, and then encrypts the data itself with faster symmetric encryption using that key. An SSL connection can be established only when both the client and the server support SSL. The server must authenticate itself with a certificate, and it can optionally require the client to present a certificate as well.

The fact that SSL encrypts the transmitted data has no impact on the format of the underlying request and response messages. The complete HTTP message is constructed first and then encrypted, and it is decrypted on the other side before the HTTP layer processes it, so the encryption is decoupled from the HTTP message itself.
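The following sketch illustrates that decoupling: the request text is built identically for both schemes, and only the socket changes. It is a minimal illustration, not production code. The class name `SchemeAwareGet` is hypothetical, and although the JDK class is named `SSLSocketFactory`, current JDKs negotiate SSL's successor, TLS, by default. A raw `SSLSocket` does not check that the server's certificate matches the hostname unless asked to, so the sketch enables that check explicitly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import javax.net.SocketFactory;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// The HTTP text is the same for http and https; only the socket differs.
final class SchemeAwareGet {
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n";
    }

    static SocketFactory socketFactoryFor(String scheme) {
        return "https".equalsIgnoreCase(scheme)
                ? SSLSocketFactory.getDefault()   // encrypting socket
                : SocketFactory.getDefault();     // plain TCP socket
    }

    static int defaultPort(String scheme) {
        return "https".equalsIgnoreCase(scheme) ? 443 : 80;
    }

    // Sends the request and returns the first line of the response (the status line).
    static String fetchStatusLine(String scheme, String host, String path) throws IOException {
        try (Socket socket = socketFactoryFor(scheme).createSocket(host, defaultPort(scheme))) {
            if (socket instanceof SSLSocket) {
                // Verify that the server certificate matches the hostname.
                SSLSocket ssl = (SSLSocket) socket;
                SSLParameters params = ssl.getSSLParameters();
                params.setEndpointIdentificationAlgorithm("HTTPS");
                ssl.setSSLParameters(params);
            }
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchStatusLine("https", "www.somehost.com", "/"));
    }
}
```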