To better illustrate how the web server and servlet container work together to service clients, this section discusses the protocol for an HTTP request and response, from the time a client request is received until the server returns a response. Struts makes heavy use of the request and response objects, and a complete understanding of the round-trip process will help clarify some topics discussed later in the book.
HTTP is based on a request/response model, so there are two types of HTTP messages: the request and the response. The browser opens a connection to a server and makes a request. The server processes the client's request and returns a response. Figure 2-3 illustrates this process.
Both types of messages consist of a start line, zero or more header fields, and an empty line that indicates the end of the message headers. Both message types also may contain an optional message body.
The format and makeup of the request and response messages are very similar, but there are a few differences. We'll discuss each type of message separately.
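The shared layout described above (a start line, zero or more header fields, an empty line, and an optional body) can be sketched in Java. This is only an illustration under stated assumptions: the class name HttpMessageParts and its split method are hypothetical and not part of any servlet or Struts API, and the sketch assumes lines end with a carriage return and line feed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a raw HTTP message into the parts named in the text. */
class HttpMessageParts {
    final String startLine;
    final List<String> headerLines;
    final String body; // empty string when the message has no body

    private HttpMessageParts(String startLine, List<String> headerLines, String body) {
        this.startLine = startLine;
        this.headerLines = headerLines;
        this.body = body;
    }

    static HttpMessageParts split(String raw) {
        // The first empty line (CRLF followed by CRLF) ends the headers.
        int blank = raw.indexOf("\r\n\r\n");
        String head = blank >= 0 ? raw.substring(0, blank) : raw;
        String body = blank >= 0 ? raw.substring(blank + 4) : "";

        // The first line is the start line; every remaining line is a header field.
        String[] lines = head.split("\r\n");
        List<String> headers = new ArrayList<String>();
        for (int i = 1; i < lines.length; i++) {
            headers.add(lines[i]);
        }
        return new HttpMessageParts(lines[0], headers, body);
    }
}
```

The same split works for a request and a response, because only the content of the start line differs between the two message types.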
The start line of an HTTP request is known as the request line. It's always the first line of the request message, and it contains three separate fields:
· An HTTP method
· A uniform resource identifier (URI)
· An HTTP protocol version
Although there are several HTTP methods for retrieving data from a server, the two used most often are GET and POST. The GET method requests the resource identified by the request URI. If the URI points to a data-producing resource such as a servlet, the data it produces is returned within the response message. Although a GET request can pass information in the query string, the POST method explicitly passes data to the server in the message body, to be processed by the resource that the request URI identifies.
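To make the difference concrete, the following sketch builds the raw text of a GET request and of a POST request that carry the same name/value pair. It is a minimal illustration, not how a browser is implemented: the class GetVersusPost and its methods are hypothetical, and the sketch assumes the common application/x-www-form-urlencoded encoding for the data.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper contrasting where GET and POST carry their data. */
class GetVersusPost {
    /** Encodes one name/value pair in application/x-www-form-urlencoded form. */
    static String encodePair(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** GET: the data travels in the query string of the request URI. */
    static String buildGet(String path, String name, String value) {
        return "GET " + path + "?" + encodePair(name, value) + " HTTP/1.0\r\n\r\n";
    }

    /** POST: the same data travels in the message body, after the empty line. */
    static String buildPost(String path, String name, String value) {
        String body = encodePair(name, value);
        int length = body.getBytes(StandardCharsets.US_ASCII).length;
        return "POST " + path + " HTTP/1.0\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"
             + body;
    }
}
```

For a made-up path such as /login, buildGet places the encoded pair after a question mark in the request line, while buildPost leaves the request line clean and moves the pair into the body, described by the Content-Type and Content-Length headers.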
The URI identifies the resource that should process the request. For the purposes of this discussion, it can be either an absolute or a relative path. If the server cannot resolve the URI to a resource, it returns an error status code (typically 404).
The HTTP request protocol version identifies to the server which version of the HTTP specification the request conforms to. The following example illustrates the request line for a sample GET request:
GET /index.html HTTP/1.0
You can execute this example by opening up a Telnet session to a server running a web server. You must specify the hostname and port number of the web server. For example:
telnet localhost 80
Then type the GET command. You will need to press Enter twice after issuing the command: once for the end of the request line and again to let the server know you are finished with the request. Assuming there's a file called index.html in the root directory, the HTML response will be returned. (Actually, you will always see a response—it just may not be the one that you expected.) We'll talk more about using Telnet to interact with a web server when we discuss redirects and forwards later in this chapter.
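The request line and the two presses of the Enter key can also be sketched in Java. The RequestLine class below is a hypothetical helper, not part of the servlet API. It assumes the three fields are separated by single spaces, and that each press of Enter sends a carriage return and line feed, which is typical of Telnet clients but not guaranteed.

```java
/** Hypothetical value class for the three fields of an HTTP request line. */
class RequestLine {
    final String method;
    final String uri;
    final String version;

    RequestLine(String method, String uri, String version) {
        this.method = method;
        this.uri = uri;
        this.version = version;
    }

    /** Splits a line such as "GET /index.html HTTP/1.0" into its three fields. */
    static RequestLine parse(String line) {
        String[] fields = line.trim().split(" ");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected method, URI, and version: " + line);
        }
        return new RequestLine(fields[0], fields[1], fields[2]);
    }

    /**
     * Returns the request line followed by an empty line. The first line ending
     * closes the request line; the second tells the server the request is complete.
     */
    String toRawRequest() {
        return method + " " + uri + " " + version + "\r\n" + "\r\n";
    }
}
```

Calling RequestLine.parse("GET /index.html HTTP/1.0").toRawRequest() produces the same characters you would type into the Telnet session.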
As mentioned earlier, the HTTP request may contain zero or more header fields. Request header fields allow the client to pass to the server additional information about the request and the client itself. The format of a header field, for both the request and the response, is the name of the header field, followed by a colon (:) and the value. If multiple values are specified for a single header field, they must be separated with commas. Table 2-1 lists some of the more commonly used request headers.
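The header format just described (a name, a colon, and one or more comma-separated values) can be sketched as follows. The HeaderField class is a hypothetical helper written for this illustration. Note that its naive comma split is an assumption that does not hold for every header: a date value, for example, legitimately contains a comma.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one header field split into its name and values. */
class HeaderField {
    final String name;
    final List<String> values;

    HeaderField(String name, List<String> values) {
        this.name = name;
        this.values = values;
    }

    static HeaderField parse(String line) {
        // Only the first colon separates name from value; "Host: localhost:80"
        // keeps the port as part of the value.
        int colon = line.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Not a header field: " + line);
        }
        String name = line.substring(0, colon).trim();

        // Multiple values for one field are separated by commas.
        List<String> values = new ArrayList<String>();
        for (String part : line.substring(colon + 1).split(",")) {
            String value = part.trim();
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return new HeaderField(name, values);
    }
}
```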
The message body for a request is used to carry to the server data associated with the request. The data included within the body is different from the values used by the header fields in terms of both format and content. The header fields can be thought of as metadata about the message body.
Once the server has received and processed the request, it must return an HTTP response message to the client. The response message consists of a status line and zero or more header fields, followed by an empty line. It also may have an optional message body.
The first line of the HTTP response message is known as the status line. It consists of the HTTP protocol version that the response conforms to, followed by a numeric status code and its textual explanation. Each field is separated from the next by a space. An example response status line is shown here:
HTTP/1.1 200 OK
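A status line like this one can be taken apart with a few lines of Java. The StatusLine class is a hypothetical helper for illustration only. It assumes the fields are separated by single spaces, as described above, and it treats everything after the status code as the textual explanation.

```java
/** Hypothetical value class for the three fields of an HTTP status line. */
class StatusLine {
    final String version;
    final int code;
    final String reason;

    StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    static StatusLine parse(String line) {
        // A limit of 3 keeps multiword explanations such as "Not Found" in one field.
        String[] fields = line.trim().split(" ", 3);
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected version and status code: " + line);
        }
        int code = Integer.parseInt(fields[1]);
        String reason = fields.length == 3 ? fields[2] : "";
        return new StatusLine(fields[0], code, reason);
    }
}
```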
The status code is a three-digit numeric value that corresponds to the result code of the server's attempt to satisfy the request. The status code is for programmatic applications, while the text that accompanies it is intended for human readers. The first digit of the status code defines the category of the result code. Table 2-2 lists the allowed first digits and the corresponding categories.
Quite a few status codes have been defined. They also are extensible, which allows applications to extend the behavior of the server. If a client application doesn't recognize a status code returned by the server, it can determine the general meaning of the response by using the first digit of the returned status code. Table 2-3 lists some of the most common response status codes.
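The fallback described above, in which a client classifies an unfamiliar status code by its first digit, might look like the following sketch. The StatusCategory class is hypothetical, and the category names are the ones used by the HTTP specification, so their wording may differ slightly from the table.

```java
/** Hypothetical helper: maps a status code to its general category. */
class StatusCategory {
    static String of(int statusCode) {
        // Integer division by 100 isolates the first digit of a three-digit code.
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }
}
```

A client that has never seen code 499, for example, can still conclude from StatusCategory.of(499) that the problem lies with the request rather than with the server.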
The header fields in the response are similar in format to those found in the request message. They allow the server to pass to the client additional information that cannot be placed in the status line. These fields give information about the server and about further access to the URI contained within the request. After the last response header, which is followed by an empty line, the server can insert the response message body. In many cases, the response message body is HTML output. Figure 2-4 illustrates an example response to the following request:
GET /hello.html HTTP/1.0
You've probably noticed that the request and response message text shown in the previous examples all have been standard readable text. This is fine when you don't need to protect the data; however, you would never want to send confidential data in the clear. When you need to ensure the integrity and privacy of information that is sent over a network, especially an open one like the Internet, one of the options is to use the HTTPS protocol, rather than standard HTTP.
HTTPS is normal HTTP carried over the Secure Sockets Layer (SSL). SSL is a communication protocol that ensures privacy when communicating with other SSL-enabled applications. It runs on top of TCP/IP and beneath HTTP. It uses digital certificates to authenticate the parties and to agree on a key, and symmetric encryption to protect the data that follows. An SSL connection can be established between a client and server only when both systems support SSL and the client is able to authenticate the server; authentication of the client by the server is optional.
The fact that SSL encrypts the transmitted data has no impact on the underlying request and response messages. The encryption, and the subsequent decryption on the other side, occur after the message is constructed and are decoupled from the HTTP portion of the message.
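One way to see this decoupling is that the code that builds the HTTP message does not need to know whether it will travel in the clear or over SSL. The sketch below is an illustration under stated assumptions: the class TransportNeutralRequest and its methods are hypothetical, while Socket and SSLSocketFactory are the standard Java classes for plain and SSL-wrapped connections.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical helper showing that the HTTP message is the same with or without SSL. */
class TransportNeutralRequest {
    /** Builds the request bytes; nothing here depends on the transport. */
    static byte[] requestBytes(String uri) {
        String request = "GET " + uri + " HTTP/1.0\r\n\r\n";
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    /** Writes the request to any stream, whether plain or encrypted. */
    static void writeGet(OutputStream out, String uri) throws IOException {
        out.write(requestBytes(uri));
        out.flush();
    }

    /** Opens a plain socket or an SSL-wrapped one; the caller supplies host and port. */
    static Socket open(String host, int port, boolean secure) throws IOException {
        if (secure) {
            // Encryption happens inside the socket, below the HTTP message.
            return SSLSocketFactory.getDefault().createSocket(host, port);
        }
        return new Socket(host, port);
    }
}
```

A caller would pass the output stream of either kind of socket to writeGet. The bytes handed to the stream are identical in both cases; only the socket underneath decides whether they are encrypted on the wire.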