Chapter 14. Servlets and Web Applications

Now we're going to take a leap from the client side to the server side to learn how to write Java applications for web servers. The Java Servlet API is a framework for writing servlets, application components for web services, just as applets are application components for a web browser. The Servlet API provides a simple yet powerful architecture for web-based applications. The Servlet API lives in the javax.servlet package, a standard Java API extension, so technically it isn't part of the core Java APIs. In this book, we haven't talked about many standard extension packages, but this one is particularly important. This chapter covers the Java Servlet API 2.3.

Most web servers support the Servlet API either directly or indirectly through add-on modules. Servers that support the full set of Java Enterprise APIs (including servlets, JSPs, and Enterprise JavaBeans) are called application servers. JBoss is a free, open source Java application server available from http://www.jboss.org, and BEA's WebLogic is a popular commercial application server. Components that handle just the servlets are more precisely called servlet containers or servlet runners.

We try to avoid talking about details of particular servlet environments, but we will use the Apache Project's Tomcat server for the examples in this book. Tomcat is a popular, free servlet engine that can be used by itself or in conjunction with popular web servers. It is easy to configure and is a pure Java application, so you can use it on any platform that has a Java VM. You can download it from http://jakarta.apache.org/tomcat/. Tomcat has been adopted by Sun as part of the J2EE reference implementation, so it always has an up-to-date implementation of the specifications available in both source and binary form. The Servlet APIs and Java documentation can be downloaded directly from http://java.sun.com/products/servlet/. You might consider taking a look at the Java servlet specification white paper, also available at that location. It is unusually readable for a reference document.

14.1 Servlets: Powerful Tools

Many different ways of writing server-side software for web applications have evolved over the years. Early on, the standard was CGI, usually in combination with a scripting language such as Perl. Various web servers also offered native-language APIs for pluggable software modules. Java, however—and in particular the Java Servlet API—is rapidly becoming the most popular architecture for building web-based applications. Java servlet containers (engines) are available for virtually every web server.

So, why has Java become so popular on the server side? Servlets let you write web applications in Java and derive all the benefits of Java and the virtual machine environment (along with the same limitations, of course). Java is generally faster than scripting languages, especially in a server-side environment where long-running applications can be highly optimized by the virtual machine. Servlets have an additional speed advantage over traditional CGI programs, because servlets execute in a multithreaded way within one instance of a virtual machine. Older CGI applications required the server to start a separate process, pipe data to it, and receive the response as a stream. The unique runtime safety of Java also beats most native APIs in a production web-server environment, where it would be very bad to have an errant transaction bring down the server.

So, speed and safety are factors, but perhaps the most important reason for using Java is that it makes writing large and complex applications much more manageable. Java servlets may not be as easy to write as scripts, but they are easier to update with new features, and servlets are far better at scaling for complex, high-volume applications. From servlet code, you can access all the standard Java APIs within the virtual machine while your servlets are handling requests. This means that your Java servlet code can work well in a multitiered architecture, accessing "live" database connections with JDBC or communicating with other network services that have already been established. This kind of behavior has been hacked into CGI environments, but for Java, it is both robust and natural.

Before we move on, we should also mention servlets' relationships to two other technologies: JavaServer Pages (JSPs) and XML/XSL. JSPs are another way to write server-side applications. They consist primarily of HTML content with Java-like syntax embedded within the documents. JSPs are compiled dynamically by the web server into Java servlets and can work with Java APIs directly and indirectly to generate dynamic content for the pages. XML is a powerful set of standards for working with structured information in text form. The Extensible Stylesheet Language (XSL) is a language for transforming XML documents into other kinds of documents, including HTML. The combination of servlets that can generate XML content and XSL stylesheets that can transform content for presentation is a very exciting direction, covered in detail in Chapter 23.

14.2 Web Applications

So far we've used the term "web application" generically. Now we need to be more precise. In the context of the Java Servlet API, a web application is a collection of servlets, supporting Java classes, and content such as HTML or JSP pages and images. For deployment (installation into a web server), a web application is bundled into a Web Archive (WAR) file. We'll discuss WAR files in detail later, but suffice it to say that they are essentially JAR archives containing the application files along with some deployment information. The important thing is that the standardization of WAR files means not only that the Java code is portable, but also that the process of deploying all the application's parts is standardized.

At the heart of the WAR archive is the web.xml file. This file describes which servlets and JSPs are to be run, their names and URL paths, their initialization parameters and a host of other information, including security and authentication requirements.

Web applications, or WebApps, also have a very well-defined runtime environment. Each WebApp has its own "root" path on the web server, meaning that all the URLs addressing its servlets and files start with a common unique prefix (e.g., www.oreilly.com/someapplication/). The WebApp's servlets are also isolated from those of other web applications. WebApps cannot directly access each other's files (although they may be allowed to do so through the web server, of course). Each WebApp also has its own servlet context. We'll discuss the servlet context in more detail, but in brief, it is a common area for servlets to share information and get resources from the environment. The high degree of isolation between web applications is intended to support the dynamic deployment and updating of applications required by modern business systems.

14.3 The Servlet Life Cycle

Let's jump ahead now to the Servlet API itself so that we can get started building servlets right away. We'll fill in the gaps later when we discuss various parts of the APIs and WAR file structure in more detail. The Servlet API is very simple, almost exactly paralleling the Applet API. There are three life-cycle methods—init(), service(), and destroy()—along with some methods for getting configuration parameters and servlet resources. Before a servlet is used the first time, it's initialized by the server through its init() method. Thereafter the servlet spends its time handling service() requests and doing its job until (presumably) the server is shut down, and the servlet's destroy() method is called, giving it an opportunity to clean up.
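To make the order of the life-cycle calls concrete, here is a toy model in plain Java. It does not use the Servlet API: the MiniServlet interface and the LifeCycleDemo "container" are hypothetical names invented for this sketch, and a real container does far more work around each call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a servlet: the three life-cycle methods only.
interface MiniServlet {
    void init();
    String service(String request);
    void destroy();
}

// Toy "container" that drives one servlet instance through its life cycle.
class LifeCycleDemo {

    // Runs the life cycle for the given requests and returns a log of
    // the calls in the order they happened.
    static List<String> run(String... requests) {
        final List<String> log = new ArrayList<>();
        MiniServlet servlet = new MiniServlet() {
            public void init() { log.add("init"); }
            public String service(String request) {
                log.add("service:" + request);
                return "handled " + request;
            }
            public void destroy() { log.add("destroy"); }
        };

        servlet.init();                   // once, before the first request
        for (String request : requests)
            servlet.service(request);     // once per request
        servlet.destroy();                // once, when the "server" shuts down
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run("/a", "/b"));
    }
}
```

The point of the sketch is only the shape of the contract: one init(), any number of service() calls, one destroy().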

Generally only one instance of each deployed servlet class is instantiated per server. To be more precise, it is one instance per entry in the web.xml file, but we'll talk more about servlet deployment later. And there is an exception to that rule when using the special SingleThreadModel, described below.

The service() method of a servlet accepts two parameters: a servlet "request" object and a servlet "response" object. These provide tools for reading the client request and generating output; we'll talk about them in detail in the examples.

By default, servlets are expected to handle multithreaded requests; that is, the servlet's service methods may be invoked by many threads at the same time. This means that you cannot store client-related data in instance variables of your servlet object. (Of course, you can store general data related to the servlet's operation, as long as it does not change on a per-request basis.) Per-client state information can be stored in a client session object (such as a cookie), which persists across client requests. We'll talk about that later as well.
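The rule about instance variables is easier to see in code. The following plain-Java sketch (no Servlet API; SharedHandlerDemo and its methods are hypothetical names) shares one handler object among several threads, much as a container shares one servlet instance. Per-request data stays in parameters and local variables, while the only instance variable holds general bookkeeping that is updated atomically.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler shared by many threads, as a servlet instance would be.
class SharedHandlerDemo {

    // Shared state that is safe to keep: general bookkeeping, updated atomically.
    private final AtomicInteger requestCount = new AtomicInteger();

    // Per-request data lives only in the parameter and a local variable,
    // so concurrent calls cannot overwrite one another's values.
    String handle(String clientName) {
        requestCount.incrementAndGet();
        String greeting = "Hello, " + clientName;
        return greeting;
    }

    int getRequestCount() {
        return requestCount.get();
    }

    // Calls handle() from several threads at once and returns the number of
    // responses that did not match their own request.
    static int countMismatches(final SharedHandlerDemo handler, int threads) {
        final AtomicInteger mismatches = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final String name = "client" + i;
            workers[i] = new Thread(() -> {
                if (!handler.handle(name).equals("Hello, " + name))
                    mismatches.incrementAndGet();
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        SharedHandlerDemo handler = new SharedHandlerDemo();
        System.out.println("mismatches: " + countMismatches(handler, 8));
        System.out.println("requests:   " + handler.getRequestCount());
    }
}
```

Had the greeting been stored in an instance field instead of a local variable, two threads could have interleaved and returned each other's text.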

If for some reason you have developed a servlet that cannot support multithreaded access, you can indicate this to the servlet container by implementing the flag interface SingleThreadModel. This interface has no methods, serving only to indicate that the servlet should be invoked in a single-threaded manner. When a servlet implements SingleThreadModel, the container may create more than one instance of it per VM in order to pool requests.

14.4 Web Servlets

There are actually two packages of interest in the Servlet API. The first is the javax.servlet package, which contains the most general Servlet APIs. The second important package is javax.servlet.http, which contains APIs specific to servlets that handle HTTP requests for web servers. In the rest of this section, we are going to discuss servlets pretty much as if all servlets were HTTP-related. You can write servlets for other protocols, but that's not what we're currently interested in.

The primary tool provided by the javax.servlet.http package is the HttpServlet base class. This is an abstract servlet that provides some basic implementation related to handling an HTTP request. In particular, it overrides the generic servlet service() request and breaks it out into several HTTP-related methods, including doGet(), doPost(), doPut(), and doDelete(). The default service() method examines the request to determine what kind it is and dispatches it to one of these methods, so you can override one or more of them to implement the specific web server behavior you need.
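The dispatching idea can be modeled in a few lines of plain Java. This is a simplified sketch, not the real HttpServlet: MiniHttpServlet, DispatchDemo, and the status strings they return are assumptions made up for illustration.

```java
// Hypothetical base class that mimics how a service() method can fan out
// to per-method handlers. It is not the real HttpServlet.
abstract class MiniHttpServlet {

    // Examines the request method and dispatches to the matching handler.
    final String service(String method, String path) {
        switch (method) {
            case "GET":    return doGet(path);
            case "POST":   return doPost(path);
            case "PUT":    return doPut(path);
            case "DELETE": return doDelete(path);
            default:       return "501 Not Implemented";
        }
    }

    // The defaults reject the request; subclasses override what they support.
    String doGet(String path)    { return "405 Method Not Allowed"; }
    String doPost(String path)   { return "405 Method Not Allowed"; }
    String doPut(String path)    { return "405 Method Not Allowed"; }
    String doDelete(String path) { return "405 Method Not Allowed"; }
}

// A "servlet" that supports GET only.
class DispatchDemo extends MiniHttpServlet {

    @Override
    String doGet(String path) {
        return "200 OK: " + path;
    }

    public static void main(String[] args) {
        DispatchDemo servlet = new DispatchDemo();
        System.out.println(servlet.service("GET", "/hello"));
        System.out.println(servlet.service("POST", "/hello"));
    }
}
```

Overriding only doGet() leaves every other request type with the base class's default answer, which is the same division of labor the paragraph above describes.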

doGet() and doPost() correspond to the standard HTTP GET and POST operations. GET is the standard request for retrieving a file or document at a specified URL. POST is the method by which a client sends an arbitrary amount of data to the server. HTML forms are the most common use for POST.

To round these out, HttpServlet provides the doPut() and doDelete() methods. These methods correspond to a poorly supported part of the HTTP protocol, meant to provide a way to upload and remove files. doPut() is supposed to be like POST but with different semantics; doDelete() would be its opposite. These aren't widely used.

HttpServlet also implements three other HTTP-related methods for you: doHead(), doTrace(), and doOptions(). You don't normally need to override these methods. doHead() implements the HTTP HEAD request, which asks for the headers of a GET request without the body. HttpServlet implements this by default by performing the GET method and then sending only the headers. You may wish to override doHead() with a more efficient implementation if you can provide one as an optimization. doTrace() and doOptions() implement other features of HTTP that allow for debugging and simple client/server capabilities negotiation. You generally shouldn't need to override these.

Along with HttpServlet, javax.servlet.http also includes subclasses of the ServletRequest and ServletResponse objects, HttpServletRequest and HttpServletResponse. These subclasses provide, respectively, the input and output streams needed to read and write client data. They also provide the APIs for getting or setting HTTP header information and, as we'll see, client session information. Rather than document these dryly, we'll show them in the context of some examples. As usual, we'll start with the simplest possible example.

14.5 The HelloClient Servlet

Here's our servlet version of "Hello World," called HelloClient:

//file: HelloClient.java
import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;
  
public class HelloClient extends HttpServlet { 
  
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response) 
        throws ServletException, IOException {
  
        // must come first
        response.setContentType("text/html");
        PrintWriter out = response.getWriter( );
  
        out.println( 
            "<html><head><title>Hello Client</title></head><body>"
            + "<h1> Hello Client </h1>"
            + "</body></html>" );
        out.close( );
    }
}

If you want to try out this servlet right away, skip ahead to Section 14.11 and Section 14.11.3, where we walk through the process of running this servlet. It's simply a matter of packaging up the servlet class file along with a simple web.xml file that describes it and placing it on your server. But for now we're going to discuss just the servlet example code itself.

Let's have a look at the example. HelloClient extends the base HttpServlet class and overrides the doGet() method to handle simple requests. In this case, we want to respond to any GET request by sending back a one-line HTML document that says "Hello Client." First we tell the container what kind of response we are going to generate, using the setContentType() method of the HttpServletResponse object. Then we get the output stream using the getWriter() method and print the message to it. Finally, we close the stream to indicate we're done generating output. (It shouldn't strictly be necessary to close the output stream, but we show it for completeness.)

14.5.1 Servlet Exceptions

The doGet() method of our example servlet declares that it can throw a ServletException. All of the service methods of the Servlet API may throw a ServletException to indicate that a request has failed. A ServletException can be constructed with a string message and an optional Throwable parameter that can carry any corresponding exception representing the root cause of the problem:

throw new ServletException("utter failure", someException );

By default, the web server determines exactly what is shown to the user when a ServletException is thrown, but often the exception and its stack trace are displayed. Through the web.xml file, you can designate custom error pages; see Section 14.13 later in this chapter for details.

Alternatively, a servlet may throw an UnavailableException, a subclass of ServletException, to indicate that it cannot handle requests. This exception can be constructed to indicate that the condition is permanent or that it should last for a specified number of seconds.

14.5.2 Content Type

Before fetching the output stream and writing to it, we must specify the kind of output we are sending by calling the response parameter's setContentType() method. In this case, we set the content type to text/html, which is the proper MIME type for an HTML document. In general, though, it's possible for a servlet to generate any kind of data, including sound, video, or some other kind of text. If we were writing a generic FileServlet to serve files like a regular web server, we might inspect the filename extension and determine the MIME type from that or from direct inspection of the data. For writing binary data, you can use the getOutputStream() method to get an OutputStream as opposed to a Writer.

The content type is used in the Content-Type: header of the server's HTTP response, which tells the client what to expect even before it starts reading the result. This allows your web browser to prompt you with the "Save File" dialog when you click on a ZIP archive or executable program. When the content-type string is used in its full form to specify the character encoding (for example, text/html; charset=ISO-8859-1), the information is also used by the servlet engine to set the character encoding of the PrintWriter output stream. As a result, you should always call the setContentType() method before fetching the writer with the getWriter() method.
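To see why the order of the two calls matters, consider what the charset does to the bytes on the wire. The sketch below is plain Java rather than servlet code. ContentTypeCharsetDemo is a hypothetical helper, its parsing is deliberately naive (it does not handle quoted charset values), and the ISO-8859-1 fallback is this sketch's assumption about what happens when no charset is given.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharsetDemo {

    // Extracts the charset parameter from a content-type string,
    // falling back to ISO-8859-1 when none is given (an assumption here).
    static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset="))
                return Charset.forName(p.substring("charset=".length()).trim());
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Writes text through a PrintWriter configured for the content type's
    // charset and returns the bytes that would be sent to the client.
    static byte[] encode(String contentType, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
            new OutputStreamWriter(bytes, charsetOf(contentType)));
        out.print(text);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // the single character e-acute
        System.out.println(
            encode("text/html; charset=ISO-8859-1", eAcute).length);
        System.out.println(
            encode("text/html; charset=UTF-8", eAcute).length);
    }
}
```

The same character becomes one byte under ISO-8859-1 and two under UTF-8. A writer created before the encoding is known has no way to make that choice correctly.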

14.6 The Servlet Response

In addition to providing the output stream for writing content to the client, the HttpServletResponse object provides methods for controlling other aspects of the HTTP response, including headers, error result codes, redirects, and servlet container buffering.

HTTP headers are metadata name/value pairs sent with the response. You can add headers (standard or custom) to the response with the setHeader() and addHeader() methods (headers may have multiple values). There are also convenience methods for setting headers with integer and date values:

response.setIntHeader("MagicNumber", 42);
response.setDateHeader("CurrentTime", System.currentTimeMillis(  ) );

When you write data to the client, the servlet container automatically sets the HTTP response code to a value of 200, which means OK. Using the sendError() method, you can generate other HTTP response codes. HttpServletResponse contains predefined constants for all of the standard codes. Here are a few common ones:

HttpServletResponse.SC_OK
HttpServletResponse.SC_BAD_REQUEST
HttpServletResponse.SC_FORBIDDEN
HttpServletResponse.SC_NOT_FOUND
HttpServletResponse.SC_INTERNAL_SERVER_ERROR
HttpServletResponse.SC_NOT_IMPLEMENTED
HttpServletResponse.SC_SERVICE_UNAVAILABLE

When you generate an error with sendError(), the response is over, and you can't write any content to the client. You can specify a short error message, however, which may be shown to the client. (See Section 14.15.1 for an example.)

An HTTP redirect is a special kind of response that tells the client web browser to go to a different URL. Normally this happens quickly and without any interaction from the user. You can send a redirect with the sendRedirect() method:

response.sendRedirect("http://www.oreilly.com/");

We should say a few words about buffering. Most responses are buffered internally by the servlet container until the servlet service method has exited. This allows the container to set the HTTP content-length header automatically, telling the client how much data to expect. You can control the size of this buffer with the setBufferSize() method, specifying a size in bytes. You can even clear it and start over if no data has been written to the client. To clear the buffer, use isCommitted() to test whether any data has been sent, then use resetBuffer() to dump the data if none has been sent. If you are sending a lot of data, you may wish to set the content length explicitly with the setContentLength() method.

14.7 Servlet Parameters

Our first example shows how to accept a basic request. A more sophisticated servlet might do arbitrary processing or handle database queries, for example. Of course, to do anything really useful we'll need to get some information from the user. Fortunately, the servlet engine handles this for us, interpreting both GET- and POST-encoded form data from the client and providing it to us through the simple getParameter() method of the servlet request.

14.7.1 GET, POST, and the "Extra Path"

There are essentially two ways to pass information from your web browser to a servlet or CGI program. The most general is to "post" it, which means that your client encodes the information and sends it as a stream to the program, which decodes it. Posting can be used to upload large amounts of form data or other data, including files. The other way to pass information is to somehow encode the information in the URL of your client's request. The primary way to do this is to use GET-style encoding of parameters in the URL string. In this case, the web browser encodes the parameters and appends them to the end of the URL string. The server decodes them and passes them to the application.

As we described in Chapter 13, GET-style encoding takes the parameters and appends them to the URL in a name/value fashion, with the first parameter preceded by a question mark (?) and the rest separated by ampersands (&). The entire string is expected to be URL-encoded: any special characters (such as spaces, ?, and & in the string) are specially encoded.
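Here is a small sketch of that encoding done by hand with the standard java.net.URLEncoder class. The QueryStringEncoder class and its method names are hypothetical, and the choice of UTF-8 is an assumption of this example. A browser does the equivalent work for you when it submits a form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringEncoder {

    // URL-encodes one name or value.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Appends the parameters to baseUrl: '?' before the first pair,
    // '&' before each of the rest.
    static String withParameters(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(encode(p.getKey()))
               .append('=')
               .append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Field 1", "fish & chips");
        params.put("Field 2", "why?");
        System.out.println(withParameters(
            "http://www.myserver.example/servlets/MyServlet", params));
    }
}
```

Spaces come out as plus signs, while the ampersand and question mark inside the values become %26 and %3F, so they cannot be mistaken for the separators.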

A less sophisticated form of encoding data in the URL is called extra path. This simply means that when the server has located your servlet or CGI program as the target of a URL, it takes any remaining path components of the URL string and simply hands them over as an extra part of the URL. For example, consider these URLs:

http://www.myserver.example/servlets/MyServlet
http://www.myserver.example/servlets/MyServlet/foo/bar

Suppose the server maps the first URL to the servlet called MyServlet. When subsequently given the second URL, the server still invokes MyServlet, but considers /foo/bar to be an "extra path" that can be retrieved through the servlet request's getPathInfo() method.
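The split the server performs can be sketched in plain Java. ExtraPathDemo is a hypothetical helper, not part of the Servlet API, and it ignores details such as wildcard mappings and URL decoding that a real container handles.

```java
class ExtraPathDemo {

    // Returns the part of requestPath that follows servletPath, or null when
    // nothing is left over or the request does not belong to the servlet.
    static String extraPath(String servletPath, String requestPath) {
        if (!requestPath.startsWith(servletPath))
            return null;
        String rest = requestPath.substring(servletPath.length());
        if (rest.isEmpty())
            return null;
        // "/servlets/MyServletX" is a different resource, not an extra path.
        return rest.startsWith("/") ? rest : null;
    }

    public static void main(String[] args) {
        System.out.println(
            extraPath("/servlets/MyServlet", "/servlets/MyServlet/foo/bar"));
        System.out.println(
            extraPath("/servlets/MyServlet", "/servlets/MyServlet"));
    }
}
```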

Both GET and POST encoding can be used with HTML forms on the client by specifying get or post in the method attribute of the form tag. The browser handles the encoding; on the server side, the servlet engine handles the decoding.

The content type used by a client to post form data to a servlet is the same as that for any CGI: "application/x-www-form-urlencoded." The Servlet API automatically parses this kind of data and makes it available through the getParameter() method. However, if you do not use the getParameter() method, the data remains available in the input stream and can be read by the servlet directly.
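If you did choose to read the posted data yourself, the decoding would look roughly like the following. This is a simplified plain-Java sketch using java.net.URLDecoder. FormDataParser is a hypothetical class, UTF-8 is assumed, and keeping only the first value for a repeated name is a simplification of this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormDataParser {

    // Decodes one URL-encoded name or value.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Parses "name=value&name=value" into a map, in order of appearance.
    // When a name is repeated, only its first value is kept.
    static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        if (body == null || body.isEmpty())
            return params;
        for (String pair : body.split("&")) {
            if (pair.isEmpty())
                continue;
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
            params.putIfAbsent(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("Field+1=fish+%26+chips&Field+2=why%3F"));
    }
}
```

In a servlet you would rarely write this yourself, since getParameter() hands you the decoded values directly.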

14.7.2 GET or POST: Which One to Use?

To users, the primary difference between GET and POST is that they can see the GET information in the encoded URL shown in their web browser. This can be useful because the user can cut and paste that URL (the result of a search, for example) and mail it to a friend or bookmark it for future reference. POST information is not visible to the user and ceases to exist after it's sent to the server. This behavior goes along with the protocol's perspective that GET and POST are intended to have different semantics. By definition, the result of a GET operation is not supposed to have any side effects. That is, it's not supposed to cause the server to perform any subsequent operations (such as making an e-commerce purchase). In theory, that's the job of POST. That's why your web browser warns you about reposting form data if you hit reload on a page that was the result of a form posting.

The extra path method is not useful for form data but would be useful for a servlet that retrieves files or handles a range of URLs in a human-readable way not driven by forms.

14.8 The ShowParameters Servlet

Our first example didn't do anything interesting. This example prints the values of any parameters that were received. We'll start by handling GET requests and then make some trivial modifications to handle POST as well. Here's the code:

//file: ShowParameters.java
import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import java.util.Enumeration;
  
public class ShowParameters extends HttpServlet { 
  
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response) 
      throws ServletException, IOException {
        showRequestParameters( request, response );
    }
  
    void showRequestParameters(HttpServletRequest request,
                               HttpServletResponse response)
      throws IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter( );
  
        out.println(
          "<html><head><title>Show Parameters</title></head><body>"
          + "<h1>Parameters</h1><ul>");
  
        for ( Enumeration e=request.getParameterNames( );
              e.hasMoreElements( ); ) {
            String name = (String)e.nextElement( );
            String value = request.getParameter( name );
            if (! value.equals("") )
                out.println("<li>"+ name +" = "+ value );
        }
  
        out.close( );
    }
}

There's not much new here. As in the first example, we override the doGet() method. Here, we delegate the request to a helper method that we've created, called showRequestParameters(). This method just enumerates the parameters using the request object's getParameterNames() method and prints the names and values. (To make it pretty, we listed them in an HTML list by prefixing each with an <li> tag.)

As it stands, our servlet responds only to GET requests. Let's round it out by adding our own form to the output and also accommodating POST method requests. To accept posts, we override the doPost() method. The implementation of doPost() could simply call our showRequestParameters() method, but we can make it simpler still. The API lets us treat GET and POST requests interchangeably because the servlet engine handles the decoding of request parameters. So we simply delegate the doPost() operation to doGet().

Add the following method to the example:

public void doPost( HttpServletRequest request,
                    HttpServletResponse response) 
  throws ServletException, IOException {
    doGet( request, response );
}

Now let's add an HTML form to the output. The form lets the user fill in some parameters and submit them to the servlet. Add this statement to the showRequestParameters() method before the call to out.close():

out.println(
  "</ul><p><form method=\"POST\" action=\"" 
  + request.getRequestURI( ) + "\">"
  + "Field 1 <input name=\"Field 1\" size=20><br>"
  + "Field 2 <input name=\"Field 2\" size=20><br>"
  + "<br><input type=\"submit\" value=\"Submit\"></form>"
);

The form's action attribute is the URL of our servlet so that the servlet will get the data. We use the getRequestURI() method to ask for the location of our servlet. For the method attribute, we've specified a POST operation, but you can try changing the operation to GET to see both styles.

So far, we haven't done anything that you couldn't do easily with your average CGI script. In the following section, we'll show something more interesting: how to manage a user session. But before we go on, we should mention a useful standard servlet that is akin to our example above, SnoopServlet.

14.8.1 SnoopServlet

Most servlet containers come with some useful servlets that serve as examples or debugging aids. One of the most basic tools you have for debugging servlets is the "SnoopServlet." We place that name in quotes because you will find many different implementations of this with various names. But the original SnoopServlet came with the Java servlet development kit and is currently supplied with the Tomcat server distribution. This very simple debugging servlet displays everything about its environment, including all of its request parameters, just as our ShowParameters example did. There is a lot of useful information there. In the default Tomcat 4.0 distribution, you can access this servlet at http://myserver:8080/examples/snoop.

14.9 User Session Management

One of the nicest features of the Servlet API is its simple mechanism for managing a user session. By a session, we mean that the servlet can maintain information over multiple pages and through multiple transactions as navigated by the user; this is also called maintaining state. Providing continuity through a series of web pages is important in many kinds of applications, such as handling a login process or tracking purchases in a shopping cart. In a sense, session data takes the place of instance data in your servlet object. It lets you store data between invocations of your service methods.

Session tracking is supported by the servlet engine; you don't have to worry about the details of how it's accomplished. It's done in one of two ways: using client-side cookies or URL rewriting. Client-side cookies are a standard HTTP mechanism for getting the client web browser to cooperate in storing state information for you. A cookie is basically just a name/value attribute that is issued by the server, stored on the client, and returned by the client whenever it is accessing a certain group of URLs on a specified server. Cookies can track a single session or multiple user visits.

URL rewriting appends session-tracking information to the URL, using GET-style encoding or extra path information. The term "rewriting" applies because the server rewrites the URL before it is seen by the client and absorbs the extra information before it is passed back to the servlet. In order to support URL rewriting, a servlet must take the extra step to encode any URLs it generates in content (e.g., HTML links that may return to the page) using a special method of the HttpServletResponse object. We'll describe this later.

To the servlet programmer, state information is made available through an HttpSession object, which acts like a hashtable for storing whatever objects you would like to carry through the session. The objects stay on the server side; a special identifier is sent to the client through a cookie or URL rewriting. On the way back, the identifier is mapped to a session, and the session is associated with the servlet again.
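The arrangement just described, with objects held on the server and only an identifier traveling to the client, can be modeled in plain Java. ToySessionStore is a hypothetical class, not the container's actual implementation. It leaves out timeouts and everything else a real session manager does, and using a random UUID as the identifier is simply this sketch's choice.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Toy server-side session table: identifier -> that session's attributes.
class ToySessionStore {

    private final Map<String, Map<String, Object>> sessions =
        new ConcurrentHashMap<>();

    // Creates an empty session and returns the identifier that would be
    // handed to the client (through a cookie or a rewritten URL).
    String createSession() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new ConcurrentHashMap<String, Object>());
        return id;
    }

    // Maps a returning identifier back to its session data, or returns
    // null if the identifier is unknown or the session was invalidated.
    Map<String, Object> lookup(String id) {
        return id == null ? null : sessions.get(id);
    }

    // Discards the session and everything stored in it.
    void invalidate(String id) {
        if (id != null)
            sessions.remove(id);
    }

    public static void main(String[] args) {
        ToySessionStore store = new ToySessionStore();
        String id = store.createSession();
        store.lookup(id).put("color", "blue");
        System.out.println(store.lookup(id));
        store.invalidate(id);
        System.out.println(store.lookup(id));
    }
}
```

Nothing the servlet stores ever leaves the server. The client holds only the identifier, which is why losing it (for example, by restarting the browser) loses the session.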

14.9.1 The ShowSession Servlet

Here's a simple servlet that shows how to store some string information to track a session:

//file: ShowSession.java
import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import java.util.Enumeration;
  
public class ShowSession extends HttpServlet {
  
    public void doPost( 
        HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    { 
        doGet( request, response );
    }
  
    public void doGet(
        HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException 
    {
        HttpSession session = request.getSession(  );
        boolean clear = request.getParameter("clear") != null;
        if ( clear )
            session.invalidate(  );
        else {
            String name = request.getParameter("Name");
            String value = request.getParameter("Value");
            if ( name != null && value != null )
                session.setAttribute( name, value );
        }
  
        response.setContentType("text/html");
        PrintWriter out = response.getWriter(  );
        out.println(
          "<html><head><title>Show Session</title></head><body>");
  
        if ( clear )
            out.println("<h1>Session Cleared:</h1>");
        else {
            out.println("<h1>In this session:</h1><ul>");
            Enumeration names = session.getAttributeNames(  );
            while ( names.hasMoreElements(  ) ) {
                String name = (String)names.nextElement(  );
                out.println( "<li>"+name+" = " +session.getAttribute( name ) );
            }
        }
  
        out.println(
          "</ul><p><hr><h1>Add String</h1>"
          + "<form method=\"POST\" action=\""
          + request.getRequestURI(  ) +"\">"
          + "Name: <input name=\"Name\" size=20><br>"
          + "Value: <input name=\"Value\" size=20><br>"
          + "<br><input type=\"submit\" value=\"Submit\">"
          + "<input type=\"submit\" name=\"clear\" value=\"Clear\"></form>"
        );
    }
}

When you invoke the servlet, you are presented with a form that prompts you to enter a name and a value. The value string is stored in a session object under the name provided. Each time the servlet is called, it outputs the list of all data items associated with the session. You will see the session grow as each item is added (in this case, until you restart your web browser or the server).

The basic mechanics are much like our ShowParameters servlet. Our doGet() method generates the form, which refers back to our servlet via a POST method. We override doPost() to delegate back to our doGet() method, allowing it to handle everything. Once in doGet(), we attempt to fetch the user session object from the request object using getSession(). The HttpSession object supplied by the request functions like a hashtable. There is a setAttribute() method, which takes a string name and an Object argument, and a corresponding getAttribute() method. In our example, we use the getAttributeNames() method to enumerate the names currently stored in the session, look up the value for each one, and print them.

By default, getSession() creates a session if one does not exist. If you want to test for a session or explicitly control when one is created, you can call the overloaded version getSession(false), which does not automatically create a new session and returns null if there is no session. To clear a session immediately, we can use the invalidate() method. After calling invalidate() on a session, we are not allowed to access it again, so we set a flag in our example and show the "Session Cleared" message. Sessions may also become invalid on their own by timing out. You can control session timeout in the application server or through the web.xml file (via the <session-timeout> value, expressed in minutes, of the <session-config> section). User sessions are private to each web application and are not shared across applications.

We mentioned earlier that an extra step is required to support URL rewriting for web browsers that don't support cookies. To do this, we must make sure that any URLs we generate in content are first passed through the HttpServletResponse encodeURL() method. This method takes a string URL and returns a modified string only if URL rewriting is necessary. Normally, when cookies are available, it returns the same string. In our previous example, we should have encoded the server form URL retrieved from getRequestURI() before passing it to the client.
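To make the idea concrete, here is a rough sketch of what URL rewriting amounts to. It is a toy model, not the container's implementation: the class UrlRewriteSketch and its encodeUrl() method are our own hypothetical names, and the cookiesInUse flag stands in for the container's knowledge of whether the client returned the session cookie. The one detail taken from the servlet specification is the name of the path parameter, jsessionid.

```java
// Toy model of URL rewriting; a real container's encodeURL() is more involved.
final class UrlRewriteSketch {

    // Returns the URL unchanged when cookies carry the session; otherwise
    // inserts ";jsessionid=<id>" ahead of any query string or fragment.
    static String encodeUrl(String url, String sessionId, boolean cookiesInUse) {
        if (cookiesInUse || sessionId == null) {
            return url;
        }
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    public static void main(String[] args) {
        // "ABC123" is a made-up session ID used only for illustration.
        System.out.println(encodeUrl("/learningjava/session", "ABC123", true));
        System.out.println(encodeUrl("/learningjava/session?x=1", "ABC123", false));
    }
}
```

In the ShowSession servlet itself, the corresponding change would be to pass the form's target through the response, as in response.encodeURL( request.getRequestURI() ), and let the container decide whether rewriting is needed.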

14.9.2 The ShoppingCart Servlet

Now we build on the previous example to make a servlet that could be used as part of an online store. ShoppingCart lets users choose items and add them to their basket until checkout time:

//file: ShoppingCart.java
import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import java.util.Enumeration;
  
public class ShoppingCart extends HttpServlet {
    String [] items = new String [] {
        "Chocolate Covered Crickets", "Raspberry Roaches",
        "Buttery Butterflies", "Chicken Flavored Chicklets(tm)" };
  
    public void doPost( 
        HttpServletRequest request, HttpServletResponse response) 
        throws IOException, ServletException 
    {
        doGet( request, response );
    }
  
    public void doGet( 
        HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException 
    {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter(  );
  
        // get or create the session information
        HttpSession session = request.getSession(  );
        int [] purchases = (int [])session.getAttribute("purchases");
        if ( purchases == null ) {
            purchases = new int [ items.length ];
            session.setAttribute( "purchases", purchases );
        }
  
        out.println( "<html><head><title>Shopping Cart</title>"
                     + "</head><body><p>" );
  
        if ( request.getParameter("checkout") != null )
            out.println("<h1>Thanks for ordering!</h1>");
        else  {
            if ( request.getParameter("add") != null ) {
                addPurchases( request, purchases );
                out.println(
                    "<h1>Purchase added.  Please continue</h1>");
            } else {
                if ( request.getParameter("clear") != null )
                    for (int i=0; i<purchases.length; i++)
                         purchases[i] = 0;
                out.println("<h1>Please Select Your Items!</h1>");
            }
            doForm( out, request.getRequestURI(  ) );
        }
        showPurchases( out, purchases );
        out.close(  );
    }
  
    void addPurchases( HttpServletRequest request, int [] purchases ) {
        for (int i=0; i<items.length; i++) {
            String added = request.getParameter( items[i] );
            if ( added !=null && !added.equals("") )
                purchases[i] += Integer.parseInt( added );
        }
    }
  
    void doForm( PrintWriter out, String requestURI ) {
        out.println( "<form method=\"POST\" action=\""+ requestURI +"\">" );
  
        for(int i=0; i< items.length; i++)
            out.println( "Quantity <input name=\"" + items[i]
              + "\" value=0 size=3> of: " + items[i] + "<br>");
        out.println(
          "<p><input type=submit name=add value=\"Add To Cart\">"
          + "<input type=submit name=checkout value=\"Check Out\">"
          + "<input type=submit name=clear value=\"Clear Cart\">"
          + "</form>" );
    }
  
    void showPurchases( PrintWriter out, int [] purchases )
        throws IOException {
  
        out.println("<hr><h2>Your Shopping Basket</h2>");
        for (int i=0; i<items.length; i++)
            if ( purchases[i] != 0 )
                out.println( purchases[i] +"  "+ items[i] +"<br>" );
    }
}

ShoppingCart has some instance data: a String array that holds a list of products. We're making the assumption that the product selection is the same for all customers. If it's not, we'd have to generate the product list on the fly or put it in the session for the user.

We see the same basic pattern as in our previous servlets, with doPost() delegating to doGet(), and doGet() generating the body of the output and a form for gathering new data. Here we've broken down the work using a few helper methods: doForm(), addPurchases(), and showPurchases(). Our shopping cart form has three submit buttons: one for adding items to the cart, one for checkout, and one for clearing the cart. In each case, we display the contents of the cart. Depending on the button pressed, we add new purchases, clear the list, or simply show the results as a checkout window.

The form is generated by our doForm() method, using the list of items for sale. As in the other examples, we supply our servlet's address as the target of the form. Next, we have placed an integer array called purchases into the user session. Each element in purchases holds a count of the number of each item the user wants to buy. We obtain the array after retrieving the session simply by asking the session for it. If this is a new session, and the array hasn't been created, getAttribute() returns null, so we create a new array and store it in the session. Since we generate the form using the names from the items array, it's easy for addPurchases() to check for each name using getParameter() and increment the purchases array for the number of items requested. We also test for the value being equal to the empty string, because some broken web browsers send empty strings for unused field values. Finally, showPurchases() simply loops over the purchases array and prints the name and quantity for each item that the user has purchased.
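One thing addPurchases() does not guard against is a quantity that is not a number: Integer.parseInt() throws a NumberFormatException for input such as "two". The following is a minimal sketch of a more forgiving helper. The class QuantitySketch and the method parseQuantity() are hypothetical names of our own, and the choice to count unusable or negative input as zero is an assumption, not something the original servlet does.

```java
// Hypothetical helper; ShoppingCart calls Integer.parseInt() directly instead.
final class QuantitySketch {

    // Converts a raw form field into a non-negative count.
    // Missing, blank, non-numeric, and negative values all count as 0.
    static int parseQuantity(String raw) {
        if (raw == null) {
            return 0;
        }
        String trimmed = raw.trim();
        if (trimmed.length() == 0) {
            return 0;
        }
        try {
            return Math.max(0, Integer.parseInt(trimmed));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        String[] samples = { null, "", " 3 ", "two", "-2" };
        for (String sample : samples) {
            System.out.println("[" + sample + "] -> " + parseQuantity(sample));
        }
    }
}
```

With a helper like this, the body of the loop in addPurchases() could shrink to a single line that adds parseQuantity( request.getParameter( items[i] ) ) to purchases[i].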

14.9.3 Cookies

In our previous examples, a session lived only until you shut down your web browser or the server. You can do more long-term user tracking or identification by managing cookies explicitly. You can send a cookie to the client by creating a javax.servlet.http.Cookie object and adding it to the servlet response using the addCookie() method. Later you can retrieve the cookie information from the servlet request and use it to look up persistent information in a database. The following servlet sends a "Learning Java" cookie to your web browser and displays it when you return to the page:

//file: CookieCutter.java
import java.io.*;
import java.text.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;
  
public class CookieCutter extends HttpServlet {
  
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
      throws IOException, ServletException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter( );
  
        if ( request.getParameter("setcookie") != null ) {
            Cookie cookie = new Cookie("Learningjava", "Cookies!");
            cookie.setMaxAge(3600);
            response.addCookie(cookie);
            out.println("<html><body><h1>Cookie Set...</h1>");
        } else {
            out.println("<html><body>");
            Cookie[] cookies = request.getCookies( );
            // getCookies() returns null, not an empty array, if none were sent
            if ( cookies == null || cookies.length == 0 )
                out.println("<h1>No cookies found...</h1>");
            else
                for (int i = 0; i < cookies.length; i++)
                    out.print("<h1>Name: "+ cookies[i].getName( )
                              + "<br>"
                              + "Value: " + cookies[i].getValue( )
                              + "</h1>" );
            out.println("<p><a href=\""+ request.getRequestURI( )
              +"?setcookie=true\">"
              +"Reset the Learning Java cookie.</a>");
        }
        out.println("</body></html>");
        out.close( );
    }
}

This example simply enumerates the cookies supplied by the request object using the getCookies() method and prints their names and values. We provide a GET-style link that points back to our servlet with a parameter setcookie, indicating that we should set the cookie. In that case, we create a Cookie object using the specified name and value and add it to the response with the addCookie() method. We set the maximum age of the cookie to 3600 seconds, so it remains in the browser for an hour before being discarded (we'll talk about tracking a cookie across multiple sessions later). You can specify an arbitrary time period here, or a negative time period to indicate that the cookie should not be stored persistently on the client, in which case it is discarded when the browser exits. Specifying a maximum age of zero tells the browser to delete the cookie, which is a good way to erase an existing cookie of the same name.
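The Cookie documentation distinguishes three cases for setMaxAge(): a positive value is a lifetime in seconds, a negative value means the cookie lasts only until the browser exits, and zero deletes the cookie. These cases can be tried out without a server. The sketch below uses java.net.HttpCookie, the JDK's client-side cookie class, as a stand-in for javax.servlet.http.Cookie; it only approximates how a browser would judge the cookie, and the class name CookieAgeSketch and the helper expiredWithMaxAge() are our own.

```java
import java.net.HttpCookie;

// Client-side view of cookie lifetime, using the JDK's own cookie class.
final class CookieAgeSketch {

    // Builds a cookie with the given max-age (in seconds) and reports
    // whether a client would already consider it expired.
    static boolean expiredWithMaxAge(long seconds) {
        HttpCookie cookie = new HttpCookie("Learningjava", "Cookies!");
        cookie.setMaxAge(seconds);
        return cookie.hasExpired();
    }

    public static void main(String[] args) {
        System.out.println("3600 seconds: expired = " + expiredWithMaxAge(3600));
        System.out.println("negative:     expired = " + expiredWithMaxAge(-1));
        System.out.println("zero:         expired = " + expiredWithMaxAge(0));
    }
}
```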

Two other Cookie methods are of interest: setDomain() and setPath(). These methods allow you to specify the domain name and path component that limit the servers to which the client will send the cookie. If you're writing some kind of purchase application for L.L. Bean, you don't want clients sending your cookies over to Eddie Bauer. In practice, however, this cannot happen. The default domain is the domain of the server sending the cookie. (You may not be able to specify other domains for security reasons.) The path parameter defaults to the base URL of the servlet, but you can specify a wider (or narrower) range of URLs on the server by setting this parameter manually.
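The effect of a cookie's domain can also be explored from the client's point of view. This is a small sketch using the static domainMatches() method of java.net.HttpCookie, which applies the JDK's own domain-matching rules; real browsers may differ in detail. The host names are placeholders, and the class CookieDomainSketch and method wouldSend() are hypothetical names of ours.

```java
import java.net.HttpCookie;

// Asks the JDK's cookie class whether a cookie domain covers a given host.
final class CookieDomainSketch {

    // True if a cookie set for cookieDomain would be returned to requestHost,
    // according to java.net.HttpCookie's matching rules.
    static boolean wouldSend(String cookieDomain, String requestHost) {
        return HttpCookie.domainMatches(cookieDomain, requestHost);
    }

    public static void main(String[] args) {
        // Placeholder hosts: one inside the cookie's domain, one outside it.
        System.out.println(wouldSend(".example.com", "shop.example.com"));
        System.out.println(wouldSend(".example.com", "www.example.org"));
    }
}
```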

14.10 The ServletContext API

Web applications have access to the server environment through the ServletContext API. A reference to the ServletContext can be obtained from the HttpServlet getServletContext() method.

ServletContext context = getServletContext( );

Each WebApp has its own ServletContext. The context provides a shared space in which a WebApp's servlets may rendezvous and post objects. Objects may be placed into the context with the setAttribute() method and retrieved by name with the getAttribute() method.

context.setAttribute("myapp.statistics", myObject);
Object stats = context.getAttribute("myapp.statistics");

Attribute names beginning with "java." and "javax." are reserved for use by Java. Use the standard package-naming conventions for your attributes in order to avoid conflicts. One standard attribute that can be accessed through the servlet context is javax.servlet.context.tempdir, a java.io.File object that refers to a private working directory. This temp directory is guaranteed to be unique to the WebApp. No guarantees are made about it being cleared upon exit, however, so you should use the temporary file API to create files here (unless you wish to try to keep them beyond the server exit). For example:

File tmpDir = (File)context.getAttribute("javax.servlet.context.tempdir");
File tmpFile = File.createTempFile( "appprefix", "appsuffix", tmpDir );

The servlet context also provides direct access to the WebApp's files from its root directory. The getResource() method is similar to the Class getResource() method (see Chapter 11). It takes a pathname and returns a special local URL for accessing that resource. In this case, it takes a path rooted in the servlet base directory (WAR file). The servlet may obtain references to files, including those in the WEB-INF directory, using this method. For example, a servlet may fetch an input stream for its own web.xml file:

InputStream in = context.getResourceAsStream("/WEB-INF/web.xml");

It could also use a URL reference to get one of its images:

URL bunnyURL = context.getResource("/images/happybunny.gif");

The method getResourcePaths() may be used to fetch a directory-style listing of all the resource files available matching a specified path. The return value is a java.util.Set collection of strings naming the resources available under the specified path. For example, the path / lists all files in the WAR file; the path /WEB-INF/ lists at least the web.xml file and classes directory.

The ServletContext is also a factory for RequestDispatcher objects.

14.11 WAR Files and Deployment

As we described in the introduction to this chapter, a WAR file is an archive that contains all the parts of a web application: Java class files for servlets, JSPs, HTML pages, images, and other resources. The WAR file is simply a JAR file with specified directories for the Java code and one very important file: the web.xml file, which tells the application server what to run and how to run it. WAR files always have the extension .war, but they can be created and read with the standard jar tool.

The contents of a typical WAR file might look like this, as revealed by the jar tool:

$ jar tf shoppingcart.war
  
    index.html
    purchase.html
    receipt.html
    images/happybunny.gif
    WEB-INF/web.xml
    WEB-INF/classes/com/mycompany/PurchaseServlet.class
    WEB-INF/classes/com/mycompany/ReturnServlet.class
    WEB-INF/lib/thirdparty.jar

When deployed, the name of the WAR file becomes, by default, the root path of the web application, in this case shoppingcart. Thus the base URL for this WebApp, if deployed on www.oreilly.com, is http://www.oreilly.com/shoppingcart/, and all references to its documents, images, and servlets start with that path. The top level of the WAR file becomes the document root (base directory) for serving files. Our index.html file appears at the base URL we just mentioned, and our happybunny.gif image is referenced as http://www.oreilly.com/shoppingcart/images/happybunny.gif.

The WEB-INF directory (all caps, hyphenated) is a special directory that contains all deployment information and application code. This directory is protected by the web server, and its contents are not visible, even if you add WEB-INF to the base URL. Your application classes can load additional files from this area directly using getResource(), however, so it is a safe place to store application resources. The WEB-INF directory contains the all-important web.xml file, which we'll talk about more in a moment.

The WEB-INF/classes and WEB-INF/lib directories contain Java class files and JAR libraries, respectively. The WEB-INF/classes directory is automatically added to the classpath of the web application, so any class files placed here (using the normal Java package conventions) are available to the application. After that, any JAR files located in WEB-INF/lib are appended to the WebApp's classpath (the order in which they are appended is, unfortunately, not specified). You can place your classes in either location. During development, it is often easier to work with the "loose" classes directory and use the lib directory for supporting classes and third-party tools. Usually it's also possible to install classes and JAR files in the main system classpath of the servlet container to make them available to all WebApps running on that server. The procedure for doing this, however, is not standard and any classes that are deployed in this way cannot be automatically reloaded if changed—a feature of WAR files that we'll discuss later.

14.11.1 The web.xml File

The web.xml file is an XML file that lists the servlets to be run, the relative names (URL paths) under which to run them, their initialization parameters, and their deployment details, including security and authorization. We will assume that you have at least a passing familiarity with XML or that you can simply imitate the examples in a cut-and-paste fashion. (For details about working with Java and XML, see Chapter 23.) Let's start with a simple web.xml file for our HelloClient servlet example. It looks like this:

<web-app>
    <servlet>
        <servlet-name>helloclient1</servlet-name>
        <servlet-class>HelloClient</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>helloclient1</servlet-name>
        <url-pattern>/hello</url-pattern>
    </servlet-mapping>
</web-app>

The top-level element of the document is called <web-app>. Many types of entries may appear inside the <web-app>, but the most basic are <servlet> declarations and <servlet-mapping> deployment mappings. The <servlet> declaration tag is used to declare an instance of a servlet and, optionally, to give it initialization and other parameters. One instance of the servlet class is instantiated for each <servlet> tag appearing in the web.xml file.

At minimum, the <servlet> declaration requires two pieces of information: a <servlet-name>, which is used as a handle to reference the servlet elsewhere in the web.xml file, and the <servlet-class> tag, which specifies the Java class name of the servlet. Here, we named the servlet helloclient1. We named it like this to emphasize that we could declare other instances of the same servlet if we wanted to, possibly giving them different initialization parameters, etc. The class name for our servlet is of course HelloClient. In a real application, the servlet class would likely have a full package name such as com.oreilly.servlets.HelloClient.

A servlet declaration may also include one or more initialization parameters, which are made available to the servlet through the ServletConfig object's getInitParameter() method:

<servlet>
    <servlet-name>helloclient1</servlet-name>
    <servlet-class>HelloClient</servlet-class>
    <init-param>
        <param-name>foo</param-name>
        <param-value>bar</param-value>
    </init-param>
</servlet>

Next, we have our <servlet-mapping>, which associates the servlet instance with a path on the web server. Servlet mapping entries appear in the web.xml file after all the servlet declaration entries:

<servlet-mapping>
    <servlet-name>helloclient1</servlet-name>
    <url-pattern>/hello</url-pattern>
</servlet-mapping>

Here we mapped our servlet to the path /hello. If we later name our WAR file learningjava.war and deploy it on www.oreilly.com, the full path to this servlet would be http://www.oreilly.com/learningjava/hello. Just as we could declare more than one servlet instance with the <servlet> tag, we could declare more than one <servlet-mapping> for a given servlet instance. We could, for example, redundantly map the same helloclient1 instance to the paths /hello and /hola. The <url-pattern> tag provides some very flexible ways to specify the URLs that should match a servlet. We'll talk about this in detail in the next section.

Finally, we should mention that although the web.xml example listed earlier will probably work on most application servers, it is technically incomplete because it is missing a formal header that specifies the version of XML it is using and the version of the web.xml file standard with which it complies:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">

You can paste these four lines onto the beginning of each web.xml file we use in this book. You should do so, in fact, because with this information, if you make a mistake in the web.xml file, the web server can give you much better error messages.

14.11.2 URL Pattern Mappings

The <url-pattern> specified in the previous example was a simple string, "/hello". For this pattern, only a request whose path within the WebApp is exactly "/hello" invokes our servlet. The <url-pattern> tag is capable of more powerful patterns, however, including wildcards. The servlet specification recognizes a wildcard in two positions only: at the end of a path prefix, as in "/hello/*", and at the start of an extension, as in "*.html". Specifying a <url-pattern> of "/hello/*" allows our servlet to be invoked by URLs such as www.oreilly.com/learningjava/hello/world or ".../hello/baby". An extension pattern such as "*.html" or "*.foo" means that the servlet is invoked for any path that ends with that extension. Any other string, including one like "/hello*", is treated as an exact match only.

Using wildcards can result in more than one match. Consider, for example, the mappings "/scooby/*" and "/scooby/doo/*". Which should be matched for a URL ending with ".../scooby/doo/biedoo"? What if we have a third possible match because of an extension mapping? The rules for resolving these are as follows.

First, any exact match is taken. For example, "/hello" matches the "/hello" URL pattern in our example regardless of any additional "/hello/*". Failing that, the container looks for the longest path prefix match. So "/scooby/doo/biedoo" matches the second pattern, "/scooby/doo/*", because it is longer and presumably more specific. Failing any matches there, the container looks at extension mappings. A request ending in ".foo" matches a "*.foo" mapping at this point in the process. Finally, failing any matches there, the container hands the request to the default servlet, which is the one mapped to the pattern "/". (A servlet mapped to "/*" also catches everything, but because "/*" is a path prefix pattern, it is consulted before any extension mappings.) If nothing handles the request, it fails with a "404 not found" message.
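The lookup order is easier to follow in code. Here is a minimal sketch of it, assuming the pattern forms defined by the servlet specification: exact paths, path prefixes ending in "/*", extensions written "*.ext", and "/" for the default servlet. The class UrlPatternSketch, its methods, and the sample servlet names are all hypothetical; a real container does considerably more (context paths, path info, welcome files).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy resolver for url-pattern mappings; not taken from any real container.
final class UrlPatternSketch {

    // Hypothetical mappings: url-pattern -> servlet-name.
    static Map<String, String> sampleMappings() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("/hello", "hello");
        m.put("/scooby/*", "scooby");
        m.put("/scooby/doo/*", "scoobydoo");
        m.put("*.foo", "foo");
        m.put("/", "default");
        return m;
    }

    // Returns the servlet name chosen for a path, or null if nothing matches.
    static String resolve(Map<String, String> mappings, String path) {
        // Rule 1: an exact match wins outright.
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            String pattern = e.getKey();
            boolean wildcard = pattern.endsWith("/*") || pattern.startsWith("*.");
            if (!wildcard && !pattern.equals("/") && pattern.equals(path)) {
                return e.getValue();
            }
        }
        // Rule 2: the longest matching path prefix ("/prefix/*").
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            String pattern = e.getKey();
            if (!pattern.endsWith("/*")) {
                continue;
            }
            String prefix = pattern.substring(0, pattern.length() - 2);
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && prefix.length() > bestLength) {
                best = e.getValue();
                bestLength = prefix.length();
            }
        }
        if (best != null) {
            return best;
        }
        // Rule 3: an extension mapping ("*.ext") on the last path segment.
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot > slash) {
            String byExtension = mappings.get("*" + path.substring(dot));
            if (byExtension != null) {
                return byExtension;
            }
        }
        // Rule 4: the default servlet, if one is mapped to "/".
        return mappings.get("/");
    }

    public static void main(String[] args) {
        Map<String, String> m = sampleMappings();
        String[] paths = { "/hello", "/scooby/doo/biedoo", "/scooby/snack",
                           "/images/bunny.foo", "/other" };
        for (String path : paths) {
            System.out.println(path + " -> " + resolve(m, path));
        }
    }
}
```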

14.11.3 Deploying HelloClient

Now let's deploy our HelloClient servlet. Once you've deployed the servlet, it should be easy to add examples to the WAR file as you work with them in this chapter. In this section, we'll show you how to build a WAR file by hand. In Section 14.16, we'll show a more realistic way to manage your applications using the wonderful tool, Ant. You can also grab the full set of examples, along with their source code, in the learningjava.war file on the CD-ROM that comes with this book (view CD content online at http://examples.oreilly.com/learnjava2/CD-ROM/).

To create the WAR file by hand, we first create the WEB-INF and WEB-INF/classes directories. Place web.xml into WEB-INF and HelloClient.class into WEB-INF/classes. Use the jar command to create learningjava.war:

$ jar cvf learningjava.war WEB-INF

You can also include some documents in the top level of this WAR file by adding their names after the WEB-INF directory in the command above. You can verify the contents of the resulting learningjava.war file using the jar command:

$ jar tvf learningjava.war
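Because a WAR file is just a JAR file, the JDK's java.util.jar classes can write and list one as well. The following is a rough sketch rather than a replacement for the jar tool: the class WarSketch and its methods are hypothetical, and the web.xml content is a placeholder string, not a real deployment descriptor.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Builds and lists a tiny WAR using only the JDK's JAR support.
final class WarSketch {

    // Writes a WAR whose only entry is WEB-INF/web.xml with the given text.
    static void writeWar(File war, String webXml) throws IOException {
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(war))) {
            out.putNextEntry(new JarEntry("WEB-INF/web.xml"));
            out.write(webXml.getBytes(StandardCharsets.ISO_8859_1));
            out.closeEntry();
        }
    }

    // Returns the entry names in the archive, one per file.
    static List<String> listEntries(File war) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(war)) {
            for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements(); ) {
                names.add(e.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // A throwaway file, so the sketch does not touch a real webapps directory.
        File war = File.createTempFile("learningjava", ".war");
        war.deleteOnExit();
        writeWar(war, "<web-app></web-app>");
        System.out.println(listEntries(war));
    }
}
```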

Now all that is necessary is to drop the WAR file into the correct location for your server. We assume you have downloaded and installed Tomcat. With Version 4.0 of Tomcat, the location for WAR files is the path for Tomcat, followed by /webapps. Place your WAR file here, and start the server. If Tomcat is configured with the default port number, you should be able to point to the HelloClient servlet with the following URLs: http://localhost:8080/learningjava/hello or http://<yourserver>:8080/learningjava/hello, where <yourserver> is the name or IP address of your server.

14.12 Reloading WebApps

All servers should provide a facility for automatically reloading WAR files and possibly individual servlet classes after they have been modified. This is part of the servlet specification and is especially useful during development. Unfortunately, support for this feature varies. BEA's WebLogic application server, when configured in development mode, allows you simply to replace the WAR file, and it handles redeployment. At the time of this writing, Tomcat is not quite so friendly. It supports reloading of servlet classes but does not handle reloading WAR files very well. Some servers, including the current version of Tomcat, "explode" WAR files by unpacking them into the webapps directory, or they allow you explicitly to configure a root directory for your WebApp. In this mode, they may allow you to replace individual files. After changing servlets or other classes, you can prompt Tomcat to reload the WebApp using a special URL with the format http://<yourserver>:8080/manager/reload?path=/learningjava. Even so, Tomcat does not currently reload the web.xml file. Until this situation improves, your safest bet is to remove this exploded directory, drop in your new WAR file, and restart the server (our apologies).

14.13 Error and Index Pages

One of the finer points of writing a professional-looking web application is taking care to handle errors well. Nothing annoys a user more than getting a funny-looking page with some technical mumbo-jumbo error information on it when they expected the receipt for their Christmas present. Through the web.xml file, it is possible to specify documents or servlets to handle error pages shown for various conditions, as well as the special case of index files (welcome files) for directories. Let's start with error handling.

You can designate a page or servlet that can handle various HTTP error status codes, such as "404 not found", "403 forbidden", etc., using one or more <error-page> declarations:

<web-app>
...
    <error-page>
        <error-code>404</error-code>
        <location>/notfound.html</location>
    </error-page>
    <error-page>
        <error-code>403</error-code>
        <location>/secret.html</location>
    </error-page>
...
</web-app>

Additionally, you can designate error pages based on Java exception types that may be thrown from the servlet. For example:

<error-page>
    <exception-type>java.io.IOException</exception-type>
    <location>/ioexception.html</location>
</error-page>

This declaration catches any IOExceptions generated from servlets in the WebApp and displays the ioexception.html page. If no matching exceptions are found in the <error-page> declarations, and the exception is of type ServletException (or a subclass), the container makes a second try to find the correct handler. It looks for a wrapped exception (instigating exception) contained in the ServletException and attempts to match it to an error page declaration.
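The two-step lookup can be sketched in plain Java. Several things here are assumptions: the class ErrorPageSketch and its methods are hypothetical, a RuntimeException with a cause stands in for a ServletException and its wrapped exception, and walking up the class hierarchy to find the closest declared type reflects our reading of the servlet specification rather than the code of any particular container.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy version of matching a thrown exception to an <error-page> location.
final class ErrorPageSketch {

    // Hypothetical table: exception-type -> location.
    static Map<String, String> samplePages() {
        Map<String, String> pages = new HashMap<>();
        pages.put("java.io.IOException", "/ioexception.html");
        return pages;
    }

    // Walks up the class hierarchy, so the closest declared type wins.
    static String lookup(Map<String, String> pages, Throwable thrown) {
        for (Class<?> c = thrown.getClass(); c != null; c = c.getSuperclass()) {
            String location = pages.get(c.getName());
            if (location != null) {
                return location;
            }
        }
        return null;
    }

    // First try the exception itself; failing that, try the one it wraps.
    static String resolve(Map<String, String> pages, Throwable thrown) {
        String location = lookup(pages, thrown);
        if (location == null && thrown.getCause() != null) {
            location = lookup(pages, thrown.getCause());
        }
        return location;
    }

    public static void main(String[] args) {
        Map<String, String> pages = samplePages();
        System.out.println(resolve(pages, new FileNotFoundException("missing")));
        System.out.println(resolve(pages, new RuntimeException(new IOException("wrapped"))));
        System.out.println(resolve(pages, new IllegalStateException("unmatched")));
    }
}
```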

As we've mentioned, you can use a servlet to handle your error pages, just as you can use a static document. In fact, the container supplies several helpful pieces of information to an error-handling servlet, which the servlet can use in generating a response. The information is made available in the form of servlet request attributes through the method getAttribute():

Object attribute = servletRequest.getAttribute("name");

Attributes are like servlet parameters, except that they can be arbitrary objects. We have seen attributes of the ServletContext in a previous section. In this case, we are talking about attributes of the request. When a servlet (or JSP or filter) is invoked to handle an error condition, the following string attributes are set in the request:

javax.servlet.error.servlet_name
javax.servlet.error.request_uri
javax.servlet.error.message

Depending on whether the <error-page> declaration was based on an <error-code> or <exception-type> condition, the request also contains one of the following two attributes:

// status code Integer or Exception object
javax.servlet.error.status_code 
javax.servlet.error.exception

In the case of a status code, the attribute is an Integer representing the code. In the case of the exception type, the object is the actual instigating exception.

Index files can be designated in a similar way. Normally, when a user specifies a directory URL path, the web server searches for a default file in that directory to be displayed. The most common example of this is the ubiquitous index.html file. You can designate your own ordered list of files to look for by adding a <welcome-file-list> entry to your web.xml file. For example:

<welcome-file-list>
    <welcome-file>index.html</welcome-file>
    <welcome-file>index.htm</welcome-file>
</welcome-file-list>

<welcome-file-list> specifies that when a partial request (directory path) is received, the server should search first for a file named index.html and, if that is not found, a file called index.htm. If none of the specified welcome files is found, it is left up to the server to decide what kind of page to display. Servers are generally configured to display a directory-like listing or to produce an error message.
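The search itself is simple enough to sketch. The example below assumes the directory is an ordinary directory on disk, which need not be true inside a container serving from a packed WAR; the class WelcomeFileSketch and the method findWelcomeFile() are hypothetical names of our own.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Toy version of the ordered welcome-file search.
final class WelcomeFileSketch {

    // Returns the first listed welcome file that exists in the directory,
    // or null if none does (the server then decides what to show).
    static File findWelcomeFile(File directory, List<String> welcomeFiles) {
        for (String name : welcomeFiles) {
            File candidate = new File(directory, name);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Searches the directory given on the command line, or the current one.
        File directory = new File(args.length > 0 ? args[0] : ".");
        List<String> welcomeFiles = Arrays.asList("index.html", "index.htm");
        System.out.println(findWelcomeFile(directory, welcomeFiles));
    }
}
```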

14.14 Security and Authentication

One of the most powerful features of WebApp deployment with the 2.3 Servlet API is the ability to define declarative security constraints. Declarative security means that you can simply spell out in the web.xml file exactly which areas of your WebApp (URL paths to documents, directories, servlets, etc.) are login-protected, the types of users allowed access to them, and the class of security protocol required for communications. It is not necessary to write code in your servlets to implement these basic security procedures.

There are two types of entries in the web.xml file that control security and authentication. First are the <security-constraint> entries, which provide authorization based on user roles and secure transport of data, if desired. Second is the <login-config> entry, which determines the kind of authentication used for the web application.

14.14.1 Assigning Roles to Users

Let's take a look at a simple example. The following web.xml excerpt defines an area called "Secret documents" with a URL pattern of /secret/* and designates that only users with the role "secretagent" may access it. It specifies the simplest form of login process: the BASIC authentication model, which causes the browser to prompt the user with a simple pop-up username and password dialog box.

<web-app>
...
    <security-constraint>
        <web-resource-collection>
            <web-resource-name>Secret documents</web-resource-name>
            <url-pattern>/secret/*</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <role-name>secretagent</role-name>
        </auth-constraint>
    </security-constraint>
  
    <login-config>
        <auth-method>BASIC</auth-method>
    </login-config>
...
</web-app>

The security constraint entry comes after all servlet and filter-related entries in the web.xml file. Each <security-constraint> block has one <web-resource-collection> section that designates a named list of URL patterns for areas of the WebApp, followed by an <auth-constraint> section listing user roles that are allowed to access those areas. You can add the example setup to the web.xml file for the learningjava.war file and prepare to try it out. However, there is one additional step you'll have to take to get this working: create the user role "secretagent" and an actual user with this role in your application server.

Access to protected areas is granted to user roles, not individual users. A user role is effectively just a group of users: an abstraction that lets the web.xml file describe who may enter an area without naming anyone in particular. Users are then assigned one or more roles. Actual user information (name and password, etc.) is handled outside the scope of the WebApp, in the application server environment (possibly integrated with the host platform operating system). Generally, application servers have their own tools for creating users and assigning individuals (or actual groups of users) their roles. A given username may have many roles associated with it.
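The relationship between users, roles, and protected areas boils down to a set intersection. Here is a toy model of the check, not how any application server stores or evaluates its users: the class RoleCheckSketch, its methods, the user "moneypenny", and the role "employee" are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy role check: access is decided by roles, never by user name alone.
final class RoleCheckSketch {

    // Hypothetical user table: user name -> roles held.
    static Map<String, Set<String>> sampleUsers() {
        Map<String, Set<String>> users = new HashMap<>();
        users.put("bond", new HashSet<>(Arrays.asList("secretagent", "employee")));
        users.put("moneypenny", new HashSet<>(Arrays.asList("employee")));
        return users;
    }

    // True if the user holds at least one of the roles the area allows.
    static boolean isAllowed(Map<String, Set<String>> users, String user,
                             Set<String> allowedRoles) {
        Set<String> held = users.get(user);
        return held != null && !Collections.disjoint(held, allowedRoles);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> users = sampleUsers();
        Set<String> allowed = Collections.singleton("secretagent");
        System.out.println("bond: " + isAllowed(users, "bond", allowed));
        System.out.println("moneypenny: " + isAllowed(users, "moneypenny", allowed));
    }
}
```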

When a user attempts to access a login-protected area, her login is checked to see whether she has the correct role for access. For the Tomcat server, adding users and assigning them roles is easy; simply edit the file conf/tomcat-users.xml. To add a user named "bond" with the "secretagent" role, you'd add an entry like this:

<user name="bond" password="007" roles="secretagent"/>

For other servers, you'll have to refer to the documentation to determine how to add users and assign security roles.
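To make the role indirection concrete, here is a minimal plain-Java sketch of the check a container performs conceptually. The class and method names (RoleCheck, addUser, isAllowed) are hypothetical, invented for illustration; a real container keeps users in its own realm and does not expose an API like this.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model only: a real servlet container manages users and
// roles itself (for example, from Tomcat's conf/tomcat-users.xml).
class RoleCheck {
    // username -> roles assigned to that user
    private final Map<String, Set<String>> rolesByUser =
        new HashMap<String, Set<String>>();

    void addUser(String user, String... roles) {
        rolesByUser.put(user, new HashSet<String>(Arrays.asList(roles)));
    }

    // Access is granted if the user holds at least one allowed role;
    // the constraint itself never names individual users.
    boolean isAllowed(String user, Set<String> allowedRoles) {
        Set<String> roles = rolesByUser.get(user);
        if (roles == null)
            return false;               // unknown user
        for (String role : roles)
            if (allowedRoles.contains(role))
                return true;
        return false;
    }
}
```

With "bond" registered under the "secretagent" role, isAllowed("bond", ...) succeeds for an area restricted to "secretagent", while a user holding only some other role is refused.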

14.14.2 Secure Data Transport

Before we move on, there is one more piece of the security constraint to discuss: the transport guarantee. Each <security-constraint> block may end with a <user-data-constraint> entry, which designates one of three levels of transport security for the protocol used to transfer data to and from the protected area. For example:

<security-constraint>
...
    <user-data-constraint>
        <transport-guarantee>CONFIDENTIAL</transport-guarantee>
    </user-data-constraint>
</security-constraint>

The three levels are NONE, INTEGRAL, and CONFIDENTIAL. NONE is equivalent to leaving out the section, indicating no special transport is required. This is the standard for normal web traffic, which is generally sent in plain text over the network. The INTEGRAL level of security specifies that any transport protocol used must guarantee the data sent is not modified in transit. This implies the use of digital signatures or some other method of validating the data at the receiving end, but it does not require that the data be encrypted and hidden while it is transported. Finally, CONFIDENTIAL implies both INTEGRAL and encrypted. In practice, the only secure transport widely used by web browsers is SSL. Requiring a transport guarantee other than NONE typically forces the use of SSL by the client browser.

14.14.3 Authenticating Users

The <login-config> section determines exactly how a user authenticates (identifies) himself or herself to the protected area. The <auth-method> tag allows four types of login authentication to be specified: BASIC, DIGEST, FORM, and CLIENT-CERT. In our example, we showed the BASIC method, which uses the standard web browser login and password pop-up dialog. BASIC authentication sends the user's name and password in plain text over the Internet unless a transport guarantee has been used separately to start SSL and encrypt the data stream. DIGEST is a variation on BASIC that hides the text of the password but adds little real security; it is not widely used. FORM is equivalent to BASIC, but instead of using the browser's dialog, we are allowed to use our own HTML form or servlet to post the username and password data. Again, form data is sent in plain text unless otherwise protected by a transport guarantee (SSL). CLIENT-CERT is an interesting option. It specifies that the client must be identified using a client-side public key certificate. This implies the use of a protocol like SSL, which allows for secure exchange and mutual authentication using digital certificates.
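To see why BASIC offers no secrecy on its own, it helps to look at what the browser actually sends: an Authorization header whose value is the word Basic followed by the Base64 encoding of username:password. The following sketch reproduces that encoding and reverses it. The class name BasicAuthHeader is our own, and the code assumes Java 8 or later for java.util.Base64, which is newer than the APIs used elsewhere in this chapter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of the Authorization header value used by BASIC login.
// Assumes Java 8+ (java.util.Base64); the class name is hypothetical.
class BasicAuthHeader {
    // Builds the header value: "Basic " + base64("user:password").
    static String encode(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
            .encodeToString(pair.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Reverses encode(); note that no key or secret is required.
    static String decode(String headerValue) {
        String b64 = headerValue.substring("Basic ".length());
        return new String(
            Base64.getDecoder().decode(b64), StandardCharsets.ISO_8859_1);
    }
}
```

Because decode() needs nothing but the header itself, anyone who can observe the request can recover the password; Base64 is an encoding, not encryption.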

The FORM method is most useful because it allows us to customize the look of the login page (we recommend using SSL to secure the data stream). We can also specify an error page to use if the authentication fails. Here is a sample <login-config> using the form method:

<login-config>
    <auth-method>FORM</auth-method>
    <form-login-config>
        <form-login-page>/login.html</form-login-page>
        <form-error-page>/login_error.html</form-error-page>
    </form-login-config>
</login-config>

The login page must contain an HTML form with a specially named pair of fields for the name and password. Here is a simple login.html file:

<html>
<head><title>Login</title></head>
<body>
    <form method="POST" action="j_security_check">
        Username: <input type="text" name="j_username"><br>
        Password: <input type="password" name="j_password"><br>
        <input type="submit" value="submit">
    </form>
</body>
</html>

The username field is called j_username, the password is called j_password, and the URL used for the form action attribute is j_security_check. There are no special requirements for the error page, but normally you will want to provide a "try again" message and repeat the login form.

14.14.4 Procedural Security

We should mention that in addition to the declarative security offered by the web.xml file, servlets may perform their own active procedural (or programmatic) security using all the authentication information available to the container. We won't cover this in detail, but here are the basics.

The name of the authenticated user is available through the HttpServletRequest getRemoteUser() method, and the type of authentication provided can be determined with the getAuthType() method. Servlets can work with security roles using the isUserInRole() method. (To do this requires adding some additional mappings in the web.xml file allowing the servlet to refer to the security roles by reference names). For advanced applications, a java.security.Principal object for the user can be retrieved with the getUserPrincipal() method. In the case where a secure transport like SSL was used, the method isSecure() returns true, and detailed information about the cipher type, key size, and certificate chain is made available through request attributes.

14.15 Servlet Filters

The servlet Filter API generalizes the Java Servlet API to allow modular component "filters" to operate on server requests and responses in a sort of pipeline. Filters are said to be chained, meaning that when more than one filter is applied, the servlet request is passed through each filter in succession, with each having an opportunity to act upon or modify the request before passing it to the next filter. Similarly, upon completion, the servlet result is effectively passed back through the chain on its return trip to the browser. Servlet filters may operate on any request to a web application, not just those handled by servlets; they may filter static content as well.

Filters are declared and mapped to servlets in the web.xml file. There are two ways to map a filter: using a URL pattern like those used for servlets or by specifying a servlet by its instance name (<servlet-name>). Filters obey the same basic rules as servlets when it comes to URL matching, but when multiple filters match a path, they are all invoked.

The order of the chain is determined by the order in which matching filter mappings appear in the web.xml file, with <url-pattern> matches taking precedence over <servlet-name> matches. This is contrary to the way servlet URL matching is done, with specific matches taking the highest priority. Filter chains are constructed as follows. First, each filter with a matching URL pattern is called in the order in which it appears in the web.xml file; next, each filter with a matching servlet name is called, also in order of appearance. URL patterns take a higher priority than filters specifically associated with a servlet, so in this case, patterns such as /* have first crack at an incoming request.
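The two-pass ordering rule is easy to get wrong, so here is a small plain-Java model of it. This is only a sketch of the rule as described above, not container code: the FilterOrder and Mapping classes are hypothetical, and the pattern matching is deliberately simplified to prefix patterns such as /*, extension patterns such as *.html, and exact paths.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of filter-chain ordering; not container code.
class FilterOrder {
    // One <filter-mapping>: either urlPattern or servletName is set.
    static class Mapping {
        final String filterName, urlPattern, servletName;
        Mapping(String filterName, String urlPattern, String servletName) {
            this.filterName = filterName;
            this.urlPattern = urlPattern;
            this.servletName = servletName;
        }
    }

    // Simplified matching: "/*", "/dir/*", "*.ext", or an exact path.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/*")) {
            String prefix = pattern.substring(0, pattern.length() - 2);
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        if (pattern.startsWith("*."))
            return path.endsWith(pattern.substring(1));
        return pattern.equals(path);
    }

    // The mappings list must be in web.xml order.
    static List<String> chainFor(
        List<Mapping> mappings, String path, String servletName)
    {
        List<String> chain = new ArrayList<String>();
        for (Mapping m : mappings)          // pass 1: URL patterns
            if (m.urlPattern != null && matches(m.urlPattern, path))
                chain.add(m.filterName);
        for (Mapping m : mappings)          // pass 2: servlet names
            if (m.servletName != null && m.servletName.equals(servletName))
                chain.add(m.filterName);
        return chain;
    }
}
```

Given a servlet-name mapping that appears first in the file and a /* mapping that appears after it, chainFor() still places the /* filter first, which is exactly the "first crack" behavior described above.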

The Filter API is very simple and mimics the Servlet API. A servlet filter implements the javax.servlet.Filter interface and implements three methods: init(), doFilter(), and destroy(). The doFilter() method is where the work is performed. For each incoming request, the ServletRequest and ServletResponse objects are passed to doFilter(). Here we have a chance to examine and modify these objects—or even substitute our own objects for them—before passing them to the next filter and ultimately the servlet (or user) on the other side. Our link to the rest of the filter chain is another parameter of doFilter(), the FilterChain object. With FilterChain, we can invoke the next element in the pipeline. The following section presents an example.

14.15.1 A Simple Filter

For our first filter, we'll do something easy but practical: create a filter that limits the number of connections to its URLs. We'll simply have our filter keep a counter of the active connections passing through it and turn away new requests when they exceed a specified limit.

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
  
public class ConLimitFilter implements Filter 
{
    int limit;
    int count;
  
    public void init( FilterConfig filterConfig )
        throws ServletException
    {
        String s = filterConfig.getInitParameter("limit");
        if ( s == null )
            throw new ServletException("Missing init parameter: limit");
        limit = Integer.parseInt( s );
    }
  
    public void doFilter ( 
        ServletRequest req, ServletResponse res, FilterChain chain ) 
            throws IOException, ServletException 
    {
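        // Filters are called from many threads at once, so the check
        // and the increment must happen together.
        boolean admitted;
        synchronized ( this ) {
            admitted = count < limit;
            if ( admitted )
                ++count;
        }
        if ( !admitted ) {
            HttpServletResponse httpRes = (HttpServletResponse)res;
            httpRes.sendError( 
                HttpServletResponse.SC_SERVICE_UNAVAILABLE, "Too Busy.");
        } else {
            try {
                chain.doFilter( req, res );
            } finally {
                // Release the slot even if the servlet throws
                synchronized ( this ) { --count; }
            }
        }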
    }
  
    public void destroy(  ) { }
}

ConLimitFilter implements the three life-cycle methods of the Filter interface: init(), doFilter(), and destroy(). In our init() method, we use the FilterConfig object to look for an initialization parameter named "limit" and turn it into an integer. Users can set this value in the section of the web.xml file where the instance of our filter is declared. The doFilter() method implements all our logic. First, it receives ServletRequest and ServletResponse object pairs for incoming requests. Depending on the counter, it then either passes them down the chain by invoking the next doFilter() method on the FilterChain object, or rejects them by generating its own response. We use the standard HTTP message "503 Service Unavailable" when we deny new connections.

Calling doFilter() on the FilterChain object continues processing by invoking the next filter in the chain or by invoking the servlet if ours is the last filter. Alternatively, when we choose to reject the call, we use the ServletResponse to generate our own response and then simply allow doFilter() to exit. This stops the processing chain at our filter, although any filters called before us still have an opportunity to intervene as the request effectively traverses back to the client.

Notice that ConLimitFilter increments the count before calling doFilter() and decrements it after. Prior to calling doFilter() is our time to work on the request before it reaches the rest of the chain and the servlet. After the call to doFilter(), the servlet has completed, and the request is, in effect, on the way back to the client. This is our opportunity to do any post-processing of the response. We'll discuss this a bit later.
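The before-and-after structure, and what happens when a filter declines to call the chain, can be traced with a small stand-alone model. The MiniFilter and MiniChain types below are hypothetical stand-ins for Filter and FilterChain, and a list of strings takes the place of the real request and response objects; it is a sketch of the control flow only.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for Filter and FilterChain; a list of strings
// serves as a trace instead of real request and response objects.
interface MiniFilter {
    void doFilter(List<String> trace, MiniChain chain);
}

class MiniChain {
    private final Iterator<MiniFilter> rest;

    MiniChain(List<MiniFilter> filters) {
        this.rest = filters.iterator();
    }

    // Invokes the next filter, or the "servlet" when none remain.
    void doFilter(List<String> trace) {
        if (rest.hasNext())
            rest.next().doFilter(trace, this);
        else
            trace.add("servlet");
    }
}

// Records entry and exit around the rest of the chain.
class TracingFilter implements MiniFilter {
    private final String name;
    TracingFilter(String name) { this.name = name; }

    public void doFilter(List<String> trace, MiniChain chain) {
        trace.add(name + " before");
        chain.doFilter(trace);          // everything downstream runs here
        trace.add(name + " after");     // the return trip
    }
}

// Never calls the chain, so nothing downstream is reached.
class RejectingFilter implements MiniFilter {
    public void doFilter(List<String> trace, MiniChain chain) {
        trace.add("rejected");
    }
}
```

Running a chain of TracingFilter A, a RejectingFilter, and TracingFilter B records "A before", "rejected", "A after": filter B and the servlet are never reached, but A still gets its turn on the way back.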

Finally, we should mention that although we've been talking about the servlet request and response as if they were HttpServletRequest and HttpServletResponse, the doFilter() method actually takes the more generic ServletRequest and ServletResponse objects as parameters. As filter implementers, we are expected to determine when it is safe to treat them as HTTP traffic and perform the cast as necessary (which we do here in order to use the sendError() HTTP response method).

14.15.2 A Test Servlet

Before we go on, here is a simple test servlet you can use to try out this filter and the other filters we'll develop in this section. It's called WaitServlet, and as its name implies, it simply waits. You can specify how long it waits as a number of seconds with the servlet parameter time.

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
  
public class WaitServlet extends HttpServlet 
{
    public void doGet( HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException 
    {
        String waitStr = request.getParameter("time");
        if ( waitStr == null )
            throw new ServletException("Missing parameter: time");
        int wait = Integer.parseInt(waitStr);
  
        try {
            Thread.sleep( wait * 1000 );
        } catch( InterruptedException e ) { 
            throw new ServletException(e); 
        }
  
        response.setContentType("text/html");
        PrintWriter out = response.getWriter(  );
        out.println(
            "<html><body><h1>WaitServlet Response</h1></body></html>");
        out.close(  );
    }
}

By making multiple simultaneous requests to the WaitServlet, you can try out the ConLimitFilter. Be careful, though, because some web browsers (namely Opera) won't open multiple requests to the same URL. You may have to add extraneous parameters to trick the web browser. See the learningjava.war application on the CD-ROM that accompanies this book (view CD content online at http://examples.oreilly.com/learnjava2/CD-ROM/).

14.15.3 Declaring and Mapping Filters

Filters are declared and mapped in the web.xml file much as servlets are. Like servlets, one instance of a filter class is created for each filter declaration in the web.xml file. A filter declaration looks like this:

<filter>
    <filter-name>defaultsfilter1</filter-name>
    <filter-class>RequestDefaultsFilter</filter-class>
</filter>

It specifies a filter handle name to be used for reference within the web.xml file and the filter's Java class name. Filter declarations may also contain <init-param> parameter sections, just like servlet declarations.

Filters are mapped to resources with <filter-mapping> declarations that specify the filter handle name and either the specific servlet handle name or a URL pattern, as we discussed earlier.

<filter-mapping>
    <filter-name>conlimitfilter1</filter-name>
    <servlet-name>waitservlet1</servlet-name>
</filter-mapping>
  
<filter-mapping>
    <filter-name>conlimitfilter1</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

Filter mappings appear after all filter declarations in the web.xml file.

14.15.4 Filtering the Servlet Request

Our first filter example was not very exciting because it did not actually modify any information going to or coming from the servlet. Next, let's do some actual "filtering" by modifying the incoming request before it reaches a servlet. In this example, we'll create a request "defaulting" filter that automatically supplies default values for specified servlet parameters when they are not provided in the incoming request. Despite its simplicity, this example might be very useful. Here is the RequestDefaultsFilter:

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
  
public class RequestDefaultsFilter implements Filter 
{
    FilterConfig filterConfig;
  
    public void init( FilterConfig filterConfig ) throws ServletException
    { 
        this.filterConfig = filterConfig;
    }
  
    public void doFilter ( 
        ServletRequest req, ServletResponse res, FilterChain chain ) 
            throws IOException, ServletException 
    {
        WrappedRequest wrappedRequest = 
            new WrappedRequest( (HttpServletRequest)req );
        chain.doFilter( wrappedRequest, res );
    }
  
    public void destroy(  ) { }
  
    class WrappedRequest extends HttpServletRequestWrapper 
    {
        WrappedRequest( HttpServletRequest req ) {
            super( req );
        }
  
        public String getParameter( String name ) { 
            String value = super.getParameter( name );
            if ( value == null )
                value = filterConfig.getInitParameter( name );
            return value;
        }
    }
}

To interpose ourselves in the data flow, we must do something drastic. We kidnap the incoming HttpServletRequest object and replace it with an evil twin that does our bidding. The technique, which we'll use here for modifying the request object and later for modifying the response, is to wrap the real request with an adapter, allowing us to override some of its methods. Here we will take control of the HttpServletRequest's getParameter() method, modifying it to look for default values where it would otherwise return null.

Again, we implement the three life-cycle methods of Filter, but this time, before invoking doFilter() on the filter chain to continue processing, we wrap the incoming HttpServletRequest in our own class, WrappedRequest. WrappedRequest extends a special adapter called HttpServletRequestWrapper. This wrapper class is a convenience utility that implements the HttpServletRequest interface. It accepts a reference to a target HttpServletRequest object and, by default, delegates all of its methods to that target. This makes it very convenient for us to simply override one or more methods of interest to us. All we have to do is override getParameter() in our WrappedRequest class and add our functionality. Here we simply call our parent's getParameter(), and in the case where the value is null, we try to substitute a filter initialization parameter of the same name.

Try this example out using the WaitServlet with a filter declaration and mapping as follows:

<filter>
    <filter-name>defaultsfilter1</filter-name>
    <filter-class>RequestDefaultsFilter</filter-class>
    <init-param>
        <param-name>time</param-name>
        <param-value>3</param-value>
    </init-param>
</filter>
...
<filter-mapping>
    <filter-name>defaultsfilter1</filter-name>
    <servlet-name>waitservlet1</servlet-name>
</filter-mapping>

Now the WaitServlet receives a default time value of three seconds even when you don't specify one.

14.15.5 Filtering the Servlet Response

Filtering the request was fairly easy, and we can do something similar with the response object using exactly the same technique. There is a corresponding HttpServletResponseWrapper that we can use to wrap the response before the servlet uses it to communicate back to the client. By wrapping the response, we can intercept methods that the servlet uses to write the response, just as we intercepted the getParameter() method that the servlet used in reading the incoming data. For example, we could override the sendError() method of the HttpServletResponse object and modify it to redirect to a specified page. In this way, we could create a servlet filter that emulates the programmable error page control offered in the web.xml file. But the most interesting technique available to us, and the one we'll show here, involves actually modifying the data written by the servlet before it reaches the client. To do this we have to pull a double "switcheroo." We wrap the servlet response to override the getWriter() method and then create our own wrapper for the client's PrintWriter object supplied by this method, one that buffers the data written and allows us to modify it. This is a useful and powerful technique, but it can be tricky.

Our example is called LinkResponseFilter. It is an automatic hyperlink-generating filter that reads HTML responses and searches them for patterns supplied as regular expressions. When it matches a pattern, it turns it into an HTML link. The pattern and links are specified in the filter initialization parameters. You could extend this example with access to a database or XML file and add additional rules to make it into a very powerful site management helper. Here it is:

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;
  
public class LinkResponseFilter implements Filter 
{
    FilterConfig filterConfig;
  
    public void init( FilterConfig filterConfig ) 
        throws ServletException 
    { 
        this.filterConfig = filterConfig;
    }
  
    public void doFilter ( 
        ServletRequest req, ServletResponse res, FilterChain chain ) 
            throws IOException, ServletException 
    {
        WrappedResponse wrappedResponse = 
            new WrappedResponse( (HttpServletResponse)res );
        chain.doFilter( req, wrappedResponse );
        wrappedResponse.close(  );
    }
  
    public void destroy(  ) { }
  
    class WrappedResponse extends HttpServletResponseWrapper 
    {
        boolean linkText;
        PrintWriter client;
  
        WrappedResponse( HttpServletResponse res ) {
            super( res );
        }
  
        public void setContentType( String mime ) {
            super.setContentType( mime );
            if ( mime.startsWith("text/html") )
                linkText = true;
        }
  
        public PrintWriter getWriter(  ) throws IOException {
            if ( client == null ) {
                if ( linkText )
                    client = new LinkWriter( 
                        super.getWriter(), new ByteArrayOutputStream(  ) );
                else
                    client = super.getWriter(  );
            }
            return client;
        }
  
        void close(  ) {
            if ( client != null )
                client.close(  );
        }
    }
  
    class LinkWriter extends PrintWriter
    {
        ByteArrayOutputStream buffer;
        Writer client;
  
        LinkWriter( Writer client, ByteArrayOutputStream buffer ) {
            super( buffer );
            this.buffer = buffer;
            this.client = client;
        }
  
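        boolean closed;
  
        public void close(  ) {
            // Both the servlet and our filter may call close(); write once
            if ( closed )
                return;
            closed = true;
            try {
                flush(  );
                client.write( linkText( buffer.toString(  ) ) );
                client.close(  );
            } catch ( IOException e ) { 
                setError(  );
            }
        }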
  
        String linkText( String text ) {
            Enumeration en = filterConfig.getInitParameterNames(  ); 
            while ( en.hasMoreElements(  ) ) {
                String pattern = (String)en.nextElement(  );
                String value = filterConfig.getInitParameter( pattern );
                // Quote the href value so URLs with special characters
                // still produce valid HTML
                text = text.replaceAll( 
                    pattern, "<a href=\""+value+"\">$0</a>" );
            }
            return text;
        }
    }
}

That was a bit longer than our previous examples, but the basics are the same. We have wrapped the HttpServletResponse object with our own WrappedResponse class using the HttpServletResponseWrapper helper class. Our WrappedResponse overrides two methods: getWriter() and setContentType(). We override setContentType() in order to set a flag indicating whether the output is of type "text/html" (an HTML document). We don't want to be performing regular-expression replacements on binary data such as images, for example, should they happen to match our filter. We also override getWriter() to provide our substitute writer stream, LinkWriter. Our LinkWriter class is a PrintWriter that takes as arguments the client PrintWriter and a ByteArrayOutputStream that serves as a buffer for storing output data before it is written. We are careful to substitute our LinkWriter only if the linkText boolean set by setContentType() is true. When we do use our LinkWriter, we cache the stream so that any subsequent calls to getWriter() return the same object. Finally, we have added one method to the response object: close(). A normal HttpServletResponse does not have a close() method. We use ours on the return trip to the client to indicate that the LinkWriter should complete its processing and write the actual data to the client. We do this in case the servlet does not explicitly close the output stream before exiting its service methods.

This explains the important parts of our filter-writing example. Let's wrap up by looking at the LinkWriter, which does the magic in this example. LinkWriter is a PrintWriter that holds references to two other objects: the true client PrintWriter and a ByteArrayOutputStream. The LinkWriter calls its superclass constructor, passing the ByteArrayOutputStream as the target stream, so all of its default functionality (its print() methods) writes to the byte array. Our only real job is to intercept the close() method of the PrintWriter and add our text linking before sending the data. When LinkWriter is closed, it flushes itself to force any data buffered in its superclass out to the ByteArrayOutputStream. It then retrieves the buffered data (with the ByteArrayOutputStream toString() method) and invokes its linkText() method to create the hyperlinks before writing the linked data to the client. The linkText() method simply loops over all the filter initialization parameters, treating them as patterns, and uses the String replaceAll() method to turn them into hyperlinks. (See Chapter 9 for more about replaceAll()).
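If you want to experiment with the buffering trick without deploying to a servlet container, the same idea can be exercised with nothing but java.io. In this sketch the class name BufferingWriter and its transform() method are our own inventions: transform() is a fixed stand-in for linkText(), and any Writer (a StringWriter, say) can play the part of the container's client writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

// Stand-alone sketch of the buffer-then-rewrite technique; the names
// here are hypothetical and no servlet classes are involved.
class BufferingWriter extends PrintWriter {
    private final ByteArrayOutputStream buffer;
    private final Writer client;
    private boolean closed;

    BufferingWriter(Writer client, ByteArrayOutputStream buffer) {
        super(buffer);              // all print() calls land in the buffer
        this.buffer = buffer;
        this.client = client;
    }

    public void close() {
        if (closed)
            return;                 // tolerate a second close()
        closed = true;
        try {
            flush();                // push pending characters to buffer
            client.write(transform(buffer.toString()));
            client.close();
        } catch (IOException e) {
            setError();
        }
    }

    // Placeholder for linkText(): any whole-document rewrite goes here.
    String transform(String text) {
        return text.replaceAll(
            "Java", "<a href=\"http://java.sun.com/\">$0</a>");
    }
}
```

Nothing reaches the client writer until close() is called, at which point the whole buffered document is rewritten and delivered in one step.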

This example works, but it has limitations. First, we cannot buffer an infinite amount of data. A better implementation would have to make a decision about when to start writing data to the client, potentially based on the client-specified buffer size of the HttpServletResponse API. Next, our implementation of linkText() could probably be sped up by constructing one large regular expression using alternation. You will no doubt find other ways it can be improved.
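As a starting point for the alternation idea, here is one possible sketch using java.util.regex directly. The class name AlternationLinker is hypothetical, and the code assumes that the individual patterns contain no capturing groups of their own, since it relies on group numbers to tell which alternative matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: assumes the supplied patterns have no capturing groups.
class AlternationLinker {
    private final Pattern combined;
    private final List<String> urls = new ArrayList<String>();

    // links maps each pattern to its URL; use an ordered map if the
    // priority among overlapping patterns matters.
    AlternationLinker(Map<String, String> links) {
        StringBuilder regex = new StringBuilder();
        for (Map.Entry<String, String> e : links.entrySet()) {
            if (regex.length() > 0)
                regex.append('|');
            regex.append('(').append(e.getKey()).append(')');
            urls.add(e.getValue());
        }
        combined = Pattern.compile(regex.toString());
    }

    // Makes a single pass over the text, linking whichever pattern matched.
    String linkText(String text) {
        Matcher m = combined.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            for (int i = 1; i <= urls.size(); i++) {
                if (m.group(i) != null) {   // group i is pattern i
                    String link = "<a href=\"" + urls.get(i - 1) + "\">"
                        + m.group(i) + "</a>";
                    m.appendReplacement(out, Matcher.quoteReplacement(link));
                    break;
                }
            }
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Besides scanning the text only once, a single pass has a second benefit: a later pattern can no longer match inside the link markup inserted for an earlier one, which the repeated replaceAll() approach does not prevent.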

14.16 Building WAR Files with Ant

Thus far in this book, we have not become too preoccupied with special tools to help you construct Java applications. Partly, this is because it's outside the scope of this text, and partly it reflects a small bias of the authors against getting too entangled with particular development environments. There is, however, one universal tool that should be in the arsenal of every Java developer: the Jakarta Project's Ant. Ant is a project builder for Java, a pure Java application that fills the role that make does for C applications. Ant has many advantages over make when building Java code, not the least of which is that it comes with a wealth of built-in "tasks" (declarative commands) to perform common Java-related operations such as building WAR files. Ant is fast, portable, and easy to install and use. Make it your friend.

We won't cover the usage of Ant in any detail here. You can learn more and download it from its home page, http://jakarta.apache.org/ant/, or grab it from the CD-ROM accompanying this book (view CD content online at http://examples.oreilly.com/learnjava2/CD-ROM/). We give you a sample build file here to get you started.

14.16.1 A Development-Oriented Directory Layout

At the beginning of this chapter, we described the layout of a WAR file, including the standard files and directories that must appear inside the archive. While this file organization is necessary for deployment inside the archive, it may not be the best way to organize your project during development. Maintaining web.xml and libraries inside a directory named WEB-INF under all of your content may be convenient for running the jar command, but it doesn't line up well with how those areas are created or maintained from a development perspective. Fortunately, with a simple Ant build file, we can create our WAR from an arbitrary project layout.

Let's choose a directory structure that is a little more oriented towards project development. For example:

myapplication
|
|-- src
|-- lib
|-- docs
|-- web.xml

We place our source-code tree under src, required library JAR files under lib, and our content under docs. We leave web.xml at the top where it's easy to tweak parameters, etc.

Here is a simple Ant build.xml file for constructing a WAR file from the new directory structure:

<project name="myapplication" default="compile" basedir=".">
  
    <property name="war-file" value="${ant.project.name}.war"/>
    <property name="src-dir" value="src" />
    <property name="build-dir" value="classes" />
    <property name="docs-dir" value="docs" />
    <property name="webxml-file" value="web.xml" />
    <property name="lib-dir" value="lib" />
  
    <target name="compile">
        <mkdir dir="${build-dir}"/>
        <javac srcdir="${src-dir}" destdir="${build-dir}"/>
    </target>
  
    <target name="war" depends="compile">
        <war warfile="${war-file}" webxml="${webxml-file}">
            <classes dir="${build-dir}"/>
            <fileset dir="${docs-dir}"/>
            <lib dir="${lib-dir}"/>
        </war>
    </target>
  
    <target name="clean">
        <delete dir="${build-dir}"/>
        <delete file="${war-file}"/>
    </target>
  
</project>

Place the build.xml file in the myapplication directory. You can now compile your code simply by running ant, or you can compile and build the WAR file with the command ant war. Ant automatically finds all the Java files under the src tree that need building and compiles them into a "build" directory named classes. Running ant war creates the file myapplication.war. You can remove the build directory and WAR file with the ant clean command.

There is nothing really project-specific in this sample build file except the project name attribute in the first line, which you replace with your application's name. And we reference that only to specify the name of the WAR file to generate. Feel free to customize the names of any of the files or directories by changing the property declarations at the top. The learningjava.war file example supplied on the accompanying CD-ROM (view CD content online at http://examples.oreilly.com/learnjava2/CD-ROM/) comes with a version of this Ant build.xml file.
