9.2. URL Encoding

Before data supplied on a form can be sent to a CGI program, each form element's name (specified by the name attribute) is paired with the value entered by the user to create a key/value pair. For example, if the user entered "30" when asked for his or her age, the key/value pair would be "age=30". In the transferred data, key/value pairs are separated by the ampersand (&) character.
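The pairing described above can be sketched in a few lines of Python. This is only an illustration: the "age" field comes from the text, while the "name" field and its value are invented, and both values are chosen so that no encoding is needed yet.

```python
# Hypothetical form fields as (name, value) pairs. "age" is the example
# from the text; "name" is an invented second field for illustration.
fields = [("name", "Pat"), ("age", "30")]

# Each name is joined to its value with "=" to form a key/value pair.
pairs = [f"{name}={value}" for name, value in fields]

# The pairs are then joined with "&" to form the transferred data.
form_data = "&".join(pairs)

print(form_data)  # name=Pat&age=30
```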

Because the GET method sends form information as part of the URL, that information can't include spaces or other characters that are not allowed in URLs, nor characters that have special meanings in URLs, such as slashes (/). (For the sake of consistency, this constraint also applies when the POST method is used.) The web browser therefore encodes user-supplied information before sending it.

Encoding involves replacing special characters in the query string with a percent sign (%) followed by the character's two-digit hexadecimal code. (Thus, URL encoding is also sometimes called hexadecimal encoding.) Spaces in form data are usually sent as plus signs (+), although the hexadecimal form %20 is also understood. Suppose a user fills out and submits a form containing his or her birthday in the format mm/dd/yy (e.g., 11/05/73). The forward slashes in the birthday are among the special characters that can't appear unencoded in the client's request for the CGI program, so the browser encodes the data when it issues the request. The following sample request shows the resulting encoding:

POST /cgi-bin/birthday.pl HTTP/1.0
Content-length: 21

birthday=11%2F05%2F73

The sequence %2F is a percent sign followed by 2F, the hexadecimal ASCII code of the slash character (47 in decimal).
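To see where %2F comes from, the encoding can be reproduced with Python's standard urllib.parse module. This is a sketch of what a browser does, not a description of any particular browser; the "a b" value is an invented example showing that form-style encoding in this module turns a space into a plus sign.

```python
from urllib.parse import quote_plus, urlencode

# A slash is ASCII code 47, which is 2F in hexadecimal.
slash_escape = "%{:02X}".format(ord("/"))
print(slash_escape)  # %2F

# quote_plus applies form-style encoding to a single value.
encoded_value = quote_plus("11/05/73")
print(encoded_value)  # 11%2F05%2F73

# Spaces become plus signs under this style of encoding.
encoded_space = quote_plus("a b")
print(encoded_space)  # a+b

# urlencode encodes each value and joins the pairs with "=" and "&".
body = urlencode({"birthday": "11/05/73"})
print(body)       # birthday=11%2F05%2F73
print(len(body))  # 21, the Content-length value in the sample request
```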

CGI scripts have to provide some way to "decode" the form data that the client has encoded. The best way to do this is to use CGI.pm (covered in Chapter 10, "The CGI.pm Module") and let someone else do the work for you.
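As a rough illustration of the decoding work such a library does for you, here is the reverse operation using Python's standard urllib.parse module. It stands in for the idea only and is not how CGI.pm is written or called; the body string is the one from the sample request.

```python
from urllib.parse import parse_qs, unquote_plus

# The encoded request body from the sample request.
body = "birthday=11%2F05%2F73"

# unquote_plus reverses the encoding of a single value:
# "+" becomes a space and each %XX becomes the character with that code.
decoded_value = unquote_plus("11%2F05%2F73")
print(decoded_value)  # 11/05/73

# parse_qs splits the body on "&" and "=", decodes names and values,
# and returns a dict that maps each name to a list of its values.
fields = parse_qs(body)
print(fields)  # {'birthday': ['11/05/73']}

# A field can appear more than once, so take the first value here.
birthday = fields["birthday"][0]
print(birthday)  # 11/05/73
```

Returning a list per name reflects the fact that a form may submit several values under one name, for example from a group of checkboxes.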




Copyright © 2002 O'Reilly & Associates. All rights reserved.