8.4 Runtime: Securing CGI Scripts

We've secured what we can at build time. Now we enter a maze of twisty little passages, seeking security at runtime.

8.4.1 HTTP, URLs, and CGI

Just as a little SMTP knowledge aids understanding of email-security issues, a little background on HTTP and URLs improves knowledge of web security.

Every exchange between a web client and server is defined by the Hypertext Transfer Protocol (HTTP). HTTP 1.0 was the first widely used version, but it had some shortcomings. Most of these were addressed with HTTP 1.1, the current version that is almost universal. HTTP 1.1 is defined in RFC 2616 (http://www.w3.org/Protocols/rfc2616/rfc2616.html). The web client makes HTTP requests, and the web server responds. Web browsers hide much of the data exchange, such as MIME types, cache settings, content negotiation, timestamps, and other details. Other clients (such as a web spider, wget, or curl) offer much more control over the exchange.

An HTTP request contains an initial request line:

    Method URI HTTP-Version \r\n

Methods include OPTIONS, GET, HEAD, POST, PUT, TRACE, DELETE, and CONNECT. Some methods have a corresponding URL format. This line may be followed by request header lines containing information about the client, the host, authorization, and other things. These lines may be followed by a message body. The web server returns a header and an optional body, depending on the request.

There are security implications with the type of URLs you use. Since the protocol is text, it's easy to forge headers and bodies (although attackers have successfully forged binary data for years). You can't trust what you're being told, whether you're a web server or a client. See section 15 of RFC 2616 for other warnings.

The following are the most common methods and some security implications.

8.4.1.1 HEAD method

Do you want to know what web server someone is running? It's easy.
Let's look at the HEAD data for the home page at http://www.apache.org:

    $ telnet www.apache.org 80
    Trying 63.251.56.142...
    Connected to daedalus.apache.org (63.251.56.142).
    Escape character is '^]'.
    HEAD / HTTP/1.1
    Host: www.apache.org

    HTTP/1.1 200 OK
    Date: Sat, 13 Apr 2002 03:48:58 GMT
    Server: Apache/2.0.35 (Unix)
    Cache-Control: max-age=86400
    Expires: Sun, 14 Apr 2002 03:48:58 GMT
    Accept-Ranges: bytes
    Content-Length: 7790
    Content-Type: text/html

    Connection closed by foreign host.
    $

(A handy alternative to this manual approach is the curl client, available from http://www.haxx.se.) The actual responses vary by web server and site. Some don't return a Server: response header, or say they're something else, to protect against attacks aided by port 80 fingerprinting. The default value returned by Apache includes the identity of many modules. To return only a Server: Apache response, specify:

    ServerTokens ProductOnly

8.4.1.2 OPTIONS method

If OPTIONS is supported, it tells us more about the web server:

    $ telnet www.apache.org 80
    Trying 63.251.56.142...
    Connected to daedalus.apache.org (63.251.56.142).
    Escape character is '^]'.
    OPTIONS * HTTP/1.1
    Host: www.apache.org

    HTTP/1.1 200 OK
    Date: Sat, 13 Apr 2002 03:57:10 GMT
    Server: Apache/2.0.35 (Unix)
    Cache-Control: max-age=86400
    Expires: Sun, 14 Apr 2002 03:57:10 GMT
    Allow: GET,HEAD,POST,OPTIONS,TRACE
    Content-Length: 0
    Content-Type: text/plain

    Connection closed by foreign host.
    $

The OPTIONS method is not a security concern, but you might like to try it on your own servers to see what it returns.

8.4.1.3 GET method

GET is the standard method for retrieving data from a web server. A URL for the GET method may be simple, like this call for a home page:

    http://www.hackenbush.com/

A GET URL may be extended with a ? and name=value arguments.
Each instance of name and value is URL encoded, and pairs are separated by an &:

    http://www.hackenbush.com/cgi-bin/groucho.pl?day=jan%2006&user=zeppo

An HTTP GET request contains a header but no body. Apache handles the request directly, assigning everything after the ? to the QUERY_STRING environment variable. Since all the information is in the URL itself, a GET URL can be bookmarked, or repeated from the browser, without resubmitting a form. It can also be generated easily by client-side or server-side scripting languages.

Although you may see some very long and complex GET URLs, web servers may have size limits that would snip your URL unceremoniously (ouch). Apache guards against GET buffer overflow attacks, but some other web servers and web cache servers do not.

Since all the parameters are in the URL, they also appear in the web-server logs. If the form carries any sensitive data, use the POST method instead.

The question mark and /cgi-bin advertise that this URL calls a CGI script called groucho.pl. You may want the benefits of a GET URL without letting everyone know that this is a CGI script. If an attacker knows you're using Perl scripts on Apache, for instance, he can target his attack more effectively. Another reason involves making the URL more search-engine friendly. Many web search engines skip URLs that look like CGI scripts. One technique uses the PATH_INFO environment variable and Apache rewriting rules. You can define a CGI directory with a name that looks like a regular directory:

    ScriptAlias /fakedir/ "/usr/local/apache/real_cgi_bin/"

Within this directory you could have a CGI script called whyaduck. When this URL is received:

    http://www.hackenbush.com/fakedir/whyaduck/day/jan%2006/user/zeppo

Apache will execute the CGI script /usr/local/apache/real_cgi_bin/whyaduck and pass it the environment variable PATH_INFO with the value /day/jan 06/user/zeppo.
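A script might split that PATH_INFO value into name/value pairs along these lines. This is only a minimal Perl sketch: the helper name parse_path_info is hypothetical, the sample value is the one from the URL above, and the values it returns are still untrusted input that needs the checks described later in this section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a PATH_INFO string such as "/day/jan 06/user/zeppo" into
# name => value pairs. Returns undef if a name arrives without a value.
# (parse_path_info is a hypothetical helper, not part of any module.)
sub parse_path_info {
    my ($path_info) = @_;
    return +{} unless defined $path_info && length $path_info;

    # Drop the leading slash, then split on the remaining slashes.
    (my $trimmed = $path_info) =~ s{^/}{};
    my @parts = split m{/}, $trimmed, -1;

    # An odd number of parts means the pairs are unbalanced.
    return undef if @parts % 2;

    my %args = @parts;
    return \%args;
}

# In a CGI script the value comes from the environment; the fallback
# string is only there so the sketch can be run from a shell.
my $args = parse_path_info($ENV{PATH_INFO} // "/day/jan 06/user/zeppo");
if ($args) {
    print "$_ = $args->{$_}\n" for sort keys %$args;
}
```

Returning undef for unbalanced input lets the caller reject a malformed URL outright instead of guessing which value is missing.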
Your script can parse the components with any method you like (use split in Perl or explode in PHP to split on the slashes).

Since GET arguments are part of the URL, they may be immortalized in server logs, bookmarks, and referrals. This may expose confidential information. If this is an issue, use POST rather than GET. If you don't specify the method attribute for a <form> tag in HTML, it uses GET.

8.4.1.4 POST method

POST is used to send data to a CGI program on the web server. A URL for the POST method appears bare, with no ? or encoded arguments. URL-encoded data is sent in the HTTP body to Apache, then from Apache to the standard input of the CGI program.

A user must resubmit her original form and data to refresh the output page, since the recipient has no way of knowing if the data may have changed. (With a GET URL, everything's in the URL.) The data size is not as limited as with GET. Normally POST data is not logged, although you can configure Apache to do so. A POST URL cannot be bookmarked, and it cannot be automatically submitted from a browser without using client-side JavaScript (other clients like wget and curl can submit POST requests directly). You need to have a button or other link with a JavaScript URL that submits a form that is somewhere on your page.

8.4.1.5 PUT method

This was the original HTTP upload mechanism. Specify a CGI script to handle a PUT request, as you would for a POST request. PUT seems to have been superseded by WebDAV and other methods, which are described in Section 8.5.5.

8.4.2 CGI Languages

Any language can be a CGI language just by following the CGI specification. An HTTP response requires at least an initial MIME type line, a blank line, and then content.
Here's a minimal CGI script written in the shell:

    #!/bin/sh
    echo "Content-type: text/html"
    echo
    echo "Hello, world"

Technically, we should terminate each of the first two echo lines with a carriage return-line feed pair ('\r\n'), but browsers know what to do with bare Unix-style line feeds.

Although a C program might run faster than a shell or Perl equivalent, CGI startup time tends to outweigh that advantage. I feel that the best balance of flexibility, performance, and programmer productivity lies with interpreted languages running as Apache modules. The top languages in that niche are PHP and Perl.

I'll discuss the security trouble spots to watch, with examples from Perl and PHP: processing form data, including files, executing programs, uploading files from forms, and accessing databases.
But first, a few words about Perl and PHP.

8.4.2.1 PHP

PHP is a popular web-scripting language for Unix and Windows. It's roughly similar to, and competes with, Visual Basic and ASP on Windows. On Unix and Linux, it competes with Perl and Java. Its syntax is simpler than Perl's, and its interpreter is small and fast.
PHP code is embedded in HTML and distinguished by any of these start and end tags:

    <?php ... ?>
    <? ... ?>
    <% ... %>

PHP files can contain any mixture of normal HTML and PHP, like this:

    <? echo "<b>string</b> = <i>$string</i>\n"; ?>

or more compactly:

    <b>string</b> = <i><?=$string?></i>

PHP configuration options can be specified in three ways:
8.4.2.2 Perl

Perl is the mother of all web-scripting languages. The most popular module for CGI processing, CGI.pm, is part of the standard Perl release. Here's a quick Perl script to get the value of a form variable (or handcrafted GET URL) called string:

    #!/usr/bin/perl -w
    use strict;
    use CGI qw(:standard);
    my $string = param("string");
    print header;
    print "<b>string</b> = <i>$string</i>\n";

A Perl CGI script normally contains a mixture of HTML print statements and Perl processing statements.

8.4.3 Processing Form Data

In the previous examples, I showed how to get and echo the value of the form variable string. I'll now show how to circumvent this simple code, and how to protect against the circumvention.

Client-side form checking with JavaScript is a convenience for the user, and it avoids a round-trip to the server to load a new page with error messages. However, it does not protect you from a handcrafted form submission with bad data. Here's a simple form that lets the web user enter a text string:

    <form name="user_form" method="post" action="/cgi-bin/echo">
    <input type="text" name="string">
    <input type="submit" value="submit">
    </form>

When submitted, we want to echo the string. Let's look again at a naïve stab at echo in PHP:

    <? echo "string = $string\n"; ?>

And the same in Perl:

    #!/usr/bin/perl -w
    use strict;
    use CGI qw(:standard);
    print header;
    print "string = ", param("string"), "\n";

This looks just ducky. In fact, if you type quack into the string field, you see the output:

    string = quack

But someone with an evil mind might enter this text into the string field:

    <script language=javascript>history.go(-1);</script>

Submit this, and watch it bounce right back to your input form. If this form did something more than echo its input (such as entering the contents of string into a database), the results could be more serious.
The injected <script> tag is an example of someone uploading code to your server without your knowledge and then getting it to download and execute on any browser. This cross-site scripting bug was fixed within JavaScript itself some time ago, but that doesn't help in this case, since JavaScript is being injected into the data of a server-side script. HTML tags that invoke active content are shown in Table 8-6.

Each scripting language has the ability to escape input data, removing any magic characters, quotes, callouts, or anything else that would treat the input as something other than plain text.

An even better approach is to specify what you want, rather than escaping what you don't want. Match the data against a regular expression of the legal input patterns. The complexity of the regular expression would depend on the type of data and the desired level of validity checking. For example, you might want to ensure that a U.S. phone number field has exactly 10 digits or that an email address follows RFC 822.

8.4.3.1 PHP

To avoid interpreting a text-form variable as JavaScript or HTML, escape the special characters with the PHP functions htmlspecialchars or htmlentities. As mentioned previously, it's even better to extract the desired characters from the input first via a regular-expression match. In the following section, there's an example of how Perl can be used to untaint input data.

PHP has had another security issue with global data. When the PHP configuration variable register_globals is enabled, PHP creates an automatic global variable to match each variable in a submitted form. In the earlier example, a PHP variable named $string winks into existence to match the form variable string. This makes form processing incredibly easy. The problem is that anyone can craft a URL with such variables, forging a corresponding PHP variable. So any uninitialized variable in your PHP script could be assigned from the outside.

The danger is not worth the convenience.
Specify register_globals = Off in your php.ini file. Starting with PHP 4.2.0, this is the default setting. PHP versions 4.1.0 and up also provide safer new autoglobal arrays. These are automatically global within PHP functions (in PHP, you normally need to say global $var within a function to access the global variable $var; this quirk always bites Perl developers). These arrays should be used instead of the older arrays $HTTP_GET_VARS and $HTTP_POST_VARS and are listed in Table 8-7.
Another new autoglobal array, $_REQUEST, is the union of $_GET, $_POST, and $_COOKIE. This is handy when you don't care how the variable got to the server.

8.4.3.2 Perl
Perl's taint mode (enabled with the -T flag) marks data originating outside the script as potentially unsafe and forces you to do something about it. To untaint a variable, run it through a regular expression, and grab it from one of the positional match variables ($1, $2, ...). Here's an example that gets a sequence of "word" characters (\w matches letters, digits, and _):

    #!/usr/bin/perl -wT
    use strict;
    use CGI qw(:standard);
    my $user = param("user");
    if ($user =~ /^(\w+)$/) { $user = $1; }

We'll see that taint mode applies to file I/O, program execution, and other areas where Perl is reaching out into the world.

8.4.4 Including Files

CGI scripts can include files inside or outside of the document hierarchy. Try to move sensitive information from your scripts to files located outside the document hierarchy. This is one layer of protection if your CGI script somehow loses its protective cloak and can be viewed as a simple file.

Use a special suffix for sensitive include files (a common choice is .inc), and tell Apache not to serve files with that suffix. This will protect you when you accidentally put an include file somewhere in the document root. Add this to an Apache configuration file:

    <FilesMatch "\.inc$">
        Order allow,deny
        Deny from all
    </FilesMatch>

Also, watch out for text editors that may leave copies of edited scripts with suffixes like ~ or .bak. The crafty snoop could just ask your web server for files like program~ or program.bak. Your access and error logs will show if anyone has tried. To forbid serving them anywhere, add this to your Apache configuration file:

    <FilesMatch "(~|\.bak)$">
        Order allow,deny
        Deny from all
    </FilesMatch>

When users are allowed to view or download files based on a submitted form variable, guard against attempts to access sensitive data, such as a password file. One exploit is to use relative paths (..):

    ../../../etc/passwd

Cures for this depend on the language and are described in the following sections.
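Whatever the language, the core of the cure is the same: accept only names that match a pattern you chose. A minimal Perl sketch of that idea follows. The helper safe_download_path, the directory /usr/local/apache/downloads, and the exact character whitelist are all assumptions made for illustration; adjust them to your own layout.

```perl
use strict;
use warnings;

# Hypothetical directory holding the only files users may fetch.
my $DOWNLOAD_DIR = "/usr/local/apache/downloads";

# Return a full path inside $DOWNLOAD_DIR, or undef if the requested
# name is anything other than a plain file name of safe characters.
# (safe_download_path is a hypothetical helper.)
sub safe_download_path {
    my ($requested) = @_;
    return undef unless defined $requested;

    # The whitelist has no slash, so "../../../etc/passwd" fails.
    # The first character may not be a dot, so ".." and dot files fail.
    # \A and \z anchor the whole string, so a trailing newline fails.
    # Capturing into $1 also untaints the name under -T.
    return undef
        unless $requested =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]*)\z/;

    return "$DOWNLOAD_DIR/$1";
}

# Example: a request for the password file is refused.
for my $name ("report.txt", "../../../etc/passwd") {
    my $path = safe_download_path($name);
    print "$name: ", (defined $path ? $path : "rejected"), "\n";
}
```

Rejecting a bad name outright, rather than trying to repair it, keeps the check easy to reason about.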
8.4.4.1 PHP

External files can be included with the PHP include or include_once commands. These may contain functions for database access or other sensitive information. A mistake in your Apache configuration could expose PHP files within normal document directories as normal text files, and everyone could see your code. For this reason, I recommend the following:
Use the basename function to isolate the filename from the directory and open_basedir to restrict access to a certain directory. These will catch attempts to use ../ relative filenames.

If you process forms where people request a file and get its contents, you need to watch the PHP file-opening command fopen and the file-reading commands fpassthru and readfile. fopen and readfile accept URLs as well as filenames; disable this with allow_url_fopen=false in php.ini.

You may also limit PHP file operations to a specific directory with the open_basedir directive. This can be set within Apache container directives to limit virtual hosts to their backyards:

    <VirtualHost 192.168.102.103>
        ServerName a.test.com
        DocumentRoot /usr/local/apache/hosts/a.test.com
        php_admin_value open_basedir /usr/local/apache/hosts/a.test.com
    </VirtualHost>

If safe_mode is enabled in php.ini or an Apache configuration file, a file must be owned by the owner of the PHP script to be processed. This is also useful for virtual hosts.

Table 8-8 lists recommended safe settings for PHP.
In Table 8-8, I'm assuming you might set up a directory for each virtual host under /usr/local/apache/hosts. You can specify multiple directories with a colon (:) separator.

8.4.4.2 Perl

In taint mode, Perl refuses to pass tainted data to the functions eval, require, open (except read-only mode), chdir, chroot, chmod, unlink, mkdir, rmdir, link, and symlink. You must untaint filenames before using any of these. As in the PHP example, watch for relative (../) names and other attempts to access files outside the intended area.

8.4.5 Executing Programs

Most scripting languages let you run external programs. This is a golden opportunity for nasty tricks. Check the pathname and remove any metacharacters that would allow multiple commands. Avoid passing commands through a shell interpreter.

8.4.5.1 PHP

Escape any possible attempts to slip in extra commands with this PHP function:

    $safer_input = escapeshellarg($input);
    system("some_command $safer_input");

or:

    $safer_command = escapeshellcmd("some_command $input");
    system($safer_command);

These PHP functions invoke the shell and are vulnerable to misuse of shell metacharacters: system, passthru, exec, popen, and the backtick (`command`) operator. preg_replace with the /e option deserves the same care, since it evaluates its replacement string as PHP code. If safe_mode is set, only programs within safe_mode_exec_dir can be executed, and only files owned by the owner of the PHP script can be accessed.

The PHP function eval($arg) executes its argument $arg as PHP code. There's no equivalent to safe_mode for this, although the disable_functions option lets you turn off selected functions. Don't execute user data.

8.4.5.2 Perl

Taint mode will not let you pass unaltered user input to the functions system, exec, eval, or the backtick (`command`) operator. Untaint the input before executing, as described earlier.

8.4.6 Uploading Files from Forms

RFC 1867 documents form-based file uploads, a way of uploading files through HTML, HTTP, and a web server.
It uses an HTML form, a special form-encoding method, and an INPUT tag of type FILE:

    <form method="post" enctype="multipart/form-data" action="/cgi-bin/process_form.php">
    <input type="text" name="photo_name">
    <input type="file" name="photo_file">
    <input type="submit" value="submit">
    </form>

This is another golden opportunity for those with too much time and too little conscience. A file upload is handled by a CGI file-upload script. There is no standard script, since so many things can be done with an uploaded file.

8.4.6.1 PHP

Uploaded files are saved as temporary files in the directory specified by the PHP directive upload_tmp_dir. The default value (/tmp) leaves them visible to anyone, so you may want to define upload_tmp_dir to some directory in a virtual host's file hierarchy. To access uploaded files, use the new autoglobal array $_FILES, each element of which is itself an array. For the photo-uploading example, let's say you want to move an uploaded image to the photos directory of virtual host host:

    <?
    // $name is the original file name from the client
    $name = $_FILES['photo_file']['name'];
    // $type is the MIME type reported by the browser (not verified)
    $type = $_FILES['photo_file']['type'];
    // $size is the size of the uploaded file (in bytes)
    $size = $_FILES['photo_file']['size'];
    // $tmpn is the name of the temporary uploaded file on the server
    $tmpn = $_FILES['photo_file']['tmp_name'];
    // If everything looks right, move the temporary file
    // to its desired place. The destination must be a file name,
    // not a directory; basename() strips any path the client sent.
    if (is_uploaded_file($tmpn))
        move_uploaded_file($tmpn, "/usr/local/apache/host/photos/" . basename($name));
    ?>

You may check the file's type, name, and size before deciding what to do with it. The PHP option upload_max_filesize caps the size; if a larger file is uploaded, the value of $tmpn is none. When the PHP script finishes, any temporary uploaded files are deleted.

8.4.6.2 Perl

The CGI.pm module provides a file handle for each temporary file.
    #!/usr/bin/perl -wT
    use strict;
    use CGI qw(:standard);
    my $handle = param("photo_file");
    my $tmp_file_name = tmpFileName($handle);
    # Copy the file somewhere, or rename it
    # ...

The temporary file goes away when the CGI script completes.

8.4.7 Accessing Databases

Although relational databases have standardized on SQL as a query language, many of their APIs and interfaces, whether graphic or text based, have traditionally been proprietary. When the Web came along, it provided a standard GUI and API for static text and dynamic applications. The simplicity and broad applicability of the web model led to the quick spread of the Web as a database frontend. Although HTML does not offer the richness and performance of other graphical user interfaces, it's good enough for many applications.

Databases often contain sensitive information, such as people's names, addresses, and financial data. How can a porous medium like the Web be made safer for database access?
Any time information is exchanged, someone will be tempted to change it, block it, or steal it. We'll quickly review these issues in PHP and Perl database CGI scripts:
8.4.7.1 PHP

PHP has many specific and generic database APIs. There is not yet a clear leader to match Perl's DBI. A PHP fragment to access a MySQL database might begin like this:

    <?
    $link = mysql_connect("db.test.com", "dbuser", "dbpassword");
    if (!$link)
        echo "Error: could not connect to database\n";
    ?>

If this fragment is within every script that accesses the database, every instance will need to be changed if the database server, user, or password changes. More importantly, a small error in Apache's configuration could allow anyone to see the raw PHP file, which includes seeing these connection parameters. It's easier to write a tiny PHP library function to make the connection, put it in a file outside the document root, and include it where needed.

Here's the include file (it needs its own PHP tags, since included files start out in HTML mode):

    <?
    // my_connect.inc
    // PHP database connection function.
    // Put this file outside the document root!

    // Makes connection to database.
    // Returns link id if successful, false if not.
    function my_connect( )
    {
        $database = "db.test.com";
        $user     = "db_user";
        $password = "db_password";
        $link = mysql_connect($database, $user, $password);
        return $link;
    }
    ?>

And this is a sample client:

    <?
    // client.php
    // PHP client example.
    // Include path is specified in include_path in php.ini.
    // You can also specify a full pathname.
    include_once "my_connect.inc";
    $link = my_connect( );
    // Do error checking in client or library function
    if (!$link)
        echo "Error: could not connect to database\n";
    // ...
    ?>

Now that the account name and password are better protected, you need to guard against malicious SQL code. This is similar to protecting against user input passing directly to a system command, for much the same reasons. Even if the input string is harmless, you still need to escape special characters.

The PHP addslashes function puts a backslash (\) before these special SQL characters: single quote ('), double quote ("), backslash (\), and NUL (ASCII 0).
This function is called automatically by PHP on GET, POST, and cookie data if the option magic_quotes_gpc is on. Depending on your database, this may not quote all the characters correctly.

SQL injection is an attempt to use your database server to get access to otherwise protected data (read, update, or delete) or to get to the operating system. For an example of the first case, say you have a login form with user and password fields. A PHP script would get these form values (from $_GET, $_POST, or $_REQUEST, if it's being good), and then build a SQL string and make its query like this:

    $sql = "SELECT COUNT(*) FROM users WHERE\n" .
           "user = '$user' AND\n" .
           "password = '$password'";
    $result = mysql_query($sql);
    // The assignment needs its own parentheses; otherwise && binds
    // first and $row receives a boolean instead of the fetched row.
    if ($result && ($row = mysql_fetch_array($result)) && $row[0] == 1)
        return true;
    else
        return false;

An exploiter could enter these into the input fields (see Table 8-9).
The SQL string would become:

    SELECT COUNT(*) FROM users WHERE
    user = '' OR '' = '' AND
    password = '' OR '' = ''

The door is now open. To guard against this, use the techniques I've described for accessing other external resources, such as files or programs: escape metacharacters and perform regular-expression searches for valid matches. In this example, a valid user and password might be a sequence of letters and numbers. Extract user and password from the original strings and see if they're legal.

In this example, if the PHP option magic_quotes_gpc were enabled, this exploit would not work, since all quote characters would be preceded by a backslash. But other SQL tricks can be done without quotes.

A poorly written script may run very slowly or even loop forever, tying up an Apache instance and a database connection. PHP's set_time_limit function limits the number of seconds that a PHP script may execute. It does not count time outside the script, such as a database query, command execution, or file I/O. It also does not give you more time than Apache's Timeout variable.

8.4.7.2 Perl

Perl has the trusty database-independent module DBI and its faithful sidekicks, the database-dependent (DBD) family. There's a DBD for many popular databases, both open source (MySQL, PostgreSQL) and commercial (Oracle, Informix, Sybase, and others).

A MySQL connection function might resemble this:

    # my_connect.pl
    use DBI;

    sub my_connect {
        my $server   = "db.test.com";
        my $db       = "db_name";
        my $user     = "db_user";
        my $password = "db_password";
        my $dbh = DBI->connect(
            "DBI:mysql:$db:$server",
            $user,
            $password,
            { PrintError => 1, RaiseError => 1 })
            or die "Could not connect to database $db.\n";
        return $dbh;
    }
    1;

As in the PHP examples, you'd rather not have this function everywhere. Perl has, characteristically, more than one way to do it. Here is a simple way:

    require "/usr/local/myperllib/my_connect.pl";

If your connection logic is more complex, it could be written as a Perl package or a module.
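With a handle from my_connect, one way to keep form input out of the SQL text is DBI's placeholder mechanism, combined with a pattern check on the input. The sketch below revisits the login check from the PHP example. It is illustrative only: login_ok is a hypothetical helper, the users table and its columns are assumed from that example, and the 32-character limit on user names is arbitrary.

```perl
use strict;
use warnings;

# Check a login without interpolating user input into the SQL text.
# $dbh is a DBI database handle, for example from my_connect().
# The table and column names are assumptions about your schema.
sub login_ok {
    my ($dbh, $user, $password) = @_;

    # Accept only user names made of 1 to 32 "word" characters
    # (letters, digits, and underscore); refuse everything else.
    return 0 unless defined $user && $user =~ /\A\w{1,32}\z/;
    return 0 unless defined $password && length $password;

    # Each ? is a placeholder. DBI passes the bind values as data,
    # so quote characters in them cannot alter the query structure.
    my ($count) = $dbh->selectrow_array(
        "SELECT COUNT(*) FROM users WHERE user = ? AND password = ?",
        undef, $user, $password,
    );

    return (defined($count) && $count == 1) ? 1 : 0;
}
```

A caller would use it as login_ok($dbh, param("user"), param("password")). The pattern check rejects a quote-laden user name before any query is made, and the placeholders cover the password, where a wider range of characters has to be allowed.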
Taint mode won't protect you from entering tainted data into database queries. You'll need to check the data yourself. Perl's outstanding regular-expression support lets you specify patterns that input data must match before going into a SQL statement.

8.4.8 Checking Other Scripts

Once you've secured Apache and your own scripts, don't forget to check any other old scripts that may be lying around. Some demo scripts and even commercial software have significant holes. I suggest disabling or removing any CGI scripts if you aren't certain about them.

whisker (http://www.wiretrip.net/rfp/p/doc.asp/i2/d21.htm) is a Perl script that checks for buggy CGI scripts against a vulnerability database.

8.4.9 Continuing Care

Check your error_log regularly for bad links, attacks, or other signs of trouble. You are sure to see many IIS-specific exploit attempts such as Code Red and Nimda, but someone might actually be targeting a LAMP component.
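A small script can take some of the tedium out of that review by setting the familiar IIS worm noise aside. The following Perl sketch is one possible starting point, not a complete log analyzer: triage_log_lines is a hypothetical helper, the three patterns are just the requests those worms are best known for, and the log path in the usage comment is an assumed location.

```perl
use strict;
use warnings;

# Requests typical of IIS worm probes: Code Red asks for default.ida,
# and Nimda hunts for cmd.exe and root.exe. Extend the list as needed.
my $IIS_NOISE = qr/default\.ida|cmd\.exe|root\.exe/i;

# Split log lines into IIS worm noise and lines that deserve a look.
# Returns two array references: (\@noise, \@review).
sub triage_log_lines {
    my (@lines) = @_;
    my (@noise, @review);
    for my $line (@lines) {
        if ($line =~ $IIS_NOISE) { push @noise,  $line }
        else                     { push @review, $line }
    }
    return (\@noise, \@review);
}

# Usage (the path is an assumed location for error_log):
#   perl triage.pl /usr/local/apache/logs/error_log
if (@ARGV) {
    open my $fh, "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my ($noise, $review) = triage_log_lines(<$fh>);
    close $fh;
    printf "%d IIS-probe lines set aside\n", scalar @$noise;
    print @$review;
}
```

The lines that remain are the ones worth reading: errors from your own CGI scripts, missing files, and probes aimed at the software you actually run.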