3.2 Preventing Attacks on Formatting Functions
3.2.1 Problem
You use functions such as printf( ) or syslog( ) in your program, and you want to ensure that you use them in such a way that an attacker cannot coerce them into behaving in ways that you do not intend.
3.2.2 Solution
The printf( ) family of functions provides a flexible and powerful way to format data easily. Unfortunately, these functions can also be extremely dangerous. Following the guidelines outlined in Section 3.2.3 will help you avoid many of the problems with them.
3.2.3 Discussion
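The printf( ) family of functions (and other functions that use them, such as syslog( ) on Unix systems) all require an argument that specifies a format, as well as a variable number of additional arguments that are substituted at various locations in the format string to produce formatted output. The functions come in two major varieties: those that write their output to a file or the screen, such as printf( ) and fprintf( ), and those that write their output to a string, such as sprintf( ) and vsprintf( ). Both can be dangerous, but the latter variety is significantly more so, because the destination string is a fixed-size buffer that can be overrun.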
The format string is copied to the output, character by character, until a percent (%) symbol is encountered. The characters that immediately follow the percent symbol determine what will be output in their place. For each substitution in the format string, the next argument in the variable argument list is used.
Because of the way that variable-sized argument lists work in C (see Recipe 13.4), the functions have no way to determine how many arguments were actually passed; they simply assume that the number of arguments present in the argument list is equal to the number of substitutions required by the format string. GCC in particular recognizes calls to functions in the printf( ) family and, when format checking is enabled (for example, with -Wall or -Wformat), emits warnings if it detects data type mismatches or an incorrect number of arguments in the variable argument list. These checks are possible only when the format string is a literal that the compiler can see; a format string built or read at runtime cannot be checked this way.
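To make the argument-counting problem concrete, here is a deliberately tiny formatter, toy_format( ) (a hypothetical name, not a library function), written with the standard <stdarg.h> macros. It supports only %d, %s, and %%. Like the real printf( ) family, it fetches one argument with va_arg( ) per substitution it finds in the format string, and nothing tells it how many arguments the caller actually passed. This is a simplified sketch, not how any real C library implements printf( ).

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy formatter, for illustration only. It understands
   "%d", "%s", and "%%", writes at most outlen - 1 characters plus a
   terminating NUL, and returns the number of substitutions it made. */
int toy_format(char *out, size_t outlen, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;
    int substitutions = 0;

    if (outlen == 0)
        return 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        char tmp[32] = { *p, '\0' };   /* default: copy one literal char */
        const char *piece = tmp;

        if (*p == '%' && p[1] == '%') {
            p++;                        /* "%%" emits a single '%' */
        } else if (*p == '%' && p[1] == 'd') {
            /* Only the format string says an int is waiting here. */
            snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int));
            substitutions++;
            p++;
        } else if (*p == '%' && p[1] == 's') {
            /* Likewise, only the format string says a char * is here. */
            piece = va_arg(ap, char *);
            substitutions++;
            p++;
        }

        /* Append the piece, truncating once the buffer is full. */
        size_t len = strlen(piece);
        if (len > outlen - 1 - used)
            len = outlen - 1 - used;
        memcpy(out + used, piece, len);
        used += len;
    }
    va_end(ap);

    out[used] = '\0';
    return substitutions;
}
```

Calling toy_format(buf, sizeof buf, "%s %s", "only one") would make the second va_arg( ) read an argument that was never passed, which is undefined behavior. That is exactly the situation an attacker creates by smuggling extra substitutions into a format string.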
If you adhere to the following guidelines when using the
printf( ) family of functions, you can be
reasonably certain that you are using the functions safely:
- Beware of the "%n" substitution.
All but one of the substitutions recognized by the printf( ) family of functions use arguments from the variable argument list as data to be substituted into the output. The lone exception is "%n", which writes the number of characters output so far into the memory location pointed to by the next argument in the argument list.
While the "%n" substitution has its place, few programmers are aware of it and its implications. In particular, if external input is used for the format string, an attacker can embed a "%n" substitution into the format string to overwrite memory. The real problem occurs when all of the arguments in the variable argument list have been exhausted. The formatting function keeps fetching "arguments" from wherever they would have been passed, typically the stack, and treats whatever values it finds there as pointers. If an attacker can influence those values (for example, through the bytes of the format string itself when it is stored on the stack), "%n" lets the attacker write to memory of the attacker's choosing, including return addresses stored on the stack.
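For comparison, the following sketch shows the legitimate behavior of "%n" with a literal format string: it consumes a pointer to int and stores the count of characters produced so far. The function name count_prefix( ) is hypothetical. Some C libraries restrict "%n" (Microsoft's C runtime, for example, disables it by default), so treat this as a demonstration for glibc- or BSD-style libraries.

```c
#include <stdio.h>

/* Hypothetical helper: formats "user=<name>" into buf and reports, via
   "%n", how many characters had been produced before the name. */
int count_prefix(char *buf, size_t buflen, const char *name)
{
    int before_name = 0;

    /* "%n" outputs nothing; it consumes &before_name and writes the
       number of characters output so far (5, for "user=") into it. */
    snprintf(buf, buflen, "user=%n%s", &before_name, name);
    return before_name;
}
```

If the format string came from an attacker instead, a "%n" with no matching argument would write through whatever pointer-sized value happened to be fetched in its place.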
To combat malicious uses of "%n",
Immunix has produced a set of
patches for glibc 2.2 (the standard C runtime
library for Linux) known as FormatGuard. The
patches take advantage of a GCC preprocessor extension for variadic macros, which allows a macro to determine how many arguments were passed to it at each call site. FormatGuard
essentially consists of a large set of macros for the
syslog( ), printf( ),
fprintf( ), sprintf( ), and
snprintf( ) functions; the macros call safe
versions of the respective functions. The safe functions count the number of substitutions in the format string and ensure that the proper number of arguments has been supplied.
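FormatGuard's runtime check rests on counting the substitutions in a format string. A simplified version of that counting step might look like the following sketch; count_substitutions( ) is a hypothetical name, and this is not FormatGuard's actual code. It skips "%%" (which consumes no argument), counts each other conversion, and treats a "*" width or precision as consuming an extra int argument. It assumes ordinary (non-positional) format strings and does not handle POSIX positional arguments such as "%1$s".

```c
#include <string.h>

/* Hypothetical sketch: returns the number of variadic arguments that the
   printf()-style format string fmt would consume. */
int count_substitutions(const char *fmt)
{
    int count = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')          /* "%%" prints a literal percent sign */
            continue;
        if (*p == '\0')         /* dangling '%' at the end of the string */
            break;
        /* Skip flags, width, precision, and length modifiers. A '*' in
           the width or precision consumes an additional int argument.
           The *p != '\0' test matters: strchr() also matches the NUL. */
        while (*p != '\0' && strchr("-+ #0123456789.*hlLqjzt", *p) != NULL) {
            if (*p == '*')
                count++;
            p++;
        }
        if (*p == '\0')
            break;
        count++;                /* the conversion itself, e.g. d, s, or n */
    }
    return count;
}
```

A FormatGuard-style wrapper would compare this count with the number of arguments the caller actually supplied and refuse to format on a mismatch.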
- Do not use a string from an external source directly as the format specification.
Strings obtained from an external source may contain unexpected percent symbols, causing the formatting function to attempt to substitute arguments that do not exist. If you simply need to output the string str (to stdout using printf( ), for example), do the following:
printf("%s", str);
Following this rule to the letter is not always desirable. In particular, your program may need to obtain format strings from a data file as a consequence of internationalization requirements. The format strings will vary to some extent depending on the language in use, but they should always have identical substitutions, so check that each format string loaded from such a file contains exactly the substitutions your code expects before passing it to a formatting function.
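One way to enforce identical substitutions is to compare the conversion specifications in each translated format string against those in the program's original format string before using it. The sketch below, built around the hypothetical functions next_spec( ) and same_substitutions( ), assumes that translations may reword the surrounding text but must keep every conversion specification identical and in the same order.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: finds the next conversion specification in *pp
   (skipping "%%"), reports where it starts and how long it is, and
   advances *pp past it. Returns false when none remain. */
static bool next_spec(const char **pp, const char **start, size_t *len)
{
    const char *p = *pp;

    while ((p = strchr(p, '%')) != NULL) {
        if (p[1] == '%') {              /* literal percent; no argument */
            p += 2;
            continue;
        }
        const char *q = p + 1;          /* skip flags, width, precision, length */
        while (*q != '\0' && strchr("-+ #0123456789.*hlLqjzt", *q) != NULL)
            q++;
        if (*q != '\0')
            q++;                        /* include the conversion character */
        *start = p;
        *len = (size_t)(q - p);
        *pp = q;
        return true;
    }
    return false;
}

/* Hypothetical check: true only if both format strings contain exactly
   the same conversion specifications in the same order. */
bool same_substitutions(const char *expected, const char *loaded)
{
    const char *e = expected, *l = loaded;

    for (;;) {
        const char *es = NULL, *ls = NULL;
        size_t elen = 0, llen = 0;
        bool more_e = next_spec(&e, &es, &elen);
        bool more_l = next_spec(&l, &ls, &llen);

        if (more_e != more_l)
            return false;               /* different number of substitutions */
        if (!more_e)
            return true;                /* both exhausted, everything matched */
        if (elen != llen || memcmp(es, ls, elen) != 0)
            return false;               /* a substitution differs */
    }
}
```

The check is deliberately strict: it rejects a translation that changes even a field width, and it does not understand POSIX positional arguments such as "%1$s", which some translations use to reorder substitutions.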
- When using vsprintf( ) or sprintf( ) to output to a string, be very careful about using the "%s" substitution without specifying a precision.
The vsprintf( ) and sprintf( ) functions both assume that the buffer into which they write their output is large enough to hold whatever they produce. It is especially common to use these functions with a statically allocated output buffer. If a string substitution is made without specifying a precision, and that string comes from an external source, an attacker can supply a string that is too long for the output buffer and thereby overflow it. (See Recipe 3.3 for a discussion of buffer overflows.)
One solution is to check the length of the string to be substituted into the output before using it with vsprintf( ) or sprintf( ). Unfortunately, this solution is error-prone, especially later in your program's life, when another programmer changes the size of the buffer or the format string and must remember to update the check as well.
A better solution is to use a precision modifier in the format string. For example, if no more than 12 characters from a string should ever be substituted into the output, use "%.12s" instead of simply "%s". The advantage of this solution is that it is part of the formatting function call; thus, it is less likely to be overlooked in the event of a later change to the format string. Note that the precision limits only the characters taken from that one string; the buffer must still be large enough for the rest of the formatted output.
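The sketch below applies the precision technique with two hypothetical functions. format_greeting( ) hard-codes "%.12s" for a fixed 32-byte buffer (GREETING_MAX is an assumed size). format_greeting_sized( ) uses "%.*s", a standard form that takes the precision from an int argument, so the limit is computed from the buffer size instead of being maintained by hand.

```c
#include <limits.h>
#include <stdio.h>

#define GREETING_MAX 32   /* hypothetical fixed buffer size */

/* "Hello, " (7) + at most 12 characters of name + "!" (1) + NUL = 21
   bytes at most, so the output always fits in GREETING_MAX bytes no
   matter how long name is. Returns the number of characters written. */
int format_greeting(char out[GREETING_MAX], const char *name)
{
    return sprintf(out, "Hello, %.12s!", name);
}

/* Variant that derives the precision from the buffer size. "%.*s" reads
   an int precision argument immediately before the string argument. */
int format_greeting_sized(char *out, size_t outlen, const char *name)
{
    const size_t fixed = sizeof("Hello, !");   /* 8 characters + NUL */

    if (outlen < fixed || outlen - fixed > INT_MAX)
        return -1;                             /* unusable buffer size */
    return sprintf(out, "Hello, %.*s!", (int)(outlen - fixed), name);
}
```

The sized variant ties the limit to the buffer length, so a later change to the buffer size cannot silently invalidate the precision; a change to the fixed text still requires updating the sizeof expression.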
- Avoid using vsprintf( ) and sprintf( ). Use vsnprintf( ) and snprintf( ) or vasprintf( ) and asprintf( ) instead. Alternatively, use a secure string library such as SafeStr (see Recipe 3.4).
The functions vsprintf( ) and sprintf( ) assume that the buffer into which they write their output is large enough to hold it all. This is never a safe assumption to make and frequently leads to buffer overflow vulnerabilities. (See Recipe 3.3.)
The functions vasprintf( ) and asprintf( ) dynamically allocate a buffer that is exactly the size required to hold the formatted output. There are two problems with these functions, however. The first is that they're not portable. Most modern BSD derivatives (Darwin, FreeBSD, NetBSD, and OpenBSD) have them, as does Linux. Unfortunately, older Unix systems and Windows do not. The other problem is that they can be slower, because they may need to make two passes over the format string: one to calculate the required buffer size, and the other to actually produce output in the allocated buffer.
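The functions vsnprintf( ) and snprintf( ) are just as fast as vsprintf( ) and sprintf( ), but like vasprintf( ) and asprintf( ), they are not available on every system. They are defined in the C99 standard for C, and on systems that predate C99 they typically enjoy the same availability as vasprintf( ) and asprintf( ). They both require an additional argument that specifies the size of the output buffer, and they never write more data into the buffer than will fit, including the NUL terminating character. When the output does not fit, they return the number of characters that would have been written had the buffer been large enough (not counting the terminator), so a return value greater than or equal to the buffer size indicates that the output was truncated.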
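The following sketch wraps vsnprintf( ) in a hypothetical function, format_checked( ), that treats truncation as an error. It relies on the C99 behavior of vsnprintf( ): the return value is the length the complete output would have had, so a value greater than or equal to the buffer size means the output was cut short. The CHECK_FORMAT macro (also hypothetical) applies GCC's format attribute, which GCC and Clang support, so that calls to the wrapper get the same compile-time format checks as printf( ); on other compilers it expands to nothing.

```c
#include <stdarg.h>
#include <stdio.h>

#if defined(__GNUC__)
#define CHECK_FORMAT(f, a) __attribute__((format(printf, f, a)))
#else
#define CHECK_FORMAT(f, a)
#endif

/* Hypothetical wrapper: formats into buf (of size buflen) and returns the
   length of the output, or -1 on an encoding error or truncation.
   vsnprintf() never writes more than buflen bytes into buf. */
int format_checked(char *buf, size_t buflen, const char *fmt, ...)
    CHECK_FORMAT(3, 4);

int format_checked(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, buflen, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= buflen)
        return -1;   /* error, or the output did not fit and was truncated */
    return n;
}
```

Treating truncation as a failure, rather than silently accepting the shortened string, is a design choice: a truncated file name or command can be a security problem in its own right.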
3.2.4 See Also