3.13 Preventing File Descriptor Overflows When Using select( )
3.13.1 Problem
Your program uses the
select( ) system call to determine when sockets
are ready for writing, have data waiting to be read, or have an
exceptional condition (e.g., out-of-band data has arrived). Using
select( ) requires the use of the
fd_set data type, which typically entails the use
of the FD_*( ) family of macros. In most
implementations, FD_SET( ) and FD_CLR(
), in particular, are susceptible to an array overrun.
3.13.2 Solution
Do not use the FD_*( ) family of macros. Instead,
use the functions that are provided in this recipe. The
FD_SET( ) and FD_CLR( ) macros will modify an
fd_set object without performing any bounds
checking. The functions we provide will do proper bounds checking.
3.13.3 Discussion
The select( ) system
call is normally used to multiplex sockets. In a single-threaded
environment, you build sets of socket descriptors and select( )
waits until data can be read from, or written to, at least one
of them. The
fd_set data type is used to hold a list of the
socket descriptors, and several standard macros are used to
manipulate objects of this type.
Normally, fd_set is defined as a structure with a
single member that is a statically allocated array of long integers.
Because socket descriptors are always numbered starting with 0 and
ending with the highest allowable descriptor, the array of integers
in an fd_set is actually treated as a bitmask with
a one-to-one correspondence between bits and socket descriptors.
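The mapping from a descriptor to its bit can be sketched as follows. This is only an illustration of the layout described above, assuming an array of long integers; the names fd_word_index( ), fd_bit_mask( ), and BITS_PER_LONG_WORD are hypothetical and do not come from any system header.

```c
#include <limits.h>
#include <stddef.h>

/* Number of descriptors tracked by one long in the bitmask (hypothetical name). */
#define BITS_PER_LONG_WORD (sizeof(long) * CHAR_BIT)

/* Index of the array element that holds the bit for descriptor fd. */
size_t fd_word_index(int fd) {
  return (size_t)fd / BITS_PER_LONG_WORD;
}

/* Mask that selects the bit for descriptor fd within its array element. */
unsigned long fd_bit_mask(int fd) {
  return 1UL << ((size_t)fd % BITS_PER_LONG_WORD);
}
```

Descriptor 0 maps to the lowest bit of the first element, and the first descriptor past the end of one element maps to the lowest bit of the next.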
The size of the array in the fd_set structure is
determined by the FD_SETSIZE macro. Most often,
the size of the array is sufficiently large to be able to handle any
possible file descriptor, but the problem is that most
implementations of the FD_SET( ) and
FD_CLR( ) macros (which are used to set and clear
socket descriptors in an fd_set object) do not
perform any bounds checking and will happily overrun the array if
asked to do so.
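If an existing program must keep using fd_set, the missing check can at least be added by hand. The following is a minimal sketch of that idea, not the approach this recipe takes; checked_fd_set( ) is a hypothetical helper name.

```c
#include <sys/select.h>

/* Hypothetical helper: add fd to set only if it fits within FD_SETSIZE.
 * Returns 1 if the descriptor was added, 0 if it was out of range.
 */
int checked_fd_set(int fd, fd_set *set) {
  if (fd < 0 || fd >= FD_SETSIZE) return 0;
  FD_SET(fd, set);
  return 1;
}
```

A wrapper like this prevents the overrun, but it cannot watch descriptors numbered FD_SETSIZE or higher; the caller has to treat a return value of 0 as an error.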
If FD_SETSIZE is defined to be sufficiently large,
why is this a problem? Consider the situation in which a server
program is compiled with FD_SETSIZE defined to be
256, which is normally the maximum number of file and socket
descriptors allowed in a Unix process. Everything works just fine for
a while, but eventually the number of allowed file descriptors is
increased to 512 because 256 are no longer enough for all the
connections to the server. The increase in file descriptors could be
done externally by using setrlimit( )
before starting the server process (with the bash shell, the command
would be ulimit -n 512). The fd_set array still holds only 256 bits,
so as soon as the server is handed a descriptor numbered 256 or
higher, FD_SET( ) and FD_CLR( ) write past the end of the array.
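A program can detect this mismatch at startup by comparing its descriptor limit with FD_SETSIZE. The sketch below uses the standard getrlimit( ) call; the helper name fd_limit_exceeds_setsize( ) is hypothetical.

```c
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: returns 1 if the process may be handed descriptors
 * numbered FD_SETSIZE or higher, 0 if it cannot, and -1 if the limit
 * could not be read.
 */
int fd_limit_exceeds_setsize(void) {
  struct rlimit rl;

  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
  if (rl.rlim_cur == RLIM_INFINITY) return 1;
  return rl.rlim_cur > (rlim_t)FD_SETSIZE;
}
```

A return value of 1 means that an unchecked FD_SET( ) could overrun a standard fd_set in this process.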
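The proper way to deal with this problem is to allocate the array
dynamically and ensure that FD_SET( ) and
FD_CLR( ) resize the array as necessary before
modifying it. Unfortunately, to do this, we need to create a new data
type. We define the data type so that its array of bits has the same
layout as the array inside an fd_set; on implementations where
fd_set is a plain array of long integers, as described above, a
pointer to that array can be cast to an fd_set pointer and passed
directly to select( ):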
#include <stdlib.h>

typedef struct {
  long int *fds_bits;
  size_t   fds_size;
} SPC_FD_SET;
With a new data type defined, we can replace FD_SET(
), FD_CLR( ), FD_ISSET(
), and FD_ZERO( ), which are normally
implemented as preprocessor macros. Instead, we will implement them
as functions because we need to do a little extra work, and it also
helps ensure type safety:
#include <limits.h>
#include <string.h>

/* Number of descriptors tracked by each long in the array */
#define SPC_FD_BITS (sizeof(long) * CHAR_BIT)

void spc_fd_zero(SPC_FD_SET *fdset) {
  fdset->fds_bits = 0;
  fdset->fds_size = 0;
}

/* Grow the array so that it can hold fd. fds_size counts descriptors (bits),
 * not bytes. Returns 1 on success, 0 if memory could not be allocated.
 */
static int spc_fd_grow(int fd, SPC_FD_SET *fdset) {
  long   *tmp_bits;
  size_t old_count, new_count;

  if ((size_t)fd < fdset->fds_size) return 1;
  old_count = fdset->fds_size / SPC_FD_BITS;
  new_count = (size_t)fd / SPC_FD_BITS + 1;
  if (!(tmp_bits = (long *)realloc(fdset->fds_bits, new_count * sizeof(long))))
    return 0;
  /* realloc( ) does not initialize the added memory, so clear it */
  memset(tmp_bits + old_count, 0, (new_count - old_count) * sizeof(long));
  fdset->fds_bits = tmp_bits;
  fdset->fds_size = new_count * SPC_FD_BITS;
  return 1;
}

void spc_fd_set(int fd, SPC_FD_SET *fdset) {
  if (fd < 0 || !spc_fd_grow(fd, fdset)) return;
  fdset->fds_bits[fd / SPC_FD_BITS] |= (long)(1UL << (fd % SPC_FD_BITS));
}

void spc_fd_clr(int fd, SPC_FD_SET *fdset) {
  if (fd < 0 || !spc_fd_grow(fd, fdset)) return;
  fdset->fds_bits[fd / SPC_FD_BITS] &= (long)~(1UL << (fd % SPC_FD_BITS));
}

int spc_fd_isset(int fd, SPC_FD_SET *fdset) {
  if (fd < 0 || (size_t)fd >= fdset->fds_size) return 0;
  return (fdset->fds_bits[fd / SPC_FD_BITS] &
          (long)(1UL << (fd % SPC_FD_BITS))) != 0;
}

void spc_fd_free(SPC_FD_SET *fdset) {
  if (fdset->fds_bits) free(fdset->fds_bits);
  fdset->fds_bits = 0;
  fdset->fds_size = 0;
}

int spc_fd_setsize(SPC_FD_SET *fdset) {
  return (int)fdset->fds_size;
}
Notice that we've added two additional functions,
spc_fd_free( ) and spc_fd_setsize( ).
Because we are now dynamically allocating
the array, there must be some way to free it. The function
spc_fd_free( ) will only free the inner contents
of the SPC_FD_SET object passed to it, leaving
management of the SPC_FD_SET object up to
you; you may allocate these objects either statically or
dynamically. The other function, spc_fd_setsize( ),
is a replacement for the FD_SETSIZE
macro that is normally used as the first argument to select( ),
indicating the size of the fd_set
objects passed as the next three arguments.
Finally, using the new code requires some minor changes to existing
code that uses the standard fd_set. Consider the
following code example, where the variable
client_count is a global variable that represents
the number of connected clients, and the variable
client_fds is a global variable that is an array
of socket descriptors for each connected client:
void main_server_loop(int server_fd) {
  int    i;
  fd_set read_mask;

  for (;;) {
    FD_ZERO(&read_mask);
    FD_SET(server_fd, &read_mask);
    for (i = 0;  i < client_count;  i++) FD_SET(client_fds[i], &read_mask);
    select(FD_SETSIZE, &read_mask, 0, 0, 0);
    if (FD_ISSET(server_fd, &read_mask)) {
      /* Do something with the server_fd such as call accept( ) */
    }
    for (i = 0;  i < client_count;  i++)
      if (FD_ISSET(client_fds[i], &read_mask)) {
        /* Read some data from the client's socket descriptor */
      }
  }
}
The equivalent code using the SPC_FD_SET data type
and the functions that operate on it would be:
void main_server_loop(int server_fd) {
  int        i;
  SPC_FD_SET read_mask;

  for (;;) {
    spc_fd_zero(&read_mask);
    spc_fd_set(server_fd, &read_mask);
    for (i = 0;  i < client_count;  i++) spc_fd_set(client_fds[i], &read_mask);
    /* Pass the dynamically allocated bit array, not the wrapper structure */
    select(spc_fd_setsize(&read_mask), (fd_set *)read_mask.fds_bits, 0, 0, 0);
    if (spc_fd_isset(server_fd, &read_mask)) {
      /* Do something with the server_fd such as call accept( ) */
    }
    for (i = 0;  i < client_count;  i++)
      if (spc_fd_isset(client_fds[i], &read_mask)) {
        /* Read some data from the client's socket descriptor */
      }
    spc_fd_free(&read_mask);
  }
}
As you can see, the code that uses SPC_FD_SET is
not all that different from the code that uses
fd_set. Naming issues aside, the only real
differences are the need to cast the fds_bits array of the
SPC_FD_SET object to an fd_set pointer when calling
select( ), and to call spc_fd_free( ).
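Handing select( ) a dynamically allocated bitmask can also be tried in isolation. The sketch below assumes a platform where fd_set is a plain array of integer bitmasks whose bit layout matches an array of long (true of common Linux systems, but not guaranteed by POSIX); the helper name fd_is_readable( ) is hypothetical.

```c
#include <limits.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: polls a single descriptor using a heap-allocated
 * bitmask instead of a fixed-size fd_set. Returns 1 if fd is readable now,
 * 0 if it is not, and -1 on error.
 */
int fd_is_readable(int fd) {
  size_t         bits_per_long = sizeof(long) * CHAR_BIT;
  size_t         nlongs;
  unsigned long  *bits;
  struct timeval tv = { 0, 0 };   /* zero timeout: poll and return at once */
  int            ready;

  if (fd < 0) return -1;
  nlongs = (size_t)fd / bits_per_long + 1;
  if (!(bits = calloc(nlongs, sizeof(unsigned long)))) return -1;
  bits[(size_t)fd / bits_per_long] |= 1UL << ((size_t)fd % bits_per_long);

  /* Assumption: the bitmask layout is compatible with fd_set */
  ready = select(fd + 1, (fd_set *)bits, 0, 0, &tv);
  free(bits);
  if (ready < 0) return -1;
  return ready > 0;
}
```

Because the array is sized from the descriptor itself, nothing here depends on FD_SETSIZE, which is the property the SPC_FD_SET type is meant to provide.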
3.13.4 See Also
Recipe 3.3