Chapter 8. I/O, Logging, and Console Output
I/O to the disk or the network is hundreds to thousands of times slower than I/O to computer memory. Disk and network transfers are expensive activities and are two of the most likely candidates for performance problems. Two standard optimization techniques for reducing I/O overhead are buffering and caching. For a given amount of data, I/O mechanisms work more efficiently if the data is transferred in a few large chunks rather than in many small ones. Buffering groups data into larger chunks, improving the efficiency of the I/O by reducing the number of I/O operations that need to be executed. Where some objects or data are accessed repeatedly, caching them can replace an I/O call with a hugely faster memory access (or replace a slow network I/O call with faster local disk I/O). For every I/O call that is avoided because an item is accessed from a cache, you save a large chunk of time, equivalent to executing hundreds or thousands of simple operations.[1]
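As a minimal sketch of the buffering idea (the class, method, and file names here are illustrative, not examples from this chapter), wrapping plain file streams in BufferedInputStream and BufferedOutputStream means that most read() and write() calls are satisfied from an in-memory buffer, so far fewer underlying system calls are made:

import java.io.*;

public class BufferedCopy {
    // Copies a file through buffered streams so that most read()/write()
    // calls hit an in-memory buffer instead of triggering a system call.
    // The file names passed in are placeholders for this sketch.
    public static void copy(String from, String to) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(from));
        OutputStream out = new BufferedOutputStream(new FileOutputStream(to));
        try {
            byte[] buf = new byte[8192];      // also transfer in large chunks
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.flush();                      // push any bytes still buffered
        } finally {
            in.close();
            out.close();
        }
    }
}

Transferring the data through a reasonably large byte array is itself a form of buffering: even without the stream wrappers, it already cuts the number of underlying I/O calls dramatically compared to single-byte reads and writes.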
There are some other general points about I/O at the system level that are worth knowing. First, I/O buffers throughout the system typically use a read-ahead algorithm for optimization: when one chunk of a file is read, the next few chunks are normally read from disk into a low-level buffer somewhere. Consequently, reading sequentially forward through a file is usually faster than other access orders, such as reading back to front through the file or accessing its elements at random.

The next point is that, at the system level, most operating systems support mmap(), memcntl(), and various shared-memory options. Using these can improve I/O performance dramatically, but they also increase complexity, and portability is compromised, though not as much as you might think. If you need to use these sorts of features and still maintain portability, you may want to start with the latest Perl distribution. Perl has been ported to a large number of systems, and these features are mapped consistently to system-level features in all ports. Since the Perl source is available, it is possible to extract the relevant system-independent mappings for portability purposes (a Java-level sketch of memory-mapped I/O appears at the end of this section).

In the same vein, when simultaneously using multiple open filehandles to I/O devices (sockets, files, pipes, etc.), Java editions prior to the 1.4 release require you to use either polling across the handles, which is system-intensive; a separate thread per handle, which is also system-intensive; or a combination of the two, which is still bad for performance. However, almost all operating systems support an efficient multiplexing function call, often called select() or sometimes poll(). This function provides a way to ask the system, in one request, whether any of a set of open handles is ready for reading or writing. SDK 1.4 introduced support for the select()/poll() function under the java.nio package, which I discuss further in the NIO section later in this chapter. For versions prior to 1.4, you could again look at Perl, which provides a standardized mapping for this function, if you need hints on maintaining portability. For efficient handling of complex I/O, select()/poll() support was probably the largest single piece of functionality missing from Java.
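Returning to the mmap() point above: from SDK 1.4, the java.nio package also exposes memory-mapped files from Java, through FileChannel.map(). The following is a minimal sketch under my own assumptions, not an example from this chapter; the file name and the byte-summing work are placeholders:

import java.io.*;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedSum {
    // Maps a whole file into memory and reads it through the mapping, so the
    // operating system pages the data in rather than the program issuing
    // explicit read() calls. The file name is a placeholder; the mapped
    // region must fit within an int-sized buffer.
    public static long sumBytes(String name) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(name, "r");
        try {
            FileChannel ch = raf.getChannel();
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get();          // get() returns one signed byte
            }
            return sum;
        } finally {
            raf.close();                   // closing the file closes the channel
        }
    }
}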
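And here is a bare-bones sketch of the SDK 1.4 select()-style multiplexing just described (the port number and the empty read handler are placeholders, not this chapter's NIO example): a single Selector can wait on a listening socket and on all of its accepted connections with one select() call, instead of dedicating a polling loop or a thread to each connection.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.*;
import java.util.Iterator;

public class SelectLoop {
    // Waits on many sockets with a single select() call instead of one
    // thread (or one polling pass) per connection. Port 8000 is arbitrary.
    public static void serve() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.socket().bind(new InetSocketAddress(8000));
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                 // blocks until something is ready
            Iterator it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = (SelectionKey) it.next();
                it.remove();
                if (key.isAcceptable()) {
                    // new connection: register it with the same selector
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // read from (SocketChannel) key.channel() here
                }
            }
        }
    }
}

Note that selectedKeys() returns only the handles that are actually ready, so the loop does work proportional to the amount of activity rather than to the number of open connections.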
Here are some other general techniques to improve I/O performance: