Chapter 8. I/O, Logging, and Console Output
I/O to the disk or the network is hundreds to thousands of times slower than I/O to computer memory. Disk and network transfers are expensive activities and are two of the most likely candidates for performance problems. Two standard optimization techniques for reducing I/O overhead are buffering and caching. For a given amount of data, I/O mechanisms work more efficiently if the data is transferred in a few large chunks rather than in many small ones. Buffering groups data into larger chunks, improving the efficiency of the I/O by reducing the number of I/O operations that need to be executed. Where some objects or data are accessed repeatedly, caching them can replace an I/O call with a hugely faster memory access (or replace a slow network I/O call with faster local disk I/O). For every I/O call that is avoided because an item is accessed from a cache, you save a large chunk of time, equivalent to executing hundreds or thousands of simple operations.[1]
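As a minimal sketch of the buffering idea (the class, method, and file names here are illustrative, not examples from this chapter), wrapping plain file streams in BufferedInputStream and BufferedOutputStream means that most read() and write() calls are satisfied from an in-memory buffer, so far fewer underlying system calls are made:

import java.io.*;

public class BufferedCopy {
    // Copies a file through buffered streams so that most read()/write()
    // calls hit an in-memory buffer instead of triggering a system call.
    // The file names passed in are placeholders for this sketch.
    public static void copy(String from, String to) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(from));
        OutputStream out = new BufferedOutputStream(new FileOutputStream(to));
        try {
            byte[] buf = new byte[8192];      // also transfer in large chunks
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.flush();                      // push any bytes still buffered
        } finally {
            in.close();
            out.close();
        }
    }
}

Transferring the data through a reasonably large byte array is itself a form of buffering: even without the stream wrappers, it already cuts the number of underlying I/O calls dramatically compared to single-byte reads and writes.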
There are some other general points about I/O at the system level that are worth knowing. First, I/O buffers throughout the system typically use a read-ahead algorithm for optimization: when one chunk of a file is read, the next few chunks are normally read from disk into a low-level buffer somewhere. Consequently, reading sequentially forward through a file is usually faster than other access orders, such as reading back to front through the file or accessing its elements at random.

The next point is that, at the system level, most operating systems support mmap(), memcntl(), and various shared-memory options. Using these can improve I/O performance dramatically, but they also increase complexity, and portability is compromised, though not as much as you might think. If you need to use these sorts of features and still maintain portability, you may want to start with the latest Perl distribution. Perl has been ported to a large number of systems, and these features are mapped consistently to system-level features in all ports. Since the Perl source is available, it is possible to extract the relevant system-independent mappings for portability purposes (a Java-level sketch of memory-mapped I/O appears at the end of this section).

In the same vein, when simultaneously using multiple open filehandles to I/O devices (sockets, files, pipes, etc.), Java editions prior to the 1.4 release require you to use either polling across the handles, which is system-intensive; a separate thread per handle, which is also system-intensive; or a combination of the two, which is still bad for performance. However, almost all operating systems support an efficient multiplexing function call, often called select() or sometimes poll(). This function provides a way to ask the system, in one request, whether any of a set of open handles is ready for reading or writing. SDK 1.4 introduced support for the select()/poll() function under the java.nio package, which I discuss further in the NIO section later in this chapter. For versions prior to 1.4, you could again look at Perl, which provides a standardized mapping for this function, if you need hints on maintaining portability. For efficient handling of complex I/O, select()/poll() support was probably the largest single piece of functionality missing from Java.
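Returning to the mmap() point above: from SDK 1.4, the java.nio package also exposes memory-mapped files from Java, through FileChannel.map(). The following is a minimal sketch under my own assumptions, not an example from this chapter; the file name and the byte-summing work are placeholders:

import java.io.*;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedSum {
    // Maps a whole file into memory and reads it through the mapping, so the
    // operating system pages the data in rather than the program issuing
    // explicit read() calls. The file name is a placeholder; the mapped
    // region must fit within an int-sized buffer.
    public static long sumBytes(String name) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(name, "r");
        try {
            FileChannel ch = raf.getChannel();
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get();          // get() returns one signed byte
            }
            return sum;
        } finally {
            raf.close();                   // closing the file closes the channel
        }
    }
}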
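And here is a bare-bones sketch of the SDK 1.4 select()-style multiplexing just described (the port number and the empty read handler are placeholders, not this chapter's NIO example): a single Selector can wait on a listening socket and on all of its accepted connections with one select() call, instead of dedicating a polling loop or a thread to each connection.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.*;
import java.util.Iterator;

public class SelectLoop {
    // Waits on many sockets with a single select() call instead of one
    // thread (or one polling pass) per connection. Port 8000 is arbitrary.
    public static void serve() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.socket().bind(new InetSocketAddress(8000));
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                 // blocks until something is ready
            Iterator it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = (SelectionKey) it.next();
                it.remove();
                if (key.isAcceptable()) {
                    // new connection: register it with the same selector
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // read from (SocketChannel) key.channel() here
                }
            }
        }
    }
}

Note that selectedKeys() returns only the handles that are actually ready, so the loop does work proportional to the amount of activity rather than to the number of open connections.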
Here are some other general techniques to improve I/O performance: