[ Team LiB ] |
11.2 Java 2 CollectionsThe Java 2 Collections framework provides a set of collection classes. Each class has its own performance strengths and weaknesses, which I cover here. The collection implementations use the synchronized-wrapper framework to provide synchronized classes; otherwise, the implementations are unsynchronized (except for two exceptions noted shortly). Collection classes wrapped in synchronized wrappers are always slower than unwrapped, unsynchronized classes. Nevertheless, my recommendation is generally to use objects within synchronized wrappers. You can selectively "unwrap" objects when they have been identified as part of a bottleneck and when the synchronization is not necessary. (The performance aspects of thread-safe collections are discussed in detail in Chapter 10. Synchronized wrappers are also discussed in that chapter in Section 10.4.1.) Table 11-1 summarizes the performance attributes of the collection classes.
Implementations of Set are slower to update than most other collection objects and should be avoided unless you need Set functionality. Of the three available Set implementations, HashSet is definitely faster than TreeSet, with LinkedHashSet, available from SDK 1.4, somewhere in between. HashSet uses an underlying HashMap, so the way HashSet maintains uniqueness is straightforward. Objects are added to the set as the keys to the HashMap, so there is no need to search the set for the elements. This optimizes unique element addition. If you need Set functionality but not specifically a Set implementation, it is faster to use a HashMap directly. Map has four general-purpose implementations: Hashtable , HashMap, TreeMap, and, added in SDK 1.4, LinkedHashMap. In addition, there are several specialized implementations that provide few performance improvements,[6] except for IdentityHashMap , added in 1.4. IdentityHashMap is based on identity comparisons rather than equality comparisons (equality comparisons form the basis for all general-purpose maps), making it the fastest useful Map. IdentityHashMap has one tuning parameter, the expected maximum size, which can help avoid rehashing by setting the number of buckets in the initial map.
In the case of the general-purpose Maps, TreeMap is significantly slower than the other Maps and should not be used unless you need the extra functionality of iterating ordered keys. LinkedHashMap also provides the ability to iterate its keys in order, with the default order being key-insertion order. LinkedHashMap should normally be faster than TreeMap, but slower than HashMap. Hashtable is a synchronized Map, and HashMap is an unsynchronized Map. Hashtable is present for backward compatibility with earlier versions of the JDK. Nevertheless, if you need to use a synchronized Map, a Hashtable is faster than using a HashMap in a synchronized wrapper. Hashtable, HashMap, and HashSet are all O(1) for access and update, so they should scale nicely if you have the available memory space. LinkedHashMap is based on a hash table but also maintains a linked list of entries so it can use the linked list to iterate through the entries in a particular order: the default order is the insertion order of keys. LinkedHashMap can be configured to order its entries from most-recently-accessed to least-recently-accessed by passing true as the third argument to the constructor. This mode is specifically provided so that the Map can be used as a least-recently-used (LRU) cache. The class provides a method called removeEldestEntry( ) that is intended to be overriden in a subclass to provide a policy for automatically removing stale mappings when new mappings are added to the Map. The default action is to never remove any entries automatically (removeEldestEntry( ) always returns false). An example implementation for an LRU cache of 100 elements would simply subclass the LinkedHashMap and implement the removeEldestEntry( ) as return size( ) > 100, which would automatically remove entries whenever the collection exceeded 100 elements. List has four general-purpose implementations: Vector , Stack, ArrayList, and LinkedList. Vector, Stack, and ArrayList have underlying implementations based on arrays. LinkedList has an underlying implementation consisting of a doubly linked list. As such, LinkedList's performance is worse than any of the other three Lists for most operations. For very large collections that you cannot presize to be large enough, LinkedList provides better performance when adding or deleting elements toward the middle of the list, if the array-copying overhead of the other Lists is higher than the linear access time of the LinkedList. Otherwise, LinkedList's only likely performance advantage is as a first-in-first-out queue or double-ended queue. (A circular array-list implementation provides better performance for a FIFO queue.) I discuss the performance differences between LinkedLists and ArrayLists in much more detail later in this chapter. Vector is a synchronized List, and ArrayList is an unsynchronized List. Vector is present for backward compatibility with earlier versions of the JDK. Nevertheless, if you need to use a synchronized List, a Vector is faster than using an ArrayList in a synchronized wrapper. (See the comparison test at the end of Section 10.4.1 in Chapter 10.) Stack is a subclass of Vector with the same performance characteristics, but with additional functionality as a last-in-first-out queue. |