11.2 Java 2 Collections

The Java 2 Collections framework provides a set of collection classes. Each class has its own performance strengths and weaknesses, which I cover here. The collection implementations use the synchronized-wrapper framework to provide synchronized classes; otherwise, the implementations are unsynchronized (except for two exceptions noted shortly). Collection classes wrapped in synchronized wrappers are always slower than unwrapped, unsynchronized classes. Nevertheless, my recommendation is generally to use objects within synchronized wrappers. You can selectively "unwrap" objects when they have been identified as part of a bottleneck and when the synchronization is not necessary. (The performance aspects of thread-safe collections are discussed in detail in Chapter 10. Synchronized wrappers are also discussed in that chapter in Section 10.4.1.)

Table 11-1 summarizes the performance attributes of the collection classes.

Table 11-1. Performance attributes of Java 2 collection classes

Interface

Class

Synchronized?

Performance attributes

Set

HashSet

No

Fastest Set; slower than HashMap but implements the Set interface (HashMap does not).

LinkedHashSet

No

Available from 1.4. Based on a LinkedHashMap, so provides iteration of elements according to insertion order. Faster than a TreeSet.

TreeSet

No

Slower than HashSet; provides iteration of keys in order.

Map

IdentityHashMap

No

Available from 1.4. Special-purpose HashMap based on identity (= =) instead of equality (.equals). Faster than HashMap for high-performance mapping where the identity semantics are acceptable.

HashMap

No

Fastest general Map.

Hashtable

Yes

Slower than HashMap, but faster than synchronized HashMap.

LinkedHashMap

No

Available from 1.4. A hash map implementation that also maintains an ordered linked list of entries. Provides iteration of keys in entry order. Faster than TreeMap but slower than other Maps.

TreeMap

No

Slower than Hashtable and HashMap; provides iteration of keys in order.

List

ArrayList

No

Fastest List.

Vector

Yes

Slower than ArrayList, but faster than synchronized ArrayList.

Stack

Yes

Same speed as Vector; provides LIFO queue functionality.

LinkedList

No

Slower than other Lists, but may be faster for some types of queues.

Implementations of Set are slower to update than most other collection objects and should be avoided unless you need Set functionality. Of the three available Set implementations, HashSet is definitely faster than TreeSet, with LinkedHashSet, available from SDK 1.4, somewhere in between. HashSet uses an underlying HashMap, so the way HashSet maintains uniqueness is straightforward. Objects are added to the set as the keys to the HashMap, so there is no need to search the set for the elements. This optimizes unique element addition. If you need Set functionality but not specifically a Set implementation, it is faster to use a HashMap directly.

Map has four general-purpose implementations: Hashtable , HashMap, TreeMap, and, added in SDK 1.4, LinkedHashMap. In addition, there are several specialized implementations that provide few performance improvements,^[6] except for IdentityHashMap , added in 1.4. IdentityHashMap is based on identity comparisons rather than equality comparisons (equality comparisons form the basis for all general-purpose maps), making it the fastest useful Map. IdentityHashMap has one tuning parameter, the expected maximum size, which can help avoid rehashing by setting the number of buckets in the initial map.

^[6] Attributes simply wraps a HashMap, and restricts the keys to be ASCII-character alphanumeric Strings, and values to be Strings. WeakHashMap can maintain a cache of elements that are automatically garbage-collected when memory gets low. RenderingHints is specialized for use within the AWT packages. Properties is a Hashtable subclass specialized for maintaining key-value string pairs in files. UIDefaults is specialized for use within the Swing packages.

In the case of the general-purpose Maps, TreeMap is significantly slower than the other Maps and should not be used unless you need the extra functionality of iterating ordered keys. LinkedHashMap also provides the ability to iterate its keys in order, with the default order being key-insertion order. LinkedHashMap should normally be faster than TreeMap, but slower than HashMap. Hashtable is a synchronized Map, and HashMap is an unsynchronized Map. Hashtable is present for backward compatibility with earlier versions of the JDK. Nevertheless, if you need to use a synchronized Map, a Hashtable is faster than using a HashMap in a synchronized wrapper.

Hashtable, HashMap, and HashSet are all O(1) for access and update, so they should scale nicely if you have the available memory space. LinkedHashMap is based on a hash table but also maintains a linked list of entries so it can use the linked list to iterate through the entries in a particular order: the default order is the insertion order of keys. LinkedHashMap can be configured to order its entries from most-recently-accessed to least-recently-accessed by passing true as the third argument to the constructor. This mode is specifically provided so that the Map can be used as a least-recently-used (LRU) cache. The class provides a method called removeEldestEntry( ) that is intended to be overriden in a subclass to provide a policy for automatically removing stale mappings when new mappings are added to the Map. The default action is to never remove any entries automatically (removeEldestEntry( ) always returns false). An example implementation for an LRU cache of 100 elements would simply subclass the LinkedHashMap and implement the removeEldestEntry( ) as return size( ) > 100, which would automatically remove entries whenever the collection exceeded 100 elements.

List has four general-purpose implementations: Vector , Stack, ArrayList, and LinkedList. Vector, Stack, and ArrayList have underlying implementations based on arrays. LinkedList has an underlying implementation consisting of a doubly linked list. As such, LinkedList's performance is worse than any of the other three Lists for most operations. For very large collections that you cannot presize to be large enough, LinkedList provides better performance when adding or deleting elements toward the middle of the list, if the array-copying overhead of the other Lists is higher than the linear access time of the LinkedList. Otherwise, LinkedList's only likely performance advantage is as a first-in-first-out queue or double-ended queue. (A circular array-list implementation provides better performance for a FIFO queue.) I discuss the performance differences between LinkedLists and ArrayLists in much more detail later in this chapter. Vector is a synchronized List, and ArrayList is an unsynchronized List. Vector is present for backward compatibility with earlier versions of the JDK. Nevertheless, if you need to use a synchronized List, a Vector is faster than using an ArrayList in a synchronized wrapper. (See the comparison test at the end of Section 10.4.1 in Chapter 10.) Stack is a subclass of Vector with the same performance characteristics, but with additional functionality as a last-in-first-out queue.