11.3 Hashtables and HashMaps
Because Hashtables
and HashMaps are the most commonly used nonlist
structures, I will spend a little extra time discussing them.
Hashtables and HashMaps are
pretty fast and provide adequate performance for most purposes. I
rarely find that I have a performance problem using
Hashtables or HashMaps, but
here are some points that will help you tune them or, if necessary,
replace them:
Hashtable is synchronized. That's
fine if you are using it to share data across threads, but if you are
using it single-threaded, you can replace it with an unsynchronized
version to get a small boost in performance.
HashMap is an unsynchronized version available
from JDK 1.2.
Hashtables and HashMaps are
resized whenever the number of elements reaches the [capacity
x loadFactor]. This requires
reassigning every element to a new array using the rehashed values.
This is not simply an array copy; every element needs to have its
internal table position recalculated using the new table size for the
hash function. You are usually better off setting an initial capacity
that handles all the elements you want to add. This initial capacity
should be the number of elements divided by the
loadFactor (the default load factor is 0.75).
Hashtables and HashMaps are
faster with a smaller loadFactor, but take up more
space. You have to decide how this tradeoff works best for you.
The hashing function for most implementations should work better with
a capacity that is a prime number. However, the 1.4
HashMap implementation (but not the
Hashtable implementation) uses a different
implementation that requires a power-of-two capacity so that it can
use bit shifting and masking instead of the % operator. If you
specify a non-power-of-two capacity, the HashMap
will automatically find the nearest power-of-two value higher than
the specified capacity. For other hash maps, always use a prime
(preferably) or odd number capacity. A useful prime number to
remember is 89. The sequence of numbers generated by successively
multiplying by two and adding one includes several primes when the
sequence starts with 89. But note also that speedups from prime
number capacities are small at best.
Access to
the Map requires asking the key for its
hashCode( ) and also testing that the key
equals( ) the key you are retrieving. You can
create a specialized Map class that bypasses these
calls if appropriate. Alternatively, you can use specialized key
classes that have very fast method calls for these two methods. Note,
for example, that Java String objects have
hashCode( ) methods that iterate and execute
arithmetic over a number of characters to determine the value, and
the String.equals( ) method checks that every
character is identical for the two strings being compared.
Considering that strings are used as the most common keys in
Hashtables, I'm often surprised
to find that I don't have a
performance problem with them, even for largish tables. From JDK 1.3,
Strings cache their hash code in an instance
variable, making them faster and more suited as
Map keys.
If you are building a specialized Hashtable, you
can map objects to array elements to preallocate
HashtableEntry objects and speed up access as
well. The technique is illustrated in the "Search
Trees" section later in this chapter.
The hash function maps the entries to table elements. The fewer
entries that map to the same internal table entry, the more efficient
the map. There are techniques for creating more efficient hash maps,
for instance, those discussed in my article
"Optimizing Hash Functions For a Perfect
Map" (see http://www.onjava.com/pub/a/onjava/2001/01/25/hash_functions.html).
Here is a specialized class to use
for keys in a Hashtable. This example assumes that
I am using String keys, but all my
String objects are nonequal, and I can reference
keys by identity. I use a utility class,
tuning.dict.Dict, which holds a large array of
nonequal words taken from an English dictionary. I compare the access
times against all the keys using two different
Hashtables, one using the plain
String objects as keys, the other using my own
StringWrapper objects as keys. The
StringWrapper objects cache the hash value of the
string and assume that equality comparison is the same as identity
comparison. These are the fastest possible equals(
) and hashCode( ) methods. The access
speedups are illustrated in the following table of measurements
(times normalized to the JDK 1.2 case):
String keys
|
112%
|
100%
|
40.8%
|
65.4%
|
41.2%
|
60.1%
|
181%
|
String-wrapped keys
|
87.4%
|
71.1%
|
40.3%
|
57.6%
|
38.4%
|
58.6%
|
159%
|
If you create a hash-table implementation specialized for the
StringWrapper class, you avoid calling the
hashCode( ) and equals( )
methods completely. Instead, the specialized hash table can access
the hash-instance variable directly and use identity comparison of
the elements. The speedup is considerably larger, and for specialized
purposes, this is the route to follow:
package tuning.hash;
import java.util.Hashtable;
import tuning.dict.Dict;
public class SpecialKeyClass
{
public static void main(String[ ] args)
{
//Initialize the dictionary
try{Dict.initialize(true);}catch(Exception e){ }
System.out.println("Started Test");
//Build the two hash tables. Keep references to the
//StringWrapper objects for later use as accessors.
Hashtable h1 = new Hashtable( );
Hashtable h2 = new Hashtable( );
StringWrapper[ ] dict = new StringWrapper[Dict.DICT.length];
for (int i = 0; i < Dict.DICT.length; i++)
{
h1.put(Dict.DICT[i], Boolean.TRUE);
h2.put(dict[i] = new StringWrapper(Dict.DICT[i]), Boolean.TRUE);
}
System.out.println("Finished building");
Object o;
//Time the access for normal String keys
long time1 = System.currentTimeMillis( );
for (int i = 0; i < Dict.DICT.length; i++)
o = h1.get(Dict.DICT[i]);
time1 = System.currentTimeMillis( ) - time1;
System.out.println("Time1 = " + time1);
//Time the access for StringWrapper keys
long time2 = System.currentTimeMillis( );
for (int i = 0; i < Dict.DICT.length; i++)
o = h2.get(dict[i]);
time2 = System.currentTimeMillis( ) - time2;
System.out.println("Time2 = " + time2);
}
}
final class StringWrapper
{
//cached hash code
private int hash;
private String string;
public StringWrapper(String str)
{
string = str;
hash = str.hashCode( );
}
public final int hashCode( )
{
return hash;
}
public final boolean equals(Object o)
{
//The fastest possible equality check
return o = = this;
/*
//This would be the more generic equality check if we allowed
//access of the same String value from different StringWrapper objects.
//This is still faster than the plain Strings as keys.
if(o instanceof StringWrapper)
{
StringWrapper s = (StringWrapper) o;
return s.hash = = hash && string.equals(s.string);
}
else
return false;
*/
}
}
|