[ Team LiB ] Previous Section Next Section

9.8 Dictionaries

A dictionary is a collection that associates a key to a value. A language dictionary, such as Webster's, associates a word (the key) with its definition (the value).

To see the value of dictionaries, start by imagining that you want to keep a list of the state capitals. One approach might be to put them in an array:

string[] stateCapitals = new string[50];

The stateCapitals array will hold 50 state capitals. Each capital is accessed as an offset into the array. For example, to access the capital for Arkansas, you need to know that Arkansas is the fourth state in alphabetical order:

string capitalOfArkansas = stateCapitals[3];

It is inconvenient, however, to access state capitals using array notation. After all, if I need the capital for Massachusetts, there is no easy way for me to determine that Massachusetts is the 21st state alphabetically.

It would be far more convenient to store the capital with the state name. A dictionary allows you to store a value (in this case, the capital) with a key (in this case, the name of the state).

A .NET Framework dictionary can associate any kind of key (string, integer, object, etc.) with any kind of value (string, integer, object, etc.). Typically, of course, the key is fairly short, the value fairly complex.

The most important attributes of a good dictionary are that it is easy to add values and it is quick to retrieve values. Some dictionaries are faster at adding new values, and others are optimized for retrieval. One example of a dictionary type is the hashtable.

9.8.1 Hashtables

A hashtable is a dictionary optimized for fast retrieval. The principal methods and properties of Hashtable are summarized in Table 9-6.

Table 9-6. Hashtable methods and properties

Method or property

Purpose

Synchronized( ) 

Public static method that returns a Hashtable wrapper that is thread-safe (see Chapter 20).

Count 

Public property that gets the number of elements in the Hashtable.

IsReadOnly 

Public property that gets a value indicating if the Hashtable is read-only.

IsSynchronized 

Public property that gets a value indicating if the Hashtable is synchronized.

Item( ) 

The indexer for the Hashtable.

Keys 

Public property that gets an ICollection containing the keys in the Hashtable. (See also Values, later in this table.)

SyncRoot 

Public property that returns an object that can be used to synchronize access to the Hashtable.

Values 

Public property that gets an ICollection containing the values in the Hashtable. (See also Keys, earlier in this table.)

Add( ) 

Adds an entry with a specified Key and Value.

Clear( ) 

Removes all objects from the Hashtable.

Clone( ) 

Creates a shallow copy.

Contains( )
ContainsKey( )

Determines whether the Hashtable has a specified key.

ContainsValue( ) 

Determines whether the Hashtable has a specified value.

CopyTo( ) 

Copies the Hashtable elements to an existing one-dimensional array.

GetEnumerator( ) 

Returns an enumerator for the Hashtable.

GetObjectData( ) 

Implements ISerializable and returns the data needed to serialize the Hashtable.

OnDeserialization( ) 

Implements ISerializable and raises the deserialization event when the deserialization is complete.

Remove( ) 

Removes the entry with the specified Key.

In a Hashtable, each value is stored in a "bucket." The bucket is numbered, much like an offset into an array.

Because the key may not be an integer, it must be possible to translate the key (e.g., "Massachusetts") into a bucket number. Each key must provide a GetHashCode( ) method that will accomplish this magic.

Remember that everything in C# derives from object. The object class provides a virtual method GetHashCode( ), which the derived types are free to inherit as is or to override.

A trivial implementation of a GetHashCode( ) function for a string might simply add up the Unicode values of each character in the string and then use the modulus operator to return a value between 0 and the number of buckets in the Hashtable. It is not necessary to write such a method for the string type, however, as the CLR provides one for you.

When you insert the values (the state capitals) into the Hashtable, the Hashtable calls GetHashCode( ) on each key provided. This method returns an int, which identifies the bucket into which the state capital is placed.

It is possible, of course, for more than one key to return the same bucket number. This is called a collision. There are a number of ways to handle a collision. The most common solution, and the one adopted by the CLR, is simply to have each bucket maintain an ordered list of values.

When you retrieve a value from the Hashtable, you provide a key. Once again the Hashtable calls GetHashCode( ) on the key and uses the returned int to find the appropriate bucket. If there is only one value, it is returned. If there is more than one value, a binary search of the bucket's contents is performed. Because there are few values, this search is typically very fast.

Load Factor

Hash functions that minimize collisions typically do so at the expense of making less efficient use of their storage. Hash function designers trade off minimizing collisions, maximizing efficient memory usage, and algorithmic speed.

Every CLR Hashtable has a load factor that determines the maximum ratio of entries to buckets. Using a smaller load factor will speed up performance but use more memory. The default factor is 1.0, which Microsoft says provides the optimum balance between speed and size, but you can specify the load factor you want when you instantiate the Hashtable.

As entries are added to the Hashtable, the actual load increases until it matches the load factor specified for the table. When the load factor is reached, the Hashtable automatically increases the number of buckets to the smallest prime number larger than twice the current number of buckets.

If your Hashtable had only one bucket, searching for a key would be a simple binary search and you wouldn't need the Hashtable at all. If, however, you have more than one bucket, the procedure is to hash the key, find the associated bucket, and then search that bucket. If there is only one key in that bucket, searching the bucket is very fast.

With lots of keys, you can spend your time working through the hash to find the right bucket or searching through the bucket to find the right key. That is the tradeoff captured by the LoadFactor.

The key in a Hashtable can be a primitive type, or it can be an instance of a user-defined type (an object). Objects used as keys for a Hashtable must implement GetHashCode( ) as well as Equals. In most cases, you can simply use the inherited implementation from Object.

9.8.2 IDictionary

Hash tables are dictionaries because they implement the IDictionary interface. IDictionary provides a public property Item. The Item property retrieves a value with the specified key. In C#, the declaration for the Item property is:

object this[object key]
{get; set;}

The Item property is implemented in C# with the index operator ([]). Thus, you access items in any Dictionary object using the offset syntax, as you would with an array.

Example 9-17 demonstrates adding items to a Hashtable and then retrieving them with the Item property.

Example 9-17. The Item property as offset operators
namespace Programming_CSharp
{
   using System;
   using System.Collections;
  
   public class Tester
   {
      static void Main( )
      {
         // Create and initialize a new Hashtable.
         Hashtable hashTable = new Hashtable( );
         hashTable.Add("000440312", "Jesse Liberty");
         hashTable.Add("000123933", "Stacey Liberty");
         hashTable.Add("000145938", "John Galt");
         hashTable.Add("000773394", "Ayn Rand");

         // access a particular item
         Console.WriteLine("myHashTable[\"000145938\"]: {0}",
            hashTable["000145938"]);
      }
   }
}

Output:
hashTable["000145938"]: John Galt

Example 9-17 begins by instantiating a new Hashtable. We use the simplest constructor accepting the default initial capacity and load factor (see the sidebar, Load Factor), the default hash code provider, and the default comparer.

We then add four key/value pairs. In this example, the social security number is tied to the person's full name. (Note that the social security numbers here are intentionally bogus.)

Once the items are added, we access the third item using its key.

9.8.3 The Keys and Values Collections

Dictionary collections provide two additional properties: Keys and Values. Keys retrieves an ICollection object with all the keys in the Hashtable, as Values retrieves an ICollection object with all the values. Example 9-18 illustrates.

Example 9-18. Keys and Values collections
namespace Programming_CSharp
{
   using System;
   using System.Collections;
  
   public class Tester
   {
      static void Main( )
      {
         // Create and initialize a new Hashtable.
         Hashtable hashTable = new Hashtable( );
         hashTable.Add("000440312", "George Washington");
         hashTable.Add("000123933", "Abraham Lincoln");
         hashTable.Add("000145938", "John Galt");
         hashTable.Add("000773394", "Ayn Rand");

         // get the keys from the hashTable
         ICollection keys = hashTable.Keys;

         // get the values
         ICollection values = hashTable.Values;

         // iterate over the keys ICollection
         foreach(string key in keys)
         {
            Console.WriteLine("{0} ", key);
         }

         // iterate over the values collection
         foreach (string val in values)
         {
            Console.WriteLine("{0} ", val);
         }
      }
   }
}
Output:
000440312
000123933
000773394
000145938
George Washington
Abraham Lincoln
Ayn Rand
John Galt

Although the order of the Keys collection is not guaranteed, it is guaranteed to be the same order as returned in the Values collection.

9.8.4 IDictionaryEnumerator Interface

IDictionary objects also support the foreach construct by implementing the GetEnumerator method, which returns an IDictionaryEnumerator.

The IDictionaryEnumerator is used to enumerate through any IDictionary object. It provides properties to access both the key and value for each item in the dictionary. Example 9-19 illustrates.

Example 9-19. Using the IDictionaryEnumerator interface
namespace Programming_CSharp
{
   using System;
   using System.Collections;
  
   public class Tester
   {

      static void Main( )
      {
         // Create and initialize a new Hashtable.
         Hashtable hashTable = new Hashtable( );
         hashTable.Add("000440312", "George Washington");
         hashTable.Add("000123933", "Abraham Lincoln");
         hashTable.Add("000145938", "John Galt");
         hashTable.Add("000773394", "Ayn Rand");

         // Display the properties and values of the Hashtable.
         Console.WriteLine( "hashTable" );
         Console.WriteLine( "  Count:    {0}", hashTable.Count );
         Console.WriteLine( "  Keys and Values:" );
         PrintKeysAndValues( hashTable );
      }

      public static void PrintKeysAndValues( Hashtable table )  
      {
         IDictionaryEnumerator enumerator = table.GetEnumerator( );
         while ( enumerator.MoveNext( ) )
            Console.WriteLine( "\t{0}:\t{1}", 
               enumerator.Key, enumerator.Value );
         Console.WriteLine( );
       
      }
   }
}

Output:
hashTable
  Count:    4
  Keys and Values:
        000440312:      George Washington
        000123933:      Abraham Lincoln
        000773394:      Ayn Rand
        000145938:      John Galt
    [ Team LiB ] Previous Section Next Section