[ Team LiB ] Previous Section Next Section

11.5 Comparing LinkedLists and ArrayLists

I'm frequently asked about how LinkedLists compare in performance to ArrayLists. To fully consider the performance ramifications of these two classes, we need to know how they are implemented. So I'll start with brief descriptions of the most important aspects of their implementations from the point of view of performance.

11.5.1 The Vector and ArrayList Implementations

Vector and ArrayList are both implemented with an underlying Object[ ] array that stores the elements. We access the elements in the internal array by index:

public Object get(int index) {
  //check the index is valid first .. code not shown here
  return elementData[index];
}

The internal array can be bigger than the number of elements held by the Vector/ArrayList object: the difference is kept as extra capacity for efficiently adding further elements. Adding elements is then very simply achieved by assigning the element to the first empty location in the internal array and incrementing the index (size) for the new empty location.

public boolean add(Object o) {
  ensureCapacity(size + 1); //explained soon
  elementData[size++] = o;
  return true;  //List.add(Object) signature support
}

Inserting elements into the collection at any location other than the end is slightly more tricky. The array elements above the insertion point must all be moved up by one, then the assignment can occur:

public void add(int index, Object element) {
  //check the index is valid first .. code not shown here
  ensureCapacity(size+1); //explained soon
  System.arraycopy(elementData, index, elementData, index + 1, 
        size - index);
  elementData[index] = element;
  size++;
}

When the spare capacity is used up, the Vector/ArrayList object must replace its internal Object[ ] array with a new larger array when more elements need to be added, copying all the elements to the new array. The new array is 50% to 100% bigger than the old one, depending on the SDK version (the code shown here makes it 100% bigger):

public void ensureCapacity(int minCapacity) {
  int oldCapacity = elementData.length;
  if (minCapacity > oldCapacity) {
    Object oldData[  ] = elementData;
    int newCapacity = Math.max(oldCapacity * 2, minCapacity);
    elementData = new Object[newCapacity];
    System.arraycopy(oldData, 0, elementData, 0, size);
  }
}

The main difference between the Vector class and the ArrayList class is the use of synchronization. Apart from two methods used only during serialization, none of the ArrayList methods are synchronized; in contrast, most of the Vector methods are synchronized directly or indirectly. Consequently, Vector is thread-safe, while ArrayList is not. This makes ArrayList faster than Vector, though for the latest VMs the difference in speed between the two classes is small.

The Vector and ArrayList implementations have excellent performance for indexed access and update of elements because there is no overhead beyond range checking. Adding elements to or deleting elements from the end of the list also gives excellent performance, except when the capacity is exhausted and the internal array has to be expanded. Inserting and deleting elements always requires an array copy (two copies when the internal array must be grown first). The number of elements to be copied is proportional to [size - index], i.e., to the distance between the insertion/deletion index and the last index in the collection. For insertions, inserting at the front of the collection (index 0) yields the worst performance, and inserting at the end of the collection (after the last element) yields the best performance. The array-copying overhead grows significantly as the size of the collection increases, because the number of elements that need to be copied with each insertion increases.

11.5.2 The LinkedList Implementation

LinkedList is implemented using a list of doubly linked nodes. To access elements by index, you need to traverse all the nodes until the indexed node is reached:

public Object get(int index) {
  //check the index is valid first .. code not shown here
  Entry e = header;  //starting node
  //go forwards or backwards depending on which is closer
  if (index < size/2) {
    for (int i = 0; i <= index; i++)
      e = e.next;
  } else {
    for (int i = size; i > index; i--)
      e = e.previous;
  }
  return e;
}

Inserting elements into the list is straightforward: traverse to the node at the index and insert a node immediately before that node:

public void add(int index, Object element) {
  //check the index is valid first .. code not shown here
  Entry e = header;  //starting node
  //go forwards or backwards depending on which is closer
  if (index < size/2) {
    for (int i = 0; i <= index; i++)
      e = e.next;
  } else {
    for (int i = size; i > index; i--)
      e = e.previous;
  }
   
  Entry newEntry = new Entry(element, e, e.previous);
  newEntry.previous.next = newEntry;
  newEntry.next.previous = newEntry;
  size++;
}

The LinkedList implementation has a performance overhead for indexed access and update of elements, as access to any index requires you to traverse multiple nodes. Adding elements to the collection suffers from the index traversal access performance drawback and has a further overhead in requiring the creation of a node object. On the plus side, there is no further overhead to insertions and deletions, so insertion/deletion overhead is really mostly dependent on how far away the insertion/deletion index is from the ends of the collection.

11.5.3 Performance Tests

There are many different functions of the classes that could be tested. LinkedLists are frequently used because of their supposedly better performance for random index insertion and deletion, so I decided to focus on insertion performance, i.e., building collections. I've tested LinkedList against ArrayList since both are unsynchronized.

The insertion speed is critically dependent on the size of the collection and the position where the element is to be inserted. All the best- and worst-case performances arise when inserting either at one of the ends or at the exact middle point of the collection. Consequently, I've chosen three insertion locations (start, middle, and end of the collection) and three representative collection sizes of medium (100 elements), large (10,000 elements), and very large (1,000,000 elements).

Table 11-3 shows the results for a medium collection.

Table 11-3. Building a medium-sized collection (100 elements)

Insertion point

1.2.2

1.3.1_02

1.3.1_02-server

1.4.0

1.4.0-server

1.4.0-Xint

Start of the ArrayList

100%

115%

96.4%

129%

104%

899%

Start of the LinkedList

100%

82.6%

64.2%

84%

58.7%

548%

Midpoint of the ArrayList

69.2%

101%

67%

114.4%

80.1%

934%

Midpoint of the LinkedList

131%

105%

80.8%

116%

72.2%

872%

End of the ArrayList

45.3%

43.5%

40.8%

47.9%

47.9%

312%

End of the LinkedList

96.2%

65.6%

61.8%

71.2%

60.7%

334%

Table 11-3 shows that for short collections, ArrayList and LinkedList are performance rivals. ArrayLists have the edge when inserting at the end of the collection (appending elements). But then appending elements is the operation that ArrayList is optimized for: if you just want a statically sized collection, a Java array (e.g., Object[ ]) gives better performance than any collection object. Beyond the append operation, measured timings are mixed and reflect various VM optimization capabilities more than anything else.

What I have not measured here are other advantages that ArrayList has over LinkedList, namely the ability to presize collections and the reduced garbage-collection overhead. Specifically, ArrayLists can be created with a particular size (e.g., in this test the ArrayList could be created with a capacity of 100 elements), thus avoiding all the growth overhead. When running this same test with presized ArrayLists, the ArrayLists times are roughly twice as fast as those recorded in the table! LinkedLists (up to SDK 1.4) cannot be presized.

Additionally, the ArrayList generates only a few extra objects for garbage collection, i.e., the internal array object that holds the elements and one extra internal array object each time the ArrayList capacity is exhausted and the ArrayList needs to be grown. The LinkedList generates one node object for every insertion, irrespective of any deletions that might take place. Consequently, LinkedLists can give considerably more work to the garbage collector (many more objects to collect). Taking these added factors into account, my inclination would be to use an ArrayList rather than a LinkedList for any small- to medium-sized collection.

Table 11-4 shows the results for a large collection.

Table 11-4. Building a large collection (10,000 elements)

Insertion point

1.2.2

1.3.1_02

1.3.1_02-server

1.4.0

1.4.0-server

1.4.0-Xint

Start of the ArrayList

6768%

7052%

6612%

7101%

7015%

8125%

Start of the LinkedList

100%

94.2%

52.9%

71.6%

41.2%

682%

Midpoint of the ArrayList

2485%

2923%

2173%

2962%

2882%

3836%

Midpoint of the LinkedList

24152%

14168%

14168%

14064%

14203%

42360%

End of the ArrayList

62.3%

32.4%

34.0%

34.2%

28.5%

254%

End of the LinkedList

84.1%

68.5%

51.4%

56.8%

35.8%

471%

We can see from Table 11-4 that we begin to get a severe performance penalty when we encounter large insertion overhead. The worst case for LinkedList is, as predicted, inserting in the middle of the collection. We can also see that this has worse performance than the ArrayList worst case of insertion at the start of the collection. Insertion at the middle of the ArrayList has significantly better performance than those two worst cases.

Overall, ArrayList again gives better performance for most cases, including index insertion to random locations. If you always need to insert toward the beginning of the collection, LinkedList is a good choice, but you can achieve even better performance using a reversed ArrayList, i.e., either with a dedicated implementation or by flipping indexes using a [size-index] mapping.

The results for very large collections (not shown), indicates very similar conclusions to those of Table 11-4. However, times are so long that they emphasize that very large collections need particularly close matches between data, collection types, and data-manipulation algorithms. Otherwise, you can end up with performance that is essentially unusable. For optimum performance, you should build a dedicated collection class implementation specific to the problem. This is often a necessary step for very large collections.

11.5.4 Querying Performance

Querying is most efficiently achieved by implementing the query inside the class (see Section 11.4). The time needed to iterate over all the elements is the limiting factor for queries on these lists. A query implemented in the ArrayList/Vector classes iterates over the array elements. The following example counts the number of null elements:

int count = 0;
for (int i = 0; i < size; i++)
  if(elementData[i] =  = null)
    count++;

A query implemented in the LinkedList class traverses all the nodes. The following example counts the number of null elements:

node = header.next;
count = 0;
for (int i = 0; i < repeat; i++, node = node.next)
  if (node.element =  = null)
    count++;

Table 11-5 shows the ArrayList providing significantly superior performance to that of the LinkedList, once again indicating that ArrayList is the class to use. Table 11-6 shows the time taken to iterate over all the elements using a ListIterator object obtained from the List.listIterator(int) method. These iterators would be necessary if the query could not be implemented in the List class. Once again, ArrayList shows superior performance, though not as dramatically as with Table 11-5. Note that the absolute times in Table 11-6 are about ten times longer than those in Table 11-5; ArrayList internal traversal is about ten times faster than ArrayList iteration using a ListIterator.

Table 11-5. Iterating through all the elements of the collection using internal access
 

1.2.2

1.3.1_02

1.3.1_02-server

1.4.0

1.4.0-server

1.4.0-Xint

ArrayList internal traversal

100%

121%

85.9%

159%

86%

962%

LinkedList internal traversal

533%

557%

1576%

565%

436%

1700%

Table 11-6. Iterating through all the elements of the collection using a ListIterator
 

1.2.2

1.3.1_02

1.3.1_02-server

1.4.0

1.4.0-server

1.4.0-Xint

ArrayList iteration using ListIterator

100%

119%

63%

101%

72.2%

2021%

LinkedList iteration using ListIterator

112%

189%

159%

167%

144%

1656%

The measurements and the other factors we've considered clearly indicate that ArrayLists and Vectors usually provide better performance than LinkedLists and synchronized wrapped LinkedLists. Even in cases where you might have thought that the LinkedList would provide better performance, you may be able to coax superior performance from ArrayList by altering how elements are added, for example by reversing the collection order.

There are situations where LinkedLists provide better performance, for example with very large collections where many elements need to be added to both the beginning and end of the collection. But in general, I recommend using ArrayList/Vector as the default and using LinkedList only where there is an identified performance problem that a LinkedList improves.

    Previous Section Next Section