[ Team LiB ] |
11.10 Search TreesHere's an alternate solution to the problem presented in the last section. I looked for a more obvious solution, another tree structure that would handle the search, provide full keyed access, and give plenty of scope for tuning. Jon Bentley and Bob Sedgewick[9] detail a potential solution that offers an interesting structure and provides a good tuning exercise, so I will use it here.
In a ternary tree, each node has three branches. The structure is a halfway point between binary trees of strings (one string per node) and digital tries. A digital trie stores strings character by character, and has an n-way branching where n is the number of possible characters in the string (e.g., 26 if all strings have only lowercase alphabetic characters, 256 if strings can contain any 8-byte character, 34,000 if each node can be any Unicode character). Digital tries are lightning-fast to search, but have exorbitant storage costs that typically rule them out as a solution. The ternary tree node searches by comparing the current character with the current node's character. If equal, the next character in the string becomes the current character, and the node at the "equal" pointer becomes the current node. Otherwise, the current character in the string remains the current character, and the node at the "higher" or "lower" pointer becomes the current node. A TernarySearchTreeNode class has the Java class structure given as follows (the extra "value" instance variable is to allow any object to be stored as the value for a particular key): class TernarySearchTreeNode { char splitchar; TernarySearchTreeNode low; TernarySearchTreeNode high; TernarySearchTreeNode equal; Object value; } Bentley and Sedgewick provide code (in C, but easily ported) to search and insert into the tree. The recursive versions are: public static Object search(TernarySearchTreeNode p, String str, int strIdx) { //Start from a node char c; //if the node is null, return null. //This means there was no match to the string. if (p = = null) return null; //otherwise if the current character is less than //the splitchar value, replace the current node with //the low node, and carry on searching at this character else if ( (c=str.charAt(strIdx)) < p.splitchar) return search(p.low, str, strIdx); //or if the current character is larger than the //splitchar value, replace the current node with the //high node, and carry on searching at this character else if (c > p.splitchar) return search(p.high, str, strIdx); else { //otherwise, we match the current string character with //the character at this node. If we have finished the //string, then this is the searched for node, and //we can return the value stored at this node. if (strIdx = = (str.length( )-1)) return p.value; else //or this is not the end of the string, so replace //the current node with the equal node, and carry on //searching at the next string character return search(p.equal, str, strIdx+1); } } public static TernarySearchTreeNode insert(TernarySearchTreeNode p, String str, int strIdx, Object o) { //Start from a node. If there is no node, then we create a new node //to insert into the tree. This could even be the root node. char c; if (p = = null) { p = new TernarySearchTreeNode(str.charAt(strIdx)); } //Now navigate the tree just as for the search method, inserting //nodes as required. For each recursive insert( ) call, the //returned node is the one we assign to the current nodes low, //high or equal node, depending on the comparison of the //current string character and the current node character. if ( (c = str.charAt(strIdx)) < p.splitchar) p.low = insert(p.low, str, strIdx, o); else if (c = = p.splitchar) { //When we finally get to the last node (matched or inserted, //doesn't matter), we insert the value, given by Object o if (strIdx = = (str.length( )-1)) p.value = o; else p.equal = insert(p.equal, str, strIdx+1, o); } else p.high = insert(p.high , str, strIdx, o); return p; } //Simple constructor, just assigns the character. public TernarySearchTreeNode(char c) { splitchar = c; } A class to use these methods, with get( ) and put( ) methods such as Map, looks like this: public class TernarySearchTree { TernarySearchTreeNode root; public Object get(String key) { return TernarySearchTreeNode.search(root, key, 0); } public Object put(String key, Object value) { //Note there is no need to initialize root. The recursive insert( ) //call creates a root object the first time through. root = TernarySearchTreeNode.insert(root, key, 0, value); return null; //fake old value for now } } This is fairly straightforward. (Note that the Map.put( ) should return the old value, if any, at the key being set, but we have not implemented that functionality just yet.) The accessor and updator just follow the described algorithm, comparing the current character to the character at the current node and taking the appropriate next branch unless you have reached the end of the string. In that case, the value is returned or updated according to whether this is a search or insert. If you compare update and access times against HashMap , you'll find that the TernarySearchTree is much slower. We are expecting this slowdown, because the referred article does the same comparison and indicates that many optimizations are necessary to achieve similar times to the HashMap. Since they do achieve similar times after optimizing, assume that you can too, and run through a tuning phase to see what improvements you can make. The target is always the HashMap times, since you already have TreeMap if you need the partial-matching functionality without HashMap access speed. If TreeMap does not exist, or you tune another structure with no counterpart, you still need a goal for the performance. This goal should be based on the application requirements. In the absence of application requirements, you should aim for the performance of some other existing class that provides similar or partial functionality. For this example, it is still sensible to use HashMap to provide the performance target because the full key-match access time for the structure will probably be compared to HashMap access times by most developers. The baseline is a large dictionary of words. Knowing that tree access and update are susceptible to the order of keys added, you are testing both for randomized order of insertion and mostly-sorted order, so that you know the near worst case and (probable) average case. Take a HashMap that is presized (i.e., large enough to avoid rehashing after addition of all keys), using the case where the keys are mostly sorted as a baseline. Assign the time taken to build the collection as 100%, and also assign the time taken to access every key in it as 100% (i.e., each of the access and update values, which are different, is separately assigned 100%). If the HashMap is not presized, there is a cost of approximately another 30% to 60% on the HashMap inserts (i.e., 30% to 60% longer in time). The following chart shows the times using Sun VM Version 1.2 with JIT (the ratios vary under other VMs):
You can see that you need to gain an order of magnitude to catch up to the HashMap performance. Profiling is not a huge help; it says only that you need to improve the times on these few methods you have. So you need to target basics. First, by using a char array at the beginning, get rid of the overhead of accessing the characters from the string key one at a time through the string accessor. Also, by passing the length as a parameter, remove the overhead of repeatedly accessing the string size through its length( ) method. At the same time, rather than create a new char array for each string every time you insert or search the tree, repeatedly use the same char buffer. The new classes look like this: public class TernarySearchTree { TernarySearchTreeNode root; char[ ] buff = new char[5000]; public Object get(String key) { key.getChars(0, key.length( ), buff, 0); return TernarySearchTreeNode1.search(root, buff, 0, key.length( )-1); } public Object put(String key, Object value) { key.getChars(0, key.length( ), buff, 0); root = TernarySearchTreeNode.insert(root, buff, 0, key.length( )-1, value); return null; //fake it for now } } class TernarySearchTreeNode { char splitchar; TernarySearchTreeNode low; TernarySearchTreeNode high; TernarySearchTreeNode equal; Object value; public static Object search(TernarySearchTreeNode p, char[ ] str, int strIdx, int strMaxIdx) { char c; if (p = = null) return null; else if ( (c=str[strIdx]) < p.splitchar) return search(p.low, str, strIdx, strMaxIdx); else if (c > p.splitchar) return search(p.high, str, strIdx, strMaxIdx); else { if (strIdx = = strMaxIdx) return p.value; else return search(p.equal, str, strIdx+1, strMaxIdx); } } public static TernarySearchTreeNode insert(TernarySearchTreeNode p, char[ ] str, int strIdx, int strMaxIdx, Object o) { char c; if (p = = null) { p = new TernarySearchTreeNode(str[strIdx]); } if ( (c = str[strIdx]) < p.splitchar) p.low = insert(p.low, str, strIdx, strMaxIdx, o); else if (c = = p.splitchar) { if (strIdx = = strMaxIdx) p.value = o; else p.equal = insert(p.equal, str, strIdx+1, strMaxIdx, o); } else p.high = insert(p.high , str, strIdx, strMaxIdx, o); return p; } public TernarySearchTreeNode(char c) { splitchar = c; } } The algorithms all stay the same; we have applied only the most basic tuning. The following table illustrates the measured values:
Well, it's a little better, but it should be a lot better. Bentley and Sedgewick suggest one obvious improvement: changing the recursive algorithms to iterative ones. This is a standard tuning technique, but it can be difficult to achieve. You can use the implementations here as a sort of template: look at how the recursion has been converted into iteration by having a "current" node, p, which is changed on each pass. This is the normal way of moving from recursion to iteration (see also Section 7.5 and Section 7.6 in Chapter 7). The classes now look like: public class TernarySearchTree { TernarySearchTreeNode root; char buff[ ] = new char[5000]; public Object get(String key) { if(root = = null) return null; else { key.getChars(0, key.length( ), buff, 0); return root.search(buff, 0, key.length( ) - 1); } } public Object put(String key, Object obj) { key.getChars(0, key.length( ), buff, 0); if(root = = null) root = new TernarySearchTreeNode(buff[0]); return root.insert(buff, 0, key.length( ) - 1, obj); } } class TernarySearchTreeNode { char splitchar; TernarySearchTreeNode low; TernarySearchTreeNode high; TernarySearchTreeNode equal; Object value; public Object search(char str[ ], int strIdx, int strMaxIdx) { char c; for(TernarySearchTreeNode p = this; p != null;) { if((c = str[strIdx]) < p.splitchar) p = p.low; else if(c = = p.splitchar) { if(strIdx = = strMaxIdx) return p.value; strIdx++; p = p.equal; } else p = p.high; } return null; } public Object insert(char str[ ], int strIdx, int strMaxIdx, Object o) { TernarySearchTreeNode p = this; char c; while(true) { if ( (c = str[strIdx]) < p.splitchar) { if(p.low = = null) p.low = new TernarySearchTreeNode(c); p = p.low; } else if(c = = p.splitchar) { if(strIdx = = strMaxIdx) { Object old = p.value; p.value = o; return old; } strIdx++; c = str[strIdx]; if(p.equal = = null) p.equal = new TernarySearchTreeNode(c); p = p.equal; } else { if(p.high = = null) p.high = new TernarySearchTreeNode(c); p = p.high; } } } public TernarySearchTreeNode(char c) { splitchar = c; } } The iterative implementation of insert( ) allows you to return the old object easily, thus making the implementation of put( ) have correct functionality for a Map. The following table illustrates the resulting measurements (the previous measurements are in parentheses):
The performance has improved, but it's still a long way from the HashMap performance. It is worth noting that these simple optimizations have already cut the times by over a third. To get a big boost, you need to target something large. Object creation can be a serious performance problem, as we saw in Chapter 4. Bentley and Sedgewick state that their major performance optimization is to preallocate memory for the tree. So let's change the node creation in the insert call to assign nodes from a precreated pool of nodes, i.e., create a large pool of nodes at the initial creation of TernarySearchTree, and assign these as required in the tree insert( ) method. The code is straightforward, replacing the new TernarySearchTreeNode( ) call with a newNode( ) call, and the pool management is simple: public static TernarySearchTreeNode newNode(char c, TernarySearchTreeNode pool) { TernarySearchTreeNode p = pool; pool = pool.equal; p.splitchar = c; p.low = p.equal = p.high = null; return p; } TernarySearchTreeNode static createPool(int size) { TernarySearchTreeNode last; TernarySearchTreeNode pool; for (int i = size; i > 0; i--) { last = pool; pool = new TernarySearchTreeNode( ); pool.equal=last; } return pool; } The following chart shows the new measurements (previous measurements in parentheses):
We are getting closer for the average (random) case. But why is there such a discrepancy between the average and worst (mostly-sorted) case now, and why is there an improvement in the access times when the change should have altered only the insertion times? The discrepancy between the average and worst cases may indicate that the worst times are a result of the time spent in the stack due to the depth of the insert/search loops rather than node creation. The improvement in the access times may be due to internal memory management of the VM: all the nodes being created one after the other may be the reason for the improved access. (Although since everything is in memory, I'm not quite sure why this would be so. Possibly, the heap space is segmented according to requisition from the OS, and you may have paged slightly to disk, which could explain the discrepancy. But I have discarded any timings that had any significant paging, so the discrepancy is not entirely clear.) There is an extra issue now, as you have not been measuring the time taken in creating the tree. The last optimization has increased this time, as the nodes were previously created during the insert, but are now all created at tree creation. Consequently, you might now need to consider this creation time, depending on how the data structure is used. The creation time is actually rather large. It would be nice if there were a way to create all nodes required in one VM call, but there is none provided in Java (at least up to JDK 1.4). You can finesse this shortcoming by implementing your own memory management using arrays, and it is certainly worth doing so as an exercise to see if the technique gives any advantages. Another possibility is to create objects at initialization in a separate thread, but this requires previous knowledge of many things, so I will not consider that option here. Fortunately, the node structure is simple, so you can map it into an array of ints. Each node takes five indexes: the first to hold the character, the next three to hold index values for other nodes, and the last to hold an index value into an Object array (which can hold the Object values). Now you no longer have the separate node class; all management is in the TernarySearchTree class: public class TernarySearchTree { //offsets into the array for each node final static int INITIAL_NODE = 1; final static int LOW_OFFSET = 1; final static int HIGH_OFFSET = 2; final static int EQUAL_OFFSET = 3; final static int VALUE_OFFSET = 4; final static int NODE_SIZE = 5; //A buffer for the string char[ ] buff = new char[5000]; //the array of node 'object's int[ ] nodes; //the array of Object values, one for each node Object[ ] objects; //The index to the next available unused node. //Note that it is at index 1, //not zero int nextNode = INITIAL_NODE; //The index to the next object int nextObject = 1; //default constructor public TernarySearchTree( ) { this(500000); } //Constructor to create a pre-sized Ternary tree public TernarySearchTree(int size) { //create all the nodes. //Each node is five int indexes and one Object index nodes = new int[NODE_SIZE*size]; objects = new Object[size]; } public Object get(String key) { key.getChars(0, key.length( ), buff, 0); return search(buff, 0, key.length( )-1); } public Object put(String key, Object value) { key.getChars(0, key.length( ), buff, 0); if (nextNode = = INITIAL_NODE) { nodes[INITIAL_NODE] = buff[0]; nextNode+=NODE_SIZE; } return insert(buff, 0, key.length( )-1, value); } /** * The node search and insert methods just map from the previous * implementations using the mappings * p.splitchar -> nodes[p] * p.low -> nodes[p+LOW_OFFSET] * p.high -> nodes[p+HIGH_OFFSET] * p.equal -> nodes[p+EQUAL_OFFSET] * p.value -> objects[nodes[p+VALUE_OFFSET]] */ public Object search(char[ ] str, int strIdx, int strMaxIdx) { int p = INITIAL_NODE; int c; while (p != 0) { if ( (c = str[strIdx]) < nodes[p]) p = nodes[p+LOW_OFFSET]; else if (c = = nodes[p]) { if (strIdx = = strMaxIdx) return objects[nodes[p+VALUE_OFFSET]]; else { strIdx++; p = nodes[p+EQUAL_OFFSET]; } } else p = nodes[p+HIGH_OFFSET]; } return null; } public Object insert(char[ ] str, int strIdx, int strMaxIdx, Object o) { int p = INITIAL_NODE; int c = str[strIdx]; Object old; for(;;) { if ( c < nodes[p]) { if (nodes[p+LOW_OFFSET] = = 0) { nodes[p+LOW_OFFSET] = nextNode; nodes[nextNode] = c; nextNode+=NODE_SIZE; } p = nodes[p+LOW_OFFSET]; } else if (c = = nodes[p]) { if (strIdx = = strMaxIdx) { if (nodes[p+VALUE_OFFSET] = = 0) { nodes[p+VALUE_OFFSET] = nextObject; nextObject++; } old = objects[nodes[p+VALUE_OFFSET]]; objects[nodes[p+VALUE_OFFSET]] = o; return old; } else { strIdx++; c=str[strIdx]; if (nodes[p+EQUAL_OFFSET] = = 0) { nodes[p+EQUAL_OFFSET] = nextNode; nodes[nextNode] = c; nextNode+=NODE_SIZE; } p = nodes[p+EQUAL_OFFSET]; } } else { if (nodes[p+HIGH_OFFSET] = = 0) { nodes[p+HIGH_OFFSET] = nextNode; nodes[nextNode] = c; nextNode+=NODE_SIZE; } p = nodes[p+HIGH_OFFSET]; } } } } Although the class may look a little complex, it is pretty easy to see what is happening if you bear in mind that nodes[p+HIGH_OFFSET] is equivalent to the p.high in the previous version of the tree (i.e., the tree that had the separate node class). There are only two slightly more complex differences. First, the equivalent of p.value is now objects[nodes[p+VALUE_OFFSET]] because the nodes array holds only ints, and the value can be any Object (hence requiring a separate Object array). Second, a new node is allocated by providing the index of the current high-water mark in the nodes array, held by variable nextNode. This index is then incremented by the size of the node, NODE_SIZE (which is five fields), for the next node allocation. This alternative implementation does not affect the external interface to the class, so the complexity remains hidden from any other class. This implementation is much closer to the C implementation provided by Bentley and Sedgewick, where nodes were allocated in a similar large chunk. Now the question is, have we improved performance? The next table shows the current measurements (previous measurements in parentheses):
Overall, these are the best results so far, and the worst case is much closer to the average case. Also, the object-creation time is much better: it is essentially as fast as possible in a VM since you are creating just two new significant objects (which are very large arrays), and so the limitation is purely down to how quickly the VM can allocate the memory space. You might be satisfied to stop at this point, even though your structure is slower than a HashMap by a factor of two. It does provide the extra required functionality of partial matching, as well as the full matching that HashMaps provide, and relative performance is acceptable. But there is one more major change to consider. You know that digital search tries are extremely fast, but inefficient in space. If you are prepared to accept the extra space taken, you can still consider using digital tries to achieve improved performance. If you are using strings that contain mainly the ASCII character set, consider using a digital search trie for the first couple of characters. A two-node digital search trie has 256 nodes for a one-character string, and 256 nodes for the first character in multicharacter strings. For the multicharacter strings, the second node has 256 nodes for each node of the first character, giving 256 x 257 = 65,792 nodes. With each node using 4 bytes, you would use 65792 x 4 = 263,168 bytes. So you have a quarter of a megabyte before you even start to use this structure. However, if you use this structure for a large amount of string data, you may find this memory usage small compared to the final overall size. Assuming this is acceptable, let's look at how it is implemented and how it performs. Basically, you implement a trie for the first two characters, but each two-character node then points to the root of a ternary tree. The two-digit trie needs a parallel Object structure to store the Object values that correspond to one- or two-digit strings. This is, of course, occupying a lot of space, and there are methods for reducing the space requirements (for example, you can optimize for just the alphanumeric characters, mapping them into smaller arrays), but for this exercise, let's keep it simple. The class now looks like this: public class TernarySearchTree { final static int LOW_OFFSET = 1; final static int HIGH_OFFSET = 2; final static int EQUAL_OFFSET = 3; final static int VALUE_OFFSET = 4; final static int NODE_SIZE = 5; final static int INITIAL_TRIE_NODE = 1 + NODE_SIZE; final static int INITIAL_NODE = 1; char[ ] buff = new char[5000]; int[ ] nodes; Object[ ] objects; int nextNode = INITIAL_TRIE_NODE; int nextObject = 0; int initial = -1; Object[ ] trie1Objects; int[ ][ ] trie2; Object[ ][ ] trie2Objects; public TernarySearchTree( ) { this(500000); } public TernarySearchTree(int size) { trie1Objects = new Object[256]; trie2 = new int[256][256]; trie2Objects = new Object[256][256]; nodes = new int[NODE_SIZE*size+1]; objects = new Object[size]; } public Object get(String key) { int len = key.length( ); key.getChars(0, len, buff, 0); int first = buff[0]; int second = buff[1]; if (len = = 1 && (first < 256)) { return trie1Objects[first]; } else if (len = = 2 && (first < 256) && (second < 256)) { return trie2Objects[first][second]; } else if ((first < 256) && (second < 256)) { int nodep = trie2[first][second]; if (nodep = = 0) { return null; } return search(buff, 2, len-1, nodep); } else { //Use node[0] as a flag to determine if entered here if (nodes[0] = = 0) { return null; } return search(buff, 0, len-1, INITIAL_NODE); } } public void release( ) { nodes = null; objects = null; } public Object search(char[ ] str, int strIdx, int strMaxIdx, int p) { int c; while (p != 0) { if ( (c = str[strIdx]) < nodes[p]) p = nodes[p+LOW_OFFSET]; else if (c = = nodes[p]) { if (strIdx = = strMaxIdx) return objects[nodes[p+VALUE_OFFSET]]; else { strIdx++; p = nodes[p+EQUAL_OFFSET]; } } else p = nodes[p+HIGH_OFFSET]; } return null; } public Object put(String key, Object value) { int len = key.length( ); key.getChars(0, len, buff, 0); int first = buff[0]; int second = buff[1]; if (len = = 1 && (first < 256)) { Object old = trie1Objects[first]; trie1Objects[first] = value; return old; } else if (len = = 2 && (first < 256) && (second < 256)) { Object old = trie2Objects[first][second]; trie2Objects[first][second] = value; return old; } else if ((first < 256) && (second < 256)) { int nodep = trie2[first][second]; if (nodep = = 0) { nodep = trie2[first][second] = nextNode; nodes[nextNode] = buff[2]; nextNode+=NODE_SIZE; } return insert(buff, 2, len-1, value, nodep); } else { //Use node[0] as a flag to determine if entered here if (nodes[0] = = 0) { nodes[0] = 1; nodes[INITIAL_NODE] = first; } return insert(buff, 0, len-1, value, INITIAL_NODE); } } public Object insert(char[ ] str, int strIdx, int strMaxIdx, Object value, int p) { int c = str[strIdx]; int cdiff; Object old; for(;;) { if ( (cdiff = c - nodes[p]) < 0) { if (nodes[p+LOW_OFFSET] = = 0) { nodes[p+LOW_OFFSET] = nextNode; nodes[nextNode] = c; nextNode+=NODE_SIZE; } p = nodes[p+LOW_OFFSET]; } else if (cdiff = = 0) { if (strIdx = = strMaxIdx) { if (nodes[p+VALUE_OFFSET] = = 0) { nodes[p+VALUE_OFFSET] = nextObject; nextObject++; } old = objects[nodes[p+VALUE_OFFSET]]; objects[nodes[p+VALUE_OFFSET]] = value; return old; } else { strIdx++; c=str[strIdx]; if (nodes[p+EQUAL_OFFSET] = = 0) { nodes[p+EQUAL_OFFSET] = nextNode; nodes[nextNode] = c; nextNode+=NODE_SIZE; } p = nodes[p+EQUAL_OFFSET]; } } else { if (nodes[p+HIGH_OFFSET] = = 0) { nodes[p+HIGH_OFFSET] = nextNode; nodes[nextNode] = c; nextNode+=NODE_SIZE; } p = nodes[p+HIGH_OFFSET]; } } } } This table shows the measurements (previous measurements in parentheses):
The results are the best yet for all values except the random access, which is roughly the same as before. Perhaps the most interesting aspect is that you now get better times on the mostly-sorted input than on the randomized input (which is also the case for the HashMap). The result is still slower than a HashMap, but has the extra capability to identify partial matches efficiently. For more specialized versions, such as those needed for a particular application, you can make an implementation that is significantly faster (just as for the hash-table structures earlier in this chapter). All in all, we've taken a particular structure in its initial form, optimized it using various techniques, and made it two to eight times faster accessing and updating elements. Note that there are also costs beyond the extra space costs for this last hybrid structure. The implementation before this last one is still a pure ternary tree. That pure implementation has some elegant and simple recursive algorithms for iterating through the tree in order and for identifying partial matches. However, implementing the equivalent algorithms for the last hybrid structure is quite a bit more complicated, as you have to jump between the two structures it uses. There is not much educational value in proceeding further with these classes here. We've covered the uses of different structures and how to reimplement classes to use different underlying structures for the purpose of improving performance. This book is not intended to provide finished components, so if you feel that this structure may be useful to you in some situation, you'll need to complete it yourself. Just a few final performance notes about the optimized class. Obviously, you want to optimize its use of space. So note that its size is given by the high-water mark, which is easily determined. And if you want to make the class dynamically growable at large sizes, you may be better off catching the exception thrown when the high-water mark grows past the end of the nodes array and then copying to a larger array, rather than making a test on each insertion. |