4.10 Preparing Multilevel Operations

4.10.1 Problem

Your system performs many multilevel, hierarchical operations, and the performance of those operations has not been satisfactory. You need to improve that poor performance.

4.10.2 Solution

One solution is to store additional accessibility information into a service table, and then use that information when querying your hierarchical table. For example, the following ProjectPaths table records the path to, and the depth of, each vertex in the Projects table:

CREATE TABLE ProjectPaths(
   VertexId INTEGER,
   Depth INTEGER,
   Path VARCHAR(300)
)

After creating the ProjectPaths table, you can use the following procedure to fill the table with the depth and path information for each vertex in the Projects table:

CREATE PROCEDURE BuildProjectPathsRecursive
@VertexId INTEGER
AS
SET NOCOUNT ON
   DECLARE @Path VARCHAR(300)
   DECLARE @Depth INTEGER

   SELECT @Depth=a.Depth,@Path=a.Path
   FROM ProjectPaths a JOIN Projects p ON p.parent=a.vertexId
   WHERE @vertexId=p.vertexId  
     
   DELETE FROM ProjectPaths WHERE VertexId=@VertexId
   INSERT INTO ProjectPaths VALUES(
         @VertexId, 
         isnull(@Depth,0)+1, 
         isnull(@Path,'.')+CAST(@VertexId AS VARCHAR(15))+'.')

   DECLARE subprojects CURSOR LOCAL FOR
      SELECT VertexId FROM Projects p WHERE Parent=@VertexId        
   OPEN subprojects

      FETCH NEXT FROM subprojects INTO @VertexId
      WHILE @@FETCH_STATUS=0 BEGIN

         EXEC BuildProjectPathsRecursive @VertexId
         FETCH NEXT FROM subprojects INTO @VertexId
      END
   CLOSE subprojects
   DEALLOCATE subprojects
SET NOCOUNT OFF

This procedure takes one parameter, which tells the procedure with which node to start. The procedure then works its way down the hierarchy. To process all nodes in the Projects table, invoke this procedure and pass a value of 1 as follows:

BuildProjectPathsRecursive 1

The procedure fills the ProjectPaths table with additional information for every vertex. The Depth column records the depth of each vertex. The Path column records the path to each vertex. In the path, the vertex numbers are separated by dots. The ProjectPaths table that will be built contains the following rows:

VertexId    Depth       Path
----------- ----------- ----------- 
1           1           .1.
2           2           .1.2.
3           3           .1.2.3.
4           3           .1.2.4.
5           3           .1.2.5.
6           3           .1.2.6.
7           4           .1.2.6.7.
8           2           .1.8.
9           3           .1.8.9.
10          3           .1.8.10.
11          4           .1.8.10.11.
12          3           .1.8.12.
13          2           .1.13.
14          3           .1.13.14.
15          3           .1.13.15.
16          3           .1.13.16.
17          2           .1.17.
18          3           .1.17.18.
19          2           .1.19.

4.10.3 Discussion

The idea for this recipe has been taken from an article published by Itzik Ben-Gan (SQL Server Magazine, June, 2000). His development of this technique is a recent achievement resulting from his search for an ultimate support structure to improve the efficiency of the classical hierarchical model. Although it was originally promoted as an add-on to an existing hierarchy table, we see no reason why you shouldn't normalize properly and separate the hierarchy and its data from the support structure.

The path leading to every vertex is stored in the ProjectPaths table. This represents the work of traversing the hierarchy, and because it is stored in the ProjectPaths table, it only needs to be done once. Please note that the length of the Path field can be changed according to your needs. It does, however, make sense to keep it reasonably small, especially if you want to index it.

The stored procedure named BuildProjectPathsRecursive fills the ProjectPaths table with the paths to each vertex in the subtree. It uses the recursive traversal algorithm introduced earlier in this chapter and runs the following code for each vertex:

SELECT @Depth=a.Depth,@Path=a.Path
FROM ProjectPaths a JOIN Projects p ON p.parent=a.vertexId
WHERE @vertexId=p.vertexId  
 
DELETE FROM ProjectPaths WHERE VertexId=@VertexId
INSERT INTO ProjectPaths VALUES(
      @VertexId, 
      isnull(@Depth,0)+1, 
      isnull(@Path,'.')+CAST(@VertexId AS VARCHAR(15))+'.')

The SELECT statement reads the depth and path data from the parent. Next, any old information for the vertex is deleted from the ProjectPaths table, and new data is inserted. If the @Depth or @Path variables are null, indicating that no access path for the parent exists, then an initial value of 0 is set for the depth, and an initial value of a dot (.) is set for the path. Regardless of how the depth gets set, it is increased by one. That's because the @Depth variable represents the depth of the current node's parent. You have to increment that depth by 1 to get the current node's depth. Similarly, the @Path variable contains the path to the parent. The current vertex ID is appended onto that path to yield the path to the current node. These new depth and path values are then inserted into the ProjectPaths table.

If you prefer nonrecursive algorithms, you can rewrite the recursive BuildProjectPathsRecursive procedure as a nonrecursive procedure. This code is as follows and uses the stack-based technique shown earlier in the recipe titled Section 4.9:

CREATE PROCEDURE BuildProjectsPaths 
@VertexId INTEGER
AS
SET NOCOUNT ON

   DECLARE @lvl INTEGER

   CREATE TABLE #stack (
      VertexId INTEGER, 
      Lvl INTEGER
   )

   SELECT @Lvl = 1
 
   INSERT INTO #stack 
      SELECT VertexId,1 FROM Projects WHERE VertexId=@VertexID
   
   WHILE @Lvl > 0 BEGIN
      IF EXISTS (SELECT * FROM #stack WHERE lvl = @lvl) BEGIN

         SELECT TOP 1 @VertexId = VertexId FROM #stack 
            WHERE lvl = @lvl
            ORDER BY VertexId
   
          DELETE FROM ProjectPaths WHERE VertexId=@VertexId
          INSERT INTO ProjectPaths
            SELECT p.vertexId, 
               isnull(a.Depth,0)+1,
               isnull(a.Path,'.')+CAST(p.VertexId AS VARCHAR(15))+'.'
               FROM ProjectPaths a,Projects p
               WHERE @vertexId=p.vertexId AND p.parent*=a.vertexId

         DELETE FROM #stack WHERE vertexId = @VertexId

         INSERT #stack
            SELECT VertexId, @lvl + 1 FROM Projects 
            WHERE parent = @VertexId

         IF @@ROWCOUNT > 0
            SELECT @lvl = @lvl + 1
      END ELSE 
         SELECT @lvl = @lvl - 1
      
   END


SET NOCOUNT OFF

Maintaining the ProjectPaths Table

Once you create a table like the ProjectsPaths table, how do you maintain it? Some authors recommend that the mechanism to recalculate paths be included into a trigger that is fired whenever a new row is inserted into the hierarchy table. This is a useful recommendation and, depending on your needs, may even be necessary. However, the overhead such a trigger would entail in times of heavy insertion activity might be significant. If you have a table that is updated infrequently and on a batch basis, you may get better overall performance from invoking a procedure such as BuildProjectsPaths, following each batch load.

The advantage of the proposed procedure is that it reinserts the new paths only for the new node that you pass to it and for its possible subtrees. It does not reprocess nodes outside of that hierarchy.

The only thing you need worry about when deleting nodes is that whenever you delete a node from the hierarchy table (Projects), you must also delete the corresponding support row from the service table (ProjectPaths). You can easily achieve this by setting up a cascading delete foreign key on the Projects table:

ALTER TABLE ProjectPaths ADD CONSTRAINT ProjectPaths_FK FOREIGN KEY(VertexId) REFERENCES Projects(VertexId) ON DELETE CASCADE