Team LiB   Previous Section   Next Section

4.9 Aggregating Hierarchies

4.9.1 Problem

You want to aggregate hierarchical data. For example, you wish to sum the cost of a project or task, beginning from a specific vertex and working your way down through all levels of the hierarchy underneath that vertex.

4.9.2 Solution

With respect to the projects example that we have been using for these general hierarchy recipes, you can use the following stored procedure to summarize the costs of all subprojects for a specific project. You specify the project by passing its vertex ID as a parameter to the procedure.

CREATE PROCEDURE AggregateProjects
@VertexId INTEGER
AS
SET NOCOUNT ON
   DECLARE @lvl INTEGER
   DECLARE @Name VARCHAR(20) 
   DECLARE @Cost INTEGER
   DECLARE @Sum INTEGER

   CREATE TABLE #stack (
      VertexId INTEGER, 
      Name VARCHAR(20),
      Cost INTEGER,
      Lvl INTEGER
   )
   
   SELECT @Sum=0
   SELECT @Lvl = 1
   
   INSERT INTO #stack 
      SELECT VertexId, Name, Cost, 1 FROM Projects 
      WHERE VertexID=@VertexId  
   
   WHILE @Lvl > 0 BEGIN
      IF EXISTS (SELECT * FROM #stack WHERE lvl = @lvl) BEGIN

         SELECT TOP 1 @VertexId = VertexId, @Name=Name, @Cost=Cost 
            FROM #stack WHERE lvl = @lvl
            ORDER BY VertexId

         /* to change action each vertex, change this line */
         SELECT @Sum=@Sum+@Cost
         /* ******* */

         DELETE FROM #stack WHERE vertexId = @VertexId

         INSERT #stack
            SELECT VertexId, Name, Cost, @lvl + 1 
            FROM Projects WHERE parent = @VertexId

         IF @@ROWCOUNT > 0
            SELECT @lvl = @lvl + 1
      END ELSE 
         SELECT @lvl = @lvl - 1
      
   END
   PRINT 'Sum of project costs is '+STR(@Sum)+'.'
   DROP TABLE #stack
SET NOCOUNT OFF

The following examples show the results of executing this procedure for all projects (VertexId=1) and for the Specifications project (VertexId=2).

AggregateProjects 1
Sum of project costs is         231.

AggregateProjects 2
Sum of project costs is         33.

4.9.3 Discussion

The algorithm shown here is adapted from the SQL Server Books Online recommendation for expanding hierarchies. Those familiar with algorithms and data structures will notice that it is a nonrecursive implementation of a recursive traversing algorithm. The code uses an internal stack implemented as a temporary table. The stack holds interesting information from the hierarchical Projects table, plus an additional column named lvl. The lvl column records the level that each entry holds in the hierarchy. The stack definition is as follows:

CREATE TABLE #stack (
      VertexId INTEGER, 
      Name VARCHAR(20),
      Cost INTEGER,
      Lvl INTEGER
   )

After the #stack table is created, two variables are initialized. The @lvl variable tracks the current level on which the code is operating, while the @sum variable accumulates the sum of all costs. Next, an INSERT statement is used to store the root onto the stack. Note that the root in our case is the vertex identified by @VertexId.

The code then loops through each element on the stack. The loop begins by popping one element from the stack. The retrieved data contains a cost, which is added to the total cost being accumulated in @sum:

SELECT TOP 1 @VertexId = VertexId, @Name=Name, @Cost=Cost 
   FROM #stack WHERE lvl = @lvl
   ORDER BY VertexId

SELECT @Sum=@Sum+@Cost

After accumulating the cost for the vertex just pulled from the stack, the code deletes that read vertex and adds all of its children to the stack:

DELETE FROM #stack WHERE vertexId = @VertexId

INSERT #stack
   SELECT VertexId, Name, Cost, @lvl + 1 
   FROM Projects WHERE parent = @VertexId

IF @@ROWCOUNT > 0
   SELECT @lvl = @lvl + 1

The IF statement that you see in this code ensures that if the vertex just deleted from the stack has any children, the @lvl value is increased to indicate that the code is moving one level down in the hierarchy.

The IF EXISTS clause at the beginning of the loop ensures that so long as there are some candidates available on the current level, the loop repeats and browses through them all. The SET NOCOUNT ON directive at the beginning of the procedure just limits the number of messages displayed from the procedure. It does not affect the logic of the algorithm. Without SET NOCOUNT ON, you'll see a steady stream of "(1 row(s) affected)" messages as the code executes.

This algorithm is general enough that it can be used for any kind of operation for which you prefer traversing the hierarchy in a nonrecursive manner. If you want to change the operation that is performed on each node, change the code where current cost is added to the total sum.

The code demonstrates why the general model for hierarchies has a limited application in SQL. When you need to traverse over more than one level efficiently, the code starts to expand to an almost unreadable size.

    Team LiB   Previous Section   Next Section