4.9 Aggregating Hierarchies
4.9.1 Problem
You want to aggregate hierarchical data. For example,
you
wish to sum the cost of a project or task, beginning from a specific
vertex and working your way down through all levels of the hierarchy
underneath that vertex.
4.9.2 Solution
With respect to the projects example that we have been using for
these general hierarchy recipes, you can use the following stored
procedure to summarize the costs of all subprojects for a specific
project. You specify the project by passing its vertex ID as a
parameter to the procedure.
CREATE PROCEDURE AggregateProjects
@VertexId INTEGER
AS
SET NOCOUNT ON
DECLARE @lvl INTEGER
DECLARE @Name VARCHAR(20)
DECLARE @Cost INTEGER
DECLARE @Sum INTEGER
CREATE TABLE #stack (
VertexId INTEGER,
Name VARCHAR(20),
Cost INTEGER,
Lvl INTEGER
)
SELECT @Sum=0
SELECT @Lvl = 1
INSERT INTO #stack
SELECT VertexId, Name, Cost, 1 FROM Projects
WHERE VertexID=@VertexId
WHILE @Lvl > 0 BEGIN
IF EXISTS (SELECT * FROM #stack WHERE lvl = @lvl) BEGIN
SELECT TOP 1 @VertexId = VertexId, @Name=Name, @Cost=Cost
FROM #stack WHERE lvl = @lvl
ORDER BY VertexId
/* to change action each vertex, change this line */
SELECT @Sum=@Sum+@Cost
/* ******* */
DELETE FROM #stack WHERE vertexId = @VertexId
INSERT #stack
SELECT VertexId, Name, Cost, @lvl + 1
FROM Projects WHERE parent = @VertexId
IF @@ROWCOUNT > 0
SELECT @lvl = @lvl + 1
END ELSE
SELECT @lvl = @lvl - 1
END
PRINT 'Sum of project costs is '+STR(@Sum)+'.'
DROP TABLE #stack
SET NOCOUNT OFF
The following examples show the results of executing this procedure
for all projects (VertexId=1) and for the Specifications project
(VertexId=2).
AggregateProjects 1
Sum of project costs is 231.
AggregateProjects 2
Sum of project costs is 33.
4.9.3 Discussion
The algorithm shown here is adapted from the SQL Server Books Online
recommendation for expanding hierarchies. Those familiar with
algorithms and data structures will notice that it is a nonrecursive
implementation of a recursive traversing algorithm. The code uses an
internal stack implemented as a temporary table. The stack holds
interesting information from the hierarchical Projects table, plus an
additional column named lvl. The lvl column records the level that
each entry holds in the hierarchy. The stack definition is as
follows:
CREATE TABLE #stack (
VertexId INTEGER,
Name VARCHAR(20),
Cost INTEGER,
Lvl INTEGER
)
After the #stack table is created, two variables are initialized. The
@lvl variable tracks the current level on which the code is
operating, while the @sum variable accumulates the sum of all costs.
Next, an INSERT statement is used to store the root onto the stack.
Note that the root in our case is the vertex identified by @VertexId.
The code then loops through each element on the stack. The loop
begins by popping one element from the stack. The retrieved data
contains a cost, which is added to the total cost being accumulated
in @sum:
SELECT TOP 1 @VertexId = VertexId, @Name=Name, @Cost=Cost
FROM #stack WHERE lvl = @lvl
ORDER BY VertexId
SELECT @Sum=@Sum+@Cost
After accumulating the cost for the vertex just pulled from the
stack, the code deletes that read vertex and adds all of its children
to the stack:
DELETE FROM #stack WHERE vertexId = @VertexId
INSERT #stack
SELECT VertexId, Name, Cost, @lvl + 1
FROM Projects WHERE parent = @VertexId
IF @@ROWCOUNT > 0
SELECT @lvl = @lvl + 1
The IF statement that you see in this code ensures that if the vertex
just deleted from the stack has any children, the @lvl value is
increased to indicate that the code is moving one level down in the
hierarchy.
The IF EXISTS clause at the beginning of the loop ensures that so
long as there are some candidates available on the current level, the
loop repeats and browses through them all. The SET NOCOUNT ON
directive at the beginning of the procedure just limits the number of
messages displayed from the procedure. It does not affect the logic
of the algorithm. Without SET NOCOUNT ON, you'll see
a steady stream of "(1 row(s)
affected)" messages as the code executes.
This algorithm is general enough that it can be used for any kind of
operation for which you prefer traversing the hierarchy in a
nonrecursive manner. If you want to change the operation that is
performed on each node, change the code where current cost is added
to the total sum.
The code demonstrates why the general model for hierarchies has a
limited application in SQL. When you need to traverse over more than
one level efficiently, the code starts to expand to an almost
unreadable size.
|