Team LiB   Previous Section   Next Section

2.9 Aggregating Aggregates

2.9.1 Problem

You want to select some data, aggregate it, and then aggregate it again. Here's an example of this type of problem: the administration at the university is preparing an internal report for the dean, who wants to compare grading habits of professors. One of the measures the dean wants to look at is the spread between the average term paper Score for each student in a given course. The spread is the difference between the best and the worst student Score per course. Your job is to find the best and the worst average Score in each course and calculate the difference.

2.9.2 Solution

Use a nested SELECT statement in the FROM clause of your query to generate the first aggregation. Then write your enclosing SQL statement so that it takes those aggregated results and aggregates them again. For example:

SELECT CourseId, MAX(l.s) Best ,MIN(l.s) Worst, 
   MAX(l.s)-MIN(l.s) Spread 
FROM (
   SELECT CourseId, AVG(Score) AS s 
   FROM Students 
   GROUP BY CourseId, StudentName) AS l
GROUP BY CourseId

The query will return the following result:

CourseId  Best   Worst  Spread                                   
--------- ------ ------ --------- 
ACCN101   14.15  12.33  1.81
MGMT120   21.66  14.16  7.50

2.9.3 Discussion

SQL does not allow you to directly enclose one aggregate function within another. In other words, a query written in the following manner would not execute:

SELECT CourseId, MAX(AVG(stock)), MIN(AVG(stock))
FROM Students
GROUP BY CourseId, studentsName

However, you can work around this by using a result from one query as the source for a second query. This trick can be used as a general solution for problems where you have to aggregate already aggregated data. In the case of the solution shown in this recipe, the innermost query takes all the term paper Scores for each student in each course and returns the average of those Scores. The intermediate results look like this:

CourseId             s        
-------------------- ---------
ACCN101              12.333333
MGMT120              21.666666
ACCN101              12.533333
ACCN101              14.150000
MGMT120              14.166666

The intermediate results are summarized by course and student. The outermost query then summarizes those results, but this time at the course level. This allows the use of the MAX function to compute the highest average student Score in a course, while the MIN function is used to compute the lowest average student Score in a course.

Please note that the query shown in this solution is expensive, because it requires two scans of the table — one for the inner query and one for the outer query. In addition, the ability to nest a SELECT statement in the FROM clause is a recent addition to the SQL standard. Older versions of SQL Server may not support it.

    Team LiB   Previous Section   Next Section