2.9 Aggregating Aggregates
2.9.1 Problem
You want to select some data, aggregate it, and
then
aggregate it again. Here's an example of this type
of problem: the administration at the university is preparing an
internal report for the dean, who wants to compare grading habits of
professors. One of the measures the dean wants to look at is the
spread between the average term paper Score for each student in a
given course. The spread is the difference between the best and the
worst student Score per course. Your job is to find the best and the
worst average Score in each course and calculate the difference.
2.9.2 Solution
Use a nested SELECT statement in the FROM clause of your query to
generate the first aggregation. Then write your enclosing SQL
statement so that it takes those aggregated results and aggregates
them again. For example:
SELECT CourseId, MAX(l.s) Best ,MIN(l.s) Worst,
MAX(l.s)-MIN(l.s) Spread
FROM (
SELECT CourseId, AVG(Score) AS s
FROM Students
GROUP BY CourseId, StudentName) AS l
GROUP BY CourseId
The query will return the following result:
CourseId Best Worst Spread
--------- ------ ------ ---------
ACCN101 14.15 12.33 1.81
MGMT120 21.66 14.16 7.50
2.9.3 Discussion
SQL does not allow you to directly enclose one aggregate function
within another. In other words, a query written in the following
manner would not execute:
SELECT CourseId, MAX(AVG(stock)), MIN(AVG(stock))
FROM Students
GROUP BY CourseId, studentsName
However, you can work around this by using a result from one query as
the source for a second query. This trick can be used as a general
solution for problems where you have to aggregate already aggregated
data. In the case of the solution shown in this recipe, the innermost
query takes all the term paper Scores for each student in each course
and returns the average of those Scores. The intermediate results
look like this:
CourseId s
-------------------- ---------
ACCN101 12.333333
MGMT120 21.666666
ACCN101 12.533333
ACCN101 14.150000
MGMT120 14.166666
The intermediate results are summarized by course and student. The
outermost query then summarizes those results, but this time at the
course level. This allows the use of the MAX function to compute the
highest average student Score in a course, while the MIN function is
used to compute the lowest average student Score in a course.
Please note that the query shown in this solution is expensive,
because it requires two scans of the table — one for the inner
query and one for the outer query. In addition, the ability to nest a
SELECT statement in the FROM clause is a recent addition to the SQL
standard. Older versions of SQL Server may not support it.
|