2.11 Including Nonaggregated Columns
2.11.1 Problem
You want to write a GROUP BY query
that returns summarized data, and
you want to also include nonaggregated columns in the result. These
nonaggregated columns do not appear in the GROUP BY clause. With
respect to the students example, let's say that each
course is graded according to the best term paper that each student
has submitted for that course. For administrative reasons, you must
find out which term paper has the best Score for each student/course
combination.
2.11.2 Solution
The following query shows each student's highest
term paper Score in each class that they took. An inline query is
also used to return the specific term paper number that corresponds
to that Score.
SELECT StudentName,CourseId,
(SELECT MAX(TermPaper)
FROM Students
WHERE Score=MAX(s.Score)and
StudentName=s.StudentName and
CourseId=s.CourseId) TermPaper,
MAX(s.Score) Score
FROM Students s
GROUP BY CourseId, StudentName
The output from this query will be as follows:
StudentName CourseId TermPaper Score
------------ --------- ---------- ------
Andrew ACCN101 4 15.60
Andrew MGMT120 3 23.10
Bert ACCN101 1 13.40
Cindy ACCN101 2 16.20
Cindy MGMT120 3 16.00
2.11.3 Discussion
This recipe highlights an interesting problem that has been often
debated by SQL programmers. The problem is that when an extreme has
been identified using an aggregation such as MAX, SQL insists that
the SELECT list contain only the result of the aggregation and the
columns from the GROUP BY clause. In real life, many programmers have
wished for the ability to include nonaggregated columns as well. They
not only want to know the maximum term paper Score for a course, but
they want to know the number of that term paper. In other words, they
want to write a SQL statement such as this:
SELECT StudentName, CourseId, TermPaper, MAX(Score)
FROM Students s
GROUP BY CourseId, StudentName
This kind of query is invalid. The problem is that it can be executed
only if the TermPaper column is added to the GROUP BY clause. We
potentially face the same problem when writing our query, but
we've avoided it by writing an inline SELECT
statement to retrieve the term paper number corresponding to the high
Score.
The outer query groups the table into sets based on student names and
course identification. Then it finds the best Score for each student
in course. That's easily understood. The inner query
is where things get interesting. It retrieves the number of a term
paper where the Score matches the high Score for the course by the
student in question. The query is a correlated subquery, meaning that
it's executed once for each course and student
combination. It's possible for a student to have two
term papers in a course with the same Score. To ensure that only one
value is returned, the inner query uses the MAX function on the
TermPaper column. Anytime two term papers tie for the highest Score,
the term paper listed in the results will be the one with the highest
number. This is a rather arbitrary choice, but it's
the best you can do under the circumstances.
|