Team LiB   Previous Section   Next Section

2.11 Including Nonaggregated Columns

2.11.1 Problem

You want to write a GROUP BY query that returns summarized data, and you want to also include nonaggregated columns in the result. These nonaggregated columns do not appear in the GROUP BY clause. With respect to the students example, let's say that each course is graded according to the best term paper that each student has submitted for that course. For administrative reasons, you must find out which term paper has the best Score for each student/course combination.

2.11.2 Solution

The following query shows each student's highest term paper Score in each class that they took. An inline query is also used to return the specific term paper number that corresponds to that Score.

SELECT StudentName,CourseId,
   (SELECT  MAX(TermPaper) 
      FROM Students 
      WHERE Score=MAX(s.Score)and 
         StudentName=s.StudentName and 
         CourseId=s.CourseId) TermPaper, 
   MAX(s.Score) Score
FROM Students s
GROUP BY CourseId, StudentName

The output from this query will be as follows:

StudentName  CourseId  TermPaper  Score  
------------ --------- ---------- ------ 
Andrew       ACCN101   4          15.60
Andrew       MGMT120   3          23.10
Bert         ACCN101   1          13.40
Cindy        ACCN101   2          16.20
Cindy        MGMT120   3          16.00

2.11.3 Discussion

This recipe highlights an interesting problem that has been often debated by SQL programmers. The problem is that when an extreme has been identified using an aggregation such as MAX, SQL insists that the SELECT list contain only the result of the aggregation and the columns from the GROUP BY clause. In real life, many programmers have wished for the ability to include nonaggregated columns as well. They not only want to know the maximum term paper Score for a course, but they want to know the number of that term paper. In other words, they want to write a SQL statement such as this:

SELECT StudentName, CourseId, TermPaper, MAX(Score)
FROM Students s
GROUP BY CourseId, StudentName

This kind of query is invalid. The problem is that it can be executed only if the TermPaper column is added to the GROUP BY clause. We potentially face the same problem when writing our query, but we've avoided it by writing an inline SELECT statement to retrieve the term paper number corresponding to the high Score.

The outer query groups the table into sets based on student names and course identification. Then it finds the best Score for each student in course. That's easily understood. The inner query is where things get interesting. It retrieves the number of a term paper where the Score matches the high Score for the course by the student in question. The query is a correlated subquery, meaning that it's executed once for each course and student combination. It's possible for a student to have two term papers in a course with the same Score. To ensure that only one value is returned, the inner query uses the MAX function on the TermPaper column. Anytime two term papers tie for the highest Score, the term paper listed in the results will be the one with the highest number. This is a rather arbitrary choice, but it's the best you can do under the circumstances.

    Team LiB   Previous Section   Next Section