Team LiB   Previous Section   Next Section

2.3 Implementing Set Difference

2.3.1 Problem

You want to compute the difference between two sets. For example, you want to find all the courses Andrew has taken that Cindy has not yet taken. As a variation on this problem, you also want to subtract one set from the combination of all other sets. In other words, Cindy wants to find out how her friends are doing by listing all term papers that they have completed that she has not.

2.3.2 Solution

There are two problems in this recipe. The first is a simple subtraction of one set from another. The second is a subtraction of one set from the union of all other sets. The solution to both is really one and the same because the union of more than one set is simply a larger set.

2.3.2.1 Subtracting one set from another

Consider the problem of finding out which term papers Andrew has completed that Cindy has not completed. There are two sets involved: the set of papers that Andrew has completed and the set of papers that Cindy has completed. The following query returns the difference between these two sets:

SELECT s.CourseId, s.TermPaper
FROM Students s
WHERE s.StudentName='Andrew' AND 
   NOT EXISTS(
      SELECT * FROM Students s1 
      WHERE s1.CourseId=s.CourseId AND 
         s1.TermPaper=s.TermPaper AND 
         s1.StudentName='Cindy')

The results returned by this query will look like this:

CourseId  TermPaper   
--------- ----------- 
ACCN101   4
ACCN101   3
2.3.2.2 Subtracting one set from all others

A slight variation on this problem is to subtract one specific set from the union of all other sets. This leads to the second part of our problem — that of finding which term papers Cindy's friends have taken, but not Cindy. The following query will do this:

SELECT s.StudentName, s.CourseId, s.TermPaper
FROM Students s
WHERE s.StudentName<>'Cindy' AND
   NOT EXISTS(
      SELECT * FROM Students s1 
      WHERE s.CourseId=s1.CourseId AND 
         s.TermPaper=s1.TermPaper AND 
         s1.StudentName='Cindy')

This query returns the following result:

StudentName   CourseId  TermPaper   
------------- --------- ----------- 
Andrew        ACCN101   4
Andrew        ACCN101   3
Bert          ACCN101   3

Now we know that of all the people in school, only Andrew and Bert have turned in term papers that Cindy hasn't written yet.

2.3.3 Discussion

Finding the difference between two sets requires the use of subqueries and is usually an expensive task in terms of CPU and memory consumption. The key to solving such a problem lies in defining the sets involved. For example, the set of term papers Cindy has taken is one set that appears in both problems posed in this recipe. Once you've defined your sets, you can use a SELECT statement with a NOT EXISTS predicate to return the difference between those sets.

2.3.3.1 Subtracting one set from another

The first SELECT statement subtracts the set of term papers written by Cindy from the set written by Andrew. The outer SELECT statement returns all term papers written by Andrew. The NOT EXISTS predicate modifies the results of that query so that they exclude all term papers taken by Cindy; the SELECT statement within the predicate defines Cindy's set of term papers. The result returned by the query as a whole is the list of term papers written by Andrew, but not yet by Cindy.

2.3.3.2 Subtracting one set from all others

The second SELECT statement subtracts the set of term papers written by Cindy from each of the other sets. The skeleton of the solution is the same as for the prior problem. The only difference is that the set defined by the outer SELECT statement includes term papers for all students but Cindy.

The result returned by the second SELECT statement is a list of all term papers turned in by other students, but not yet by Cindy. Because several students can work on the same term paper, the student name is included in the results to give them meaning. Cindy wants to know who has submitted papers that she hasn't.

2.3.3.3 Subtracting other sets from one

The second query actually leads to a solution for yet a third possibility for computing set differences — that of subtracting many sets from one. Say that you want a list of papers written by Andrew, but which have not yet been written by any other student. You could generate that using the following query:

SELECT s.CourseId, s.TermPaper
FROM Students s
WHERE s.StudentName='Andrew' AND 
   NOT EXISTS(
      SELECT * FROM Students s1 
      WHERE s1.CourseId=s.CourseId AND 
         s1.TermPaper=s.TermPaper AND 
         s1.StudentName<>'Andrew')

In this case, the outer SELECT statement returns the list of term papers written by Andrew. That's the starting point. The subquery then defines the list of term papers taken by other students. The two lists are subtracted, and the result is a list of term papers that only Andrew has submitted.

    Team LiB   Previous Section   Next Section