2.3 Implementing Set Difference
2.3.1 Problem
You want to compute the difference
between two sets. For example, you
want to find all the courses Andrew has taken that Cindy has not yet
taken. As a variation on this problem, you also want to subtract one
set from the combination of all other sets. In other words, Cindy
wants to find out how her friends are doing by listing all term
papers that they have completed that she has not.
2.3.2 Solution
There are two problems in this recipe. The first is a simple
subtraction of one set from another. The second is a subtraction of
one set from the union of all other sets. The solution to both is
really one and the same because the union of more than one set is
simply a larger set.
2.3.2.1 Subtracting one set from another
Consider the problem of finding out
which
term papers Andrew has completed that Cindy has not completed. There
are two sets involved: the set of papers that Andrew has completed
and the set of papers that Cindy has completed. The following query
returns the difference between these two sets:
SELECT s.CourseId, s.TermPaper
FROM Students s
WHERE s.StudentName='Andrew' AND
NOT EXISTS(
SELECT * FROM Students s1
WHERE s1.CourseId=s.CourseId AND
s1.TermPaper=s.TermPaper AND
s1.StudentName='Cindy')
The results returned by this query will look like this:
CourseId TermPaper
--------- -----------
ACCN101 4
ACCN101 3
2.3.2.2 Subtracting one set from all others
A slight variation on this problem is to subtract
one specific set from the union of
all other sets. This leads to the second part of our
problem — that of finding which term papers
Cindy's friends have taken, but not Cindy. The
following query will do this:
SELECT s.StudentName, s.CourseId, s.TermPaper
FROM Students s
WHERE s.StudentName<>'Cindy' AND
NOT EXISTS(
SELECT * FROM Students s1
WHERE s.CourseId=s1.CourseId AND
s.TermPaper=s1.TermPaper AND
s1.StudentName='Cindy')
This query returns the following result:
StudentName CourseId TermPaper
------------- --------- -----------
Andrew ACCN101 4
Andrew ACCN101 3
Bert ACCN101 3
Now we know that of all the people in school, only Andrew and Bert
have turned in term papers that Cindy hasn't written
yet.
2.3.3 Discussion
Finding the difference between two sets requires the
use of subqueries and is usually an
expensive task in terms of CPU and memory consumption. The key to
solving such a problem lies in defining the sets involved. For
example, the set of term papers Cindy has taken is one set that
appears in both problems posed in this recipe. Once
you've defined your sets, you
can use a SELECT statement with a NOT
EXISTS predicate to return the difference between those sets.
2.3.3.1 Subtracting one set from another
The first SELECT statement subtracts the set of term papers written
by Cindy from the set written by Andrew. The outer SELECT statement
returns all term papers written by Andrew. The NOT EXISTS predicate
modifies the results of that query so that they exclude all term
papers taken by Cindy; the SELECT statement within the predicate
defines Cindy's set of term papers. The result
returned by the query as a whole is the list of term papers written
by Andrew, but not yet by Cindy.
2.3.3.2 Subtracting one set from all others
The second SELECT statement subtracts the set of term papers written
by Cindy from each of the other sets. The skeleton of the solution is
the same as for the prior problem. The only difference is that the
set defined by the outer SELECT statement includes term papers for
all students but Cindy.
The result returned by the second SELECT statement is a list of all
term papers turned in by other students, but not yet by Cindy.
Because several students can work on the same term paper, the student
name is included in the results to give them meaning. Cindy wants to
know who has submitted papers that she
hasn't.
2.3.3.3 Subtracting other sets from one
The second query actually leads to a solution for yet a third
possibility for computing set differences — that of subtracting
many sets from one. Say that you want a list of papers written by
Andrew, but which have not yet been written by any other student. You
could generate that using the following query:
SELECT s.CourseId, s.TermPaper
FROM Students s
WHERE s.StudentName='Andrew' AND
NOT EXISTS(
SELECT * FROM Students s1
WHERE s1.CourseId=s.CourseId AND
s1.TermPaper=s.TermPaper AND
s1.StudentName<>'Andrew')
In this case, the outer SELECT statement returns the list of term
papers written by Andrew. That's the starting point.
The subquery then defines the list of term papers taken by other
students. The two lists are subtracted, and the result is a
list of term papers that only Andrew has submitted.
|