Team LiB   Previous Section   Next Section

2.5 Implementing Partial Intersection

2.5.1 Problem

You have a set of sets. You want to find the elements that represent intersections between those sets, and then you want to count the number of sets to which each of those elements belongs. The intersections can be partial. In other words, it is not necessary for an element to be present in all sets. However, you do want to specify a threshold, in terms of the number of sets, so the query results will exlude an element that falls below it. As an example of this type of problem, you want to list all term papers and show how many students have submitted each one of them.

2.5.2 Solution

The solution to this problem is fairly trivial. You have to count the number of term papers that occurs in the table, so you can use a GROUP BY query as shown in the following example:

SELECT CourseId, TermPaper, COUNT(*) NumStudents
FROM Students
GROUP BY TermPaper, CourseId
ORDER BY  COUNT(*) DESC

This query will return the submission count for each term paper. If you want to impose a threshold — say that you only care about term papers that have been turned in by at least two students — you can add a HAVING clause as follows:

SELECT CourseId, TermPaper, COUNT(*) NumStudents
FROM Students
GROUP BY TermPaper, CourseId
HAVING COUNT(*) >= 2
ORDER BY COUNT(*) DESC

Run against the sample data provided with this chapter, this query will produce the following result:

CourseId             TermPaper   NumStudents 
-------------------- ----------- ----------- 
ACCN101              2           3
ACCN101              1           2
ACCN101              3           2
MGMT120              1           2
MGMT120              2           2
MGMT120              3           2

2.5.3 Discussion

The GROUP BY query takes the rows from the Students table and sorts them into groups based on term papers. Term papers are uniquely identified by a combination of course ID and term paper number. The COUNT(*) in the SELECT list causes the rows in each group to be counted. The result is the number of times each paper has been submitted.

Strictly speaking, if a count is less than 2, then it doesn't represent an intersection at all. Either nobody has written the term paper yet or only one person has written it. If you need a mathematically correct intersection, specify a minimum count of 2 in the HAVING clause. That way you will only see term papers that fall into at least two sets. The following query, for example, returns the intersection between the set of term papers written by Andrew and the set written by Cindy:

SELECT CourseId, TermPaper
FROM Students
WHERE StudentName IN ('Andrew','Cindy')
GROUP BY TermPaper, CourseId
HAVING COUNT(*) >= 2
ORDER BY  COUNT(*) DESC

The way that this works is that the WHERE clause restricts the query to only those papers written by either Andrew or Cindy. The GROUP BY then sorts and counts by paper. A count of 2 indicates that both Andrew and Cindy must have submitted a paper. The HAVING clause further restricts the query's results to only those term papers.

All of the queries shown in this recipe are predicated on the assumption that a student cannot submit the same term paper twice or, at least, that such a double submission will not be recorded twice in the Students table.

    Team LiB   Previous Section   Next Section