7.10 Joining Arrays with Master Tables
7.10.1 Problem
You want to join a non-normalized
array-like data
table with a normalized master record. The results should still be
non-normalized, but instead of showing coded values from the detail
table, you want them to show the corresponding values from the master
table.
Taking both the StudentThesis table and the Professors table, you
want to join them to replace the member codes with the corresponding
professor names. You want to do it so that the structure of the
result corresponds to the structure of the StudentThesis
table — you want the resulting rows to have one student ID, three
professor names, and one grade.
7.10.2 Solution
The following SELECT statement reproduces the structure from the
non-normalized table (StudentThesis), but replaces the professors ID
numbers with their corresponding professor names. Each line of the
report will list a student ID together with the names of the three
dissertation committee members for that student.
SELECT t.StudentId,
MAX(CASE WHEN t.Member1=p.id
THEN p.Name
ELSE NULL END) Member1,
MAX(CASE WHEN t.Member2=p.id
THEN p.Name
ELSE NULL END) Member2,
MAX(CASE WHEN t.Member3=p.id
THEN p.Name
ELSE NULL END) Member3
FROM StudentThesis t, Professors p
GROUP BY t.StudentId
The results from executing this query will look like these:
StudentId Member1 Member2 Member3
---------- --------------- ---------------- ----------------
1 Baird, Frances Anderson, Terry Newton, Isaac
2 Newton, Isaac Anderson, Terry Smith, John
3 Newton, Isaac Einstein, Albert Base, Samuel
4 Anderson, Terry Smith, John Einstein, Albert
7.10.3 Discussion
The query shown in this recipe is driven by the StudentThesis table,
and it uses aggregation combined with conditional logic to link each
member Id number with its corresponding master record in the
Professors table. You may well wonder about the need for aggregation.
To explain that, let's start by looking at the
following, simpler query:
SELECT t.StudentId, p1.Name, p2.Name, P3.Name, t.Grade
FROM
StudentThesis t JOIN Professors p1 ON t.Member1 = p1.Id
JOIN Professors p2 ON t.Member2 = p2.Id
JOIN Professors p3 ON t.Member3 = p3.Id
This query also returns professor names in place of their Id numbers,
but it does so by joining one instance of the Professors table for
each member Id column in the StudentThesis table. The large number of
table joins, three in this case, will reduce the efficiency of the
query. To understand how aggregation can help us,
let's take a look at the Cartesian product that you
get when you join the StudentThesis and Professor tables together.
Here's an excerpt that shows the Cartesian product
for one student record:
StudentId Member1 Member2 Member3 Grade Id Name
--------- ------- ------- ------- ----- --- --------------------
1 234 322 456 A 123 Smith, John
1 234 322 456 A 456 Newton, Isaac
1 234 322 456 A 455 Einstein, Albert
1 234 322 456 A 344 Base, Samuel
1 234 322 456 A 322 Anderson, Terry
1 234 322 456 A 234 Baird, Frances
Because you have all possible combinations of all rows, you are
ensured that for any given member Id column from the StudentThesis
table, you will have at least one row with the corresponding
professor's name from the Professors table. The
following CASE expression, which happens to be for the member1
column, takes advantage of that fact to return the appropriate
professor name when the Id in the member1 column matches the Id
column of the Professors table:
CASE WHEN t.Member1=p.Id
THEN p.Name
ELSE NULL END
When there's a match between the two Id values, the
professor name is returned by the CASE expression. Otherwise, when
there's no match, a NULL is returned. The result is
that the Cartesian product now looks like this:
studentid member1 member2 member3 grade id name
---------- --------- --------- --------- ------ ---- -----------------
1 NULL NULL NULL A 123 Smith, John
1 NULL NULL Newto... A 456 Newton, Isaac
1 NULL NULL NULL A 455 Einstein, Albert
1 NULL NULL NULL A 344 Base, Samuel
1 NULL Ander... NULL A 322 Anderson, Terry
1 Baird... NULL NULL A 234 Baird, Frances
This is fine, but we really want one row to be returned for each
student, not six rows as shown here. We can use aggregation to
collapse these six rows into one. The recipe query uses GROUP BY
t.StudentId to accomplish that. The name and id columns from the
professor table aren't needed, so they
aren't included in the select list. The MAX function
is used to extract the one relevant value from each member column.
Each column has only one non-NULL value. Since aggregate functions
ignore NULLs, that one value is the one returned by MAX.
While the query using aggregation is more complex than the simple
query shown earlier, it's more efficient because
only one table-join is needed instead of three. A further advantage
is that the number of joins does not increase as the number of member
columns increases.
|