Team LiB   Previous Section   Next Section

7.10 Joining Arrays with Master Tables

7.10.1 Problem

You want to join a non-normalized array-like data table with a normalized master record. The results should still be non-normalized, but instead of showing coded values from the detail table, you want them to show the corresponding values from the master table.

Taking both the StudentThesis table and the Professors table, you want to join them to replace the member codes with the corresponding professor names. You want to do it so that the structure of the result corresponds to the structure of the StudentThesis table — you want the resulting rows to have one student ID, three professor names, and one grade.

7.10.2 Solution

The following SELECT statement reproduces the structure from the non-normalized table (StudentThesis), but replaces the professors ID numbers with their corresponding professor names. Each line of the report will list a student ID together with the names of the three dissertation committee members for that student.

SELECT t.StudentId,
   MAX(CASE WHEN t.Member1=p.id 
            THEN p.Name 
            ELSE NULL END) Member1,
   MAX(CASE WHEN t.Member2=p.id 
            THEN p.Name 
            ELSE NULL END) Member2,
   MAX(CASE WHEN t.Member3=p.id 
            THEN p.Name 
            ELSE NULL END) Member3 
FROM StudentThesis t, Professors p
GROUP BY t.StudentId

The results from executing this query will look like these:

StudentId  Member1         Member2          Member3
---------- --------------- ---------------- ----------------
1          Baird, Frances  Anderson, Terry  Newton, Isaac
2          Newton, Isaac   Anderson, Terry  Smith, John
3          Newton, Isaac   Einstein, Albert Base, Samuel
4          Anderson, Terry Smith, John      Einstein, Albert

7.10.3 Discussion

The query shown in this recipe is driven by the StudentThesis table, and it uses aggregation combined with conditional logic to link each member Id number with its corresponding master record in the Professors table. You may well wonder about the need for aggregation. To explain that, let's start by looking at the following, simpler query:

SELECT t.StudentId, p1.Name, p2.Name, P3.Name, t.Grade
FROM 
   StudentThesis t JOIN Professors p1 ON t.Member1 = p1.Id
   JOIN Professors p2 ON t.Member2 = p2.Id
   JOIN Professors p3 ON t.Member3 = p3.Id

This query also returns professor names in place of their Id numbers, but it does so by joining one instance of the Professors table for each member Id column in the StudentThesis table. The large number of table joins, three in this case, will reduce the efficiency of the query. To understand how aggregation can help us, let's take a look at the Cartesian product that you get when you join the StudentThesis and Professor tables together. Here's an excerpt that shows the Cartesian product for one student record:

StudentId Member1 Member2 Member3 Grade Id  Name                 
--------- ------- ------- ------- ----- --- -------------------- 
1         234     322     456     A     123 Smith, John         
1         234     322     456     A     456 Newton, Isaac       
1         234     322     456     A     455 Einstein, Albert    
1         234     322     456     A     344 Base, Samuel        
1         234     322     456     A     322 Anderson, Terry     
1         234     322     456     A     234 Baird, Frances

Because you have all possible combinations of all rows, you are ensured that for any given member Id column from the StudentThesis table, you will have at least one row with the corresponding professor's name from the Professors table. The following CASE expression, which happens to be for the member1 column, takes advantage of that fact to return the appropriate professor name when the Id in the member1 column matches the Id column of the Professors table:

CASE WHEN t.Member1=p.Id 
     THEN p.Name 
     ELSE NULL END

When there's a match between the two Id values, the professor name is returned by the CASE expression. Otherwise, when there's no match, a NULL is returned. The result is that the Cartesian product now looks like this:

studentid  member1   member2   member3   grade  id   name             
---------- --------- --------- --------- ------ ---- -----------------
1          NULL      NULL      NULL      A      123  Smith, John      
1          NULL      NULL      Newto...  A      456  Newton, Isaac    
1          NULL      NULL      NULL      A      455  Einstein, Albert 
1          NULL      NULL      NULL      A      344  Base, Samuel     
1          NULL      Ander...  NULL      A      322  Anderson, Terry  
1          Baird...  NULL      NULL      A      234  Baird, Frances   

This is fine, but we really want one row to be returned for each student, not six rows as shown here. We can use aggregation to collapse these six rows into one. The recipe query uses GROUP BY t.StudentId to accomplish that. The name and id columns from the professor table aren't needed, so they aren't included in the select list. The MAX function is used to extract the one relevant value from each member column. Each column has only one non-NULL value. Since aggregate functions ignore NULLs, that one value is the one returned by MAX.

While the query using aggregation is more complex than the simple query shown earlier, it's more efficient because only one table-join is needed instead of three. A further advantage is that the number of joins does not increase as the number of member columns increases.

    Team LiB   Previous Section   Next Section