Team LiB   Previous Section   Next Section

7.8 Pivoting Tables

7.8.1 Problem

You want to produce a report in grid format. The rows in the table first have to be grouped, and then the values of interest from each group must be printed horizontally across one row of the report. A pivot report is the opposite of data folding. You could think of it as unfolding the data.

Taking the BalanceSheet2 table from the introduction to this chapter, you want create a human-readable report of balance sheets. The report should look like this:

CompanyId   Year        Assets         Liabilities    Equity         
----------- ----------- -------------- -------------- -------------- 
12445       2000        36759.00       22252.00       14230.00
12445       2001        37472.00       22924.00       14548.00
12445       2002        38973.00       24476.00       14497.00

You can readily see that the problem here is to unfold the normalized data and present it in a format that contains three score values in each row.

7.8.2 Solution

The solution is a three-step process. First, identify some key items such as the column that you want to use as a row label. Second, write a query based on the key items that you identified in Step 1. Finally, execute that query.

7.8.2.1 Step 1: Identifying key items

To generate a pivot report, you need to identify the following key items:

  • The column(s) containing the value to use as a row label. In our example, this will be CompanyId and Year columns, because we want each business year to be listed on a separate row.

  • The column labels. In our example, we are interested in three courses, so the course codes Assets, Liabilities, and Equity will be our column labels.

  • A database column to correlate values to columns in the final report. In our case, we will use the CategoryId column.

  • The values to report. In our example, we are reporting the year's results in a simplified balance sheet structure. Category results come from the Dollars column.

Once you've identified these values, you can plug them into the template shown next.

7.8.2.2 Step 2: Writing the query

To write the query to produce a pivot report, start with the following template, and plug in the key items that you identified in the previous section.

SELECT rowlabel_column,
   MAX(CASE WHEN columnlabel_column=value1 
            THEN reportedvalue END) AS columnlabel1,
   MAX(CASE WHEN columnlabel_column=value2  
            THEN reportedvalue END) AS columnlabel2,
   MAX(CASE WHEN columnlabel_column=value3 
            THEN reportedvalue END) AS columnlabel3,
   . . .
FROM table
GROUP BY rowlabel_column
rowlabel_column

The column containing the row label.

columnlabel_column

The column that drives the value of the column label.

value, value2, value3 . . .

Values corresponding to column labels. Each column in the final report corresponds to a different value.

reportedvalue

The value to report.

columnlabel1, columnlable2, columnlabel3 . . .

The values to use as column labels. In our example, these correspond to the values in the columnlabel_column.

table

The table containing the data on which the report is based.

When the key items in our example are applied to the template, the following code is produced:

SELECT CompanyId, Year,
   MAX(CASE WHEN CategoryId='A' 
            THEN Dollars END) AS Assets,
   MAX(CASE WHEN CategoryId='L' 
            THEN Dollars END) AS Liabilities,
   MAX(CASE WHEN CategoryId='E' 
            THEN Dollars END) AS Equity
FROM BalanceSheets2
GROUP BY CompanyId, Year

In our example, the column headings are exactly the same as the values in the database column that is used to sort values into various columns on the report. Values for the category 'A', for example, are reported under a column titled Assets.

7.8.2.3 Step 3: Run the query

The third and final step is to execute the query to generate the report. The results from executing the query shown in Step 2 are as follows:

CompanyId   Year        Assets         Liabilities    Equity         
----------- ----------- -------------- -------------- -------------- 
12445       2000        36759.00       22252.00       14230.00
12445       2001        37472.00       22924.00       14548.00
12445       2002        38973.00       24476.00       14497.00

As you can see, these results are in the desired format: one row per business year and three categories per row.

7.8.3 Discussion

Pivoting a table is a powerful reporting technique that is widely used in SQL programming. It is fast and efficient, and only one pass through the table is needed to generate the final result. You do need to use a set of pivot values to differentiate the data in one column from that in another. Once the proper pivot values are known, you can use them as the basis for positioning the value that you want to report on in the appropriate column in the final report.

When SQL-92 introduced the CASE statement, the pivoting techniques shown here became much more popular than before. Prior to having the CASE statement available, programmers had to use embedded characteristic functions, which were often vendor-specific. The expressions used to position values in the correct column on the report also frequently included calls to functions such as SIGN and ABS. Those implementations were very unreadable, but they did the same thing — they implemented conditional logic in SQL queries.

The code shown in this recipe steps through each row in the table. The CASE statements for each result column filter out all rows that do not correspond to the particular pivot values of interest. The CASE statements also position category values in their correct columns. The intermediate result table is the same length as BalanceSheets2, but with more columns — one for each pivot value. In any given row, only one of the pivot value columns is NOT NULL.

To create the final report, the resulting rows are grouped as a result of using the GROUP BY clause. All the rows for each business year are combined into one. Since we know that there is only one value per business year in each pivot column, any aggregate function may be used to return that value. In this recipe, the MAX function was used.

If it were possible for one business year to have two or more values for the same course, additional logic would need to be added to the query to decide which of those values should be reported.

The role of the aggregate function can be extended beyond that shown in this example. Say that you are reporting quarterly results rather than yearly scores and that you want to report a business year's result as the sum of all quarters for that year. Your BalanceSheets2 table in this case might look like this:

CREATE TABLE QuarterlyBalanceSheets (
   CompanyId INTEGER,
   CategoryId CHAR(1),
   Year INTEGER,
   Quarter INTEGER,
   Dollars DECIMAL(12,2)
)

The following data shows how the table stores quarterly results of the company:

CompanyId   CategoryId Year        Quarter     Dollars        
----------- ---------- ----------- ----------- -------------- 
12445       A          2002        1           1579.00
12445       L          2002        1           345.00
12445       E          2002        1           1579.00
12445       A          2002        2           2666.00
12445       L          2002        2           324.00
12445       E          2002        2           2342.00
12445       A          2002        3           1909.00
12445       L          2002        3           453.00
12445       E          2002        3           1456.00
12445       A          2002        4           6245.00
12445       L          2002        4           567.00
12445       E          2002        4           5678.00

Because there are now multiple scores for each company/year combination, it is no longer valid to use a function such as MAX to return just one score. We can, however, use other aggregate functions such as AVG or SUM. For example:

SELECT CompanyId, Year,
   SUM(CASE WHEN CategoryId='A' 
            THEN Dollars END) AS Assets,
   SUM(CASE WHEN CategoryId='L' 
            THEN Dollars END) AS Liabilities,
   SUM(CASE WHEN CategoryId='E' 
            THEN Dollars END) AS Equity
FROM BalanceSheets2
GROUP BY CompanyId, Year

In this case, the result reported for each year will be the sum of all quarter results for that year:

CompanyId   Year  Assets    Liabilities  Equity                                   
----------- ----- --------- ------------ --------- 
12445       2000  36759.00  22252.00     14230.00
12445       2001  37472.00  22924.00     14548.00
12445       2002  38973.00  24476.00     14497.00
    Team LiB   Previous Section   Next Section