7.8 Pivoting Tables
7.8.1 Problem
You want to produce a report in grid format. The rows in the table
first have to be grouped, and then the values of interest from each
group must be printed horizontally across one row of the report. A
pivot report is the opposite of data folding. You could think of it
as unfolding the data.
Taking the BalanceSheet2 table from the introduction to this chapter,
you want create a human-readable report of balance sheets. The report
should look like this:
CompanyId Year Assets Liabilities Equity
----------- ----------- -------------- -------------- --------------
12445 2000 36759.00 22252.00 14230.00
12445 2001 37472.00 22924.00 14548.00
12445 2002 38973.00 24476.00 14497.00
You can readily see that the problem here is to unfold the normalized
data and present it in a format that contains three score values in
each row.
7.8.2 Solution
The solution is a three-step process. First, identify some key items
such as the column that you want to use as a row label. Second, write
a query based on the key items that you identified in Step 1.
Finally, execute that query.
7.8.2.1 Step 1: Identifying key items
To generate a pivot report, you need to identify the following key
items:
The column(s) containing the value to use as a row label. In our
example, this will be CompanyId and Year columns, because we want
each business year to be listed on a separate row.
The column labels. In our example, we are interested in three
courses, so the course codes Assets, Liabilities, and Equity will be
our column labels.
A database column to correlate values to columns in the final report.
In our case, we will use the CategoryId column.
The values to report. In our example, we are reporting the
year's results in a simplified balance sheet
structure. Category results come from the Dollars column.
Once you've identified these values, you can plug
them into the template shown next.
7.8.2.2 Step 2: Writing the query
To write the query to produce a pivot
report, start with the following
template, and plug in the key items that you identified in the
previous section.
SELECT rowlabel_column,
MAX(CASE WHEN columnlabel_column=value1
THEN reportedvalue END) AS columnlabel1,
MAX(CASE WHEN columnlabel_column=value2
THEN reportedvalue END) AS columnlabel2,
MAX(CASE WHEN columnlabel_column=value3
THEN reportedvalue END) AS columnlabel3,
. . .
FROM table
GROUP BY rowlabel_column
- rowlabel_column
-
The column containing the row label.
- columnlabel_column
-
The column that drives the value of the column label.
- value, value2, value3 . . .
-
Values corresponding to column labels. Each column in the final
report corresponds to a different value.
- reportedvalue
-
The value to report.
- columnlabel1, columnlable2, columnlabel3 . . .
-
The values to use as column labels. In our example, these correspond
to the values in the columnlabel_column.
- table
-
The table containing the data on which the report is based.
When the key items in our example are applied to the template, the
following code is produced:
SELECT CompanyId, Year,
MAX(CASE WHEN CategoryId='A'
THEN Dollars END) AS Assets,
MAX(CASE WHEN CategoryId='L'
THEN Dollars END) AS Liabilities,
MAX(CASE WHEN CategoryId='E'
THEN Dollars END) AS Equity
FROM BalanceSheets2
GROUP BY CompanyId, Year
In our example, the column headings are exactly the same as the
values in the database column that is used to sort values into
various columns on the report. Values for the category
'A', for example, are reported under a column
titled Assets.
7.8.2.3 Step 3: Run the query
The third and final step is to execute the query to generate the
report. The results from executing the query shown in Step 2 are as
follows:
CompanyId Year Assets Liabilities Equity
----------- ----------- -------------- -------------- --------------
12445 2000 36759.00 22252.00 14230.00
12445 2001 37472.00 22924.00 14548.00
12445 2002 38973.00 24476.00 14497.00
As you can see, these results are in the desired format: one row per
business year and three categories per row.
7.8.3 Discussion
Pivoting a table is a powerful reporting technique that is widely
used in SQL programming. It is fast and efficient, and only one pass
through the table is needed to generate the final result. You do need
to use a set of pivot values to differentiate the data in one column
from that in another. Once the proper pivot values are known, you can
use them as the basis for positioning the value that you want to
report on in the appropriate column in the final report.
When SQL-92 introduced the CASE statement, the pivoting techniques
shown here became much more popular than before. Prior to having the
CASE statement available, programmers had to use embedded
characteristic functions, which were often vendor-specific. The
expressions used to position values in the correct column on the
report also frequently included calls to functions such as SIGN and
ABS. Those implementations were very unreadable, but they did the
same thing — they implemented conditional logic in SQL queries.
The code shown in this recipe steps through each row in the table.
The CASE statements for each result column filter out all rows that
do not correspond to the particular pivot values of interest. The
CASE statements also position category values in their correct
columns. The intermediate result table is the same length as
BalanceSheets2, but with more columns — one for each pivot value.
In any given row, only one of the pivot value columns is NOT NULL.
To create the final report, the resulting rows are grouped as a
result of using the GROUP BY clause. All the rows for each business
year are combined into one. Since we know that there is only one
value per business year in each pivot column, any aggregate function
may be used to return that value. In this recipe, the MAX function
was used.
|
If it were possible for one business year to have two or more values
for the same course, additional logic would need to be added to the
query to decide which of those values should be reported.
|
|
The role of the aggregate function can be extended beyond that shown
in this example. Say that you are reporting quarterly results rather
than yearly scores and that you want to report a business
year's result as the sum of all quarters for that
year. Your BalanceSheets2 table in this case might look like this:
CREATE TABLE QuarterlyBalanceSheets (
CompanyId INTEGER,
CategoryId CHAR(1),
Year INTEGER,
Quarter INTEGER,
Dollars DECIMAL(12,2)
)
The following data shows how the table stores quarterly results of
the company:
CompanyId CategoryId Year Quarter Dollars
----------- ---------- ----------- ----------- --------------
12445 A 2002 1 1579.00
12445 L 2002 1 345.00
12445 E 2002 1 1579.00
12445 A 2002 2 2666.00
12445 L 2002 2 324.00
12445 E 2002 2 2342.00
12445 A 2002 3 1909.00
12445 L 2002 3 453.00
12445 E 2002 3 1456.00
12445 A 2002 4 6245.00
12445 L 2002 4 567.00
12445 E 2002 4 5678.00
Because there are now multiple scores for each company/year
combination, it is no longer valid to use a function such as MAX to
return just one score. We can, however, use other aggregate functions
such as AVG or SUM. For example:
SELECT CompanyId, Year,
SUM(CASE WHEN CategoryId='A'
THEN Dollars END) AS Assets,
SUM(CASE WHEN CategoryId='L'
THEN Dollars END) AS Liabilities,
SUM(CASE WHEN CategoryId='E'
THEN Dollars END) AS Equity
FROM BalanceSheets2
GROUP BY CompanyId, Year
In this case, the result reported for each year will be the
sum of
all quarter results for that year:
CompanyId Year Assets Liabilities Equity
----------- ----- --------- ------------ ---------
12445 2000 36759.00 22252.00 14230.00
12445 2001 37472.00 22924.00 14548.00
12445 2002 38973.00 24476.00 14497.00
|