12.1 ROLLUPIn Chapter 4, you saw how the GROUP BY clause, along with the aggregate functions, can be used to produce summary results. For example, if you want to print the monthly total sales for each region, you would probably execute the following query: SELECT R.NAME REGION, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID GROUP BY R.NAME, O.MONTH; REGION MONTH SUM(O.TOT_SALES) -------------------- --------- ---------------- Mid-Atlantic January 610697 Mid-Atlantic February 428676 Mid-Atlantic March 637031 Mid-Atlantic April 541146 Mid-Atlantic May 592935 Mid-Atlantic June 501485 Mid-Atlantic July 606914 Mid-Atlantic August 460520 Mid-Atlantic September 392898 Mid-Atlantic October 510117 Mid-Atlantic November 532889 Mid-Atlantic December 492458 New England January 509215 New England February 615746 New England March 566483 New England April 597622 New England May 566285 New England June 503354 New England July 559334 New England August 547656 New England September 575589 New England October 549648 New England November 461395 New England December 533314 SouthEast US January 379021 SouthEast US February 618423 SouthEast US March 655993 SouthEast US April 610017 SouthEast US May 661094 SouthEast US June 568572 SouthEast US July 556992 SouthEast US August 478765 SouthEast US September 635211 SouthEast US October 536841 SouthEast US November 553866 SouthEast US December 613700 36 rows selected. As expected, this report prints the total sales for each region and month combination. However, in a more complex application, you may also want to have the subtotal for each region over all months, along with the total for all regions, or you may want the subtotal for each month over all regions, along with the total for all months. In short, you may need to generate subtotals and totals at more than one level. 12.1.1 Using UNION (The Old Way)In data warehouse applications, you frequently need to generate summary information over various dimensions, and subtotal and total across those dimensions. Generating and retrieving this type of summary information is a core goal of almost all data warehouse applications. By this time, you have realized that a simple GROUP BY query is not sufficient to generate the subtotals and totals described in this section. In order to illustrate the complexity of the problem, let's attempt to write a query that would return the following in a single output:
One way to generate multiple levels of summary (the only way prior to Oracle8i) is to write a UNION query. For example, the following UNION query will give us the desired three levels of subtotals: SELECT R.NAME REGION, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID GROUP BY R.NAME, O.MONTH UNION ALL SELECT R.NAME REGION, NULL, SUM(O.TOT_SALES) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID GROUP BY R.NAME UNION ALL SELECT NULL, NULL, SUM(O.TOT_SALES) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID; REGION MONTH SUM(O.TOT_SALES) -------------------- --------- ---------------- Mid-Atlantic January 610697 Mid-Atlantic February 428676 Mid-Atlantic March 637031 Mid-Atlantic April 541146 Mid-Atlantic May 592935 Mid-Atlantic June 501485 Mid-Atlantic July 606914 Mid-Atlantic August 460520 Mid-Atlantic September 392898 Mid-Atlantic October 510117 Mid-Atlantic November 532889 Mid-Atlantic December 492458 New England January 509215 New England February 615746 New England March 566483 New England April 597622 New England May 566285 New England June 503354 New England July 559334 New England August 547656 New England September 575589 New England October 549648 New England November 461395 New England December 533314 SouthEast US January 379021 SouthEast US February 618423 SouthEast US March 655993 SouthEast US April 610017 SouthEast US May 661094 SouthEast US June 568572 SouthEast US July 556992 SouthEast US August 478765 SouthEast US September 635211 SouthEast US October 536841 SouthEast US November 553866 SouthEast US December 613700 Mid-Atlantic 6307766 New England 6585641 SouthEast US 6868495 19761902 40 rows selected. This query produced 40 rows of output, 36 of which are the sales for each month for every region. The last 4 rows are the subtotals and the total. The three rows with region names and NULL values for the month are the subtotals for each region over all the months, and the last row with NULL values for both the region and month is the total sales for all the regions over all the months. Now that you have the desired result, try to analyze the query a bit. You have a very small orders table with only 720 rows in this example. You wanted to have summary information over just two dimensions—region and month. You have 3 regions and 12 months. To get the desired summary information from this table, you have to write a query consisting of 3 SELECT statements combined together using UNION ALL. The EXPLAIN PLAN on this query is: Query Plan --------------------------------------- SELECT STATEMENT Cost = 15 UNION-ALL SORT GROUP BY HASH JOIN TABLE ACCESS FULL REGION TABLE ACCESS FULL ORDERS SORT GROUP BY HASH JOIN TABLE ACCESS FULL REGION TABLE ACCESS FULL ORDERS SORT AGGREGATE NESTED LOOPS TABLE ACCESS FULL ORDERS INDEX UNIQUE SCAN PK7 14 rows selected. As indicated by the EXPLAIN PLAN output, Oracle needs to perform the following operations to get the results:
In any practical application the orders table will consist of hundreds of thousands of rows, and performing all these operations would be time-consuming. Even worse, if you have more dimensions for which to prepare summary information than the two shown in this example, you have to write an even more complex query. The bottom line is that such a query badly hurts performance. 12.1.2 Using ROLLUP (The New Way)Oracle8i introduced several new features for generating multiple levels of summary information with one query. One such feature is a set of extensions to the GROUP BY clause. In Oracle8i, the GROUP BY clause comes with two extensions: ROLLUP and CUBE. Oracle9i introduces another extension: GROUPING SETS. We discuss ROLLUP in this section. CUBE and GROUPING SETS are discussed later in this chapter. ROLLUP is an extension to the GROUP BY clause, and therefore can only appear in a query with a GROUP BY clause. The ROLLUP operation groups the selected rows based on the expressions in the GROUP BY clause, and prepares a summary row for each group. The syntax of ROLLUP is: SELECT ... FROM ... GROUP BY ROLLUP (ordered list of grouping columns) Using ROLLUP, you can generate the summary information discussed at the beginning of this section in a much easier way than in our UNION ALL query. For example: SELECT R.NAME REGION, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID GROUP BY ROLLUP (R.NAME, O.MONTH); REGION MONTH SUM(O.TOT_SALES) -------------------- --------- ---------------- Mid-Atlantic January 610697 Mid-Atlantic February 428676 Mid-Atlantic March 637031 Mid-Atlantic April 541146 Mid-Atlantic May 592935 Mid-Atlantic June 501485 Mid-Atlantic July 606914 Mid-Atlantic August 460520 Mid-Atlantic September 392898 Mid-Atlantic October 510117 Mid-Atlantic November 532889 Mid-Atlantic December 492458 Mid-Atlantic 6307766 New England January 509215 New England February 615746 New England March 566483 New England April 597622 New England May 566285 New England June 503354 New England July 559334 New England August 547656 New England September 575589 New England October 549648 New England November 461395 New England December 533314 New England 6585641 SouthEast US January 379021 SouthEast US February 618423 SouthEast US March 655993 SouthEast US April 610017 SouthEast US May 661094 SouthEast US June 568572 SouthEast US July 556992 SouthEast US August 478765 SouthEast US September 635211 SouthEast US October 536841 SouthEast US November 553866 SouthEast US December 613700 SouthEast US 6868495 19761902 40 rows selected. As you can see in this output, the ROLLUP operation produced subtotals across the specified dimensions and a grand total. The argument to the ROLLUP operation is an ordered list of grouping columns. Since the ROLLUP operation is used in conjunction with the GROUP BY clause, it first generates aggregate values based on the GROUP BY operation on the ordered list of columns. Then it generates higher level subtotals and finally a grand total. ROLLUP not only simplifies the query, it results in more efficient execution. The explain plan for this ROLLUP query is as follows: Query Plan -------------------------------------- SELECT STATEMENT Cost = 7 SORT GROUP BY ROLLUP HASH JOIN TABLE ACCESS FULL REGION TABLE ACCESS FULL ORDERS Rather than the multiple table scans, joins, and other operations required by the UNION version of the query, the ROLLUP query needs just one full table scan on REGION, one full table scan on ORDERS, and one join to generate the required output. If you want to generate subtotals for each month instead of for each region, all you need to do is change the order of columns in the ROLLUP operation, as in the following example: SELECT R.NAME REGION, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID GROUP BY ROLLUP (O.MONTH, R.NAME); REGION MONTH SUM(O.TOT_SALES) -------------------- --------- ---------------- Mid-Atlantic January 610697 New England January 509215 SouthEast US January 379021 January 1498933 Mid-Atlantic February 428676 New England February 615746 SouthEast US February 618423 February 1662845 Mid-Atlantic March 637031 New England March 566483 SouthEast US March 655993 March 1859507 Mid-Atlantic April 541146 New England April 597622 SouthEast US April 610017 April 1748785 Mid-Atlantic May 592935 New England May 566285 SouthEast US May 661094 May 1820314 Mid-Atlantic June 501485 New England June 503354 SouthEast US June 568572 June 1573411 Mid-Atlantic July 606914 New England July 559334 SouthEast US July 556992 July 1723240 Mid-Atlantic August 460520 New England August 547656 SouthEast US August 478765 August 1486941 Mid-Atlantic September 392898 New England September 575589 SouthEast US September 635211 September 1603698 Mid-Atlantic October 510117 New England October 549648 SouthEast US October 536841 October 1596606 Mid-Atlantic November 532889 New England November 461395 SouthEast US November 553866 November 1548150 Mid-Atlantic December 492458 New England December 533314 SouthEast US December 613700 December 1639472 19761902 49 rows selected. Adding dimensions does not result in additional complexity. The following query rolls up subtotals for the region, the month, and the year for the first quarter: SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, R.NAME REGION, SUM(O.TOT_SALES) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID AND O.MONTH BETWEEN 1 AND 3 GROUP BY ROLLUP (O.YEAR, O.MONTH, R.NAME); YEAR MONTH REGION SUM(O.TOT_SALES) ---------- --------- -------------------- ---------------- 2000 January Mid-Atlantic 1221394 2000 January New England 1018430 2000 January SouthEast US 758042 2000 January 2997866 2000 February Mid-Atlantic 857352 2000 February New England 1231492 2000 February SouthEast US 1236846 2000 February 3325690 2000 March Mid-Atlantic 1274062 2000 March New England 1132966 2000 March SouthEast US 1311986 2000 March 3719014 2000 10042570 2001 January Mid-Atlantic 610697 2001 January New England 509215 2001 January SouthEast US 379021 2001 January 1498933 2001 February Mid-Atlantic 428676 2001 February New England 615746 2001 February SouthEast US 618423 2001 February 1662845 2001 March Mid-Atlantic 637031 2001 March New England 566483 2001 March SouthEast US 655993 2001 March 1859507 2001 5021285 15063855 27 rows selected. 12.1.3 Generating Partial ROLLUPsIn a ROLLUP query with N dimensions, the grand total is considered the top level. The various subtotal rows of N-1 dimensions constitute the next lower level, the subtotal rows of (N-2) dimensions constitute yet another level down, and so on. In the most recent example, you have three dimensions (year, month, and region), and the total row is the top level. The subtotal rows for the year represent the next lower level, because these rows are subtotals across two dimensions (month and region). The subtotal rows for the year and month combination are one level lower, because these rows are subtotals across one dimension (region). The rest of the rows are the result of the regular GROUP BY operation (without ROLLUP), and form the lowest level. If you want to exclude some subtotals and totals from the ROLLUP output, you can only move top to bottom, i.e., exclude the top-level total first, then progressively go down to the next level subtotals, and so on. To do this, you have to take out one or more columns from the ROLLUP operation, and put them in the GROUP BY clause. This is called a partial ROLLUP. As an example of a partial ROLLUP, let's see what happens when you take out the first column, which is O.YEAR, from the previous ROLLUP operation and move it into the GROUP BY clause. SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, R.NAME REGION, SUM(O.TOT_SALES) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID AND O.MONTH BETWEEN 1 AND 3 GROUP BY O.YEAR ROLLUP (O.MONTH, R.NAME); YEAR MONTH REGION SUM(O.TOT_SALES) ---------- --------- -------------------- ---------------- 2000 January Mid-Atlantic 1221394 2000 January New England 1018430 2000 January SouthEast US 758042 2000 January 2997866 2000 February Mid-Atlantic 857352 2000 February New England 1231492 2000 February SouthEast US 1236846 2000 February 3325690 2000 March Mid-Atlantic 1274062 2000 March New England 1132966 2000 March SouthEast US 1311986 2000 March 3719014 2000 10042570 2001 January Mid-Atlantic 610697 2001 January New England 509215 2001 January SouthEast US 379021 2001 January 1498933 2001 February Mid-Atlantic 428676 2001 February New England 615746 2001 February SouthEast US 618423 2001 February 1662845 2001 March Mid-Atlantic 637031 2001 March New England 566483 2001 March SouthEast US 655993 2001 March 1859507 2001 5021285 26 rows selected. The query in this example excludes the grand-total row from the output. By taking out O.YEAR from the ROLLUP operation, you are asking the database not to roll up summary information over the years. Therefore, the database rolls up summary information on region and month. When you proceed to remove O.MONTH from the ROLLUP operation, the query will not generate the roll up summary for the month dimension, and only the region-level subtotals will be printed in the output. For example: SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, R.NAME REGION, SUM(O.TOT_SALES) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID AND O.MONTH BETWEEN 1 AND 3 GROUP BY O.YEAR, O.MONTH ROLLUP (R.NAME); YEAR MONTH REGION SUM(O.TOT_SALES) ---------- --------- -------------------- ---------------- 2000 January Mid-Atlantic 1221394 2000 January New England 1018430 2000 January SouthEast US 758042 2000 January 2997866 2000 February Mid-Atlantic 857352 2000 February New England 1231492 2000 February SouthEast US 1236846 2000 February 3325690 2000 March Mid-Atlantic 1274062 2000 March New England 1132966 2000 March SouthEast US 1311986 2000 March 3719014 2001 January Mid-Atlantic 610697 2001 January New England 509215 2001 January SouthEast US 379021 2001 January 1498933 2001 February Mid-Atlantic 428676 2001 February New England 615746 2001 February SouthEast US 618423 2001 February 1662845 2001 March Mid-Atlantic 637031 2001 March New England 566483 2001 March SouthEast US 655993 2001 March 1859507 24 rows selected. |