Team LiB   Previous Section   Next Section

12.1 ROLLUP

In Chapter 4, you saw how the GROUP BY clause, along with the aggregate functions, can be used to produce summary results. For example, if you want to print the monthly total sales for each region, you would probably execute the following query:

SELECT R.NAME REGION, 
       TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY R.NAME, O.MONTH;

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
Mid-Atlantic         January             610697
Mid-Atlantic         February            428676
Mid-Atlantic         March               637031
Mid-Atlantic         April               541146
Mid-Atlantic         May                 592935
Mid-Atlantic         June                501485
Mid-Atlantic         July                606914
Mid-Atlantic         August              460520
Mid-Atlantic         September           392898
Mid-Atlantic         October             510117
Mid-Atlantic         November            532889
Mid-Atlantic         December            492458
New England          January             509215
New England          February            615746
New England          March               566483
New England          April               597622
New England          May                 566285
New England          June                503354
New England          July                559334
New England          August              547656
New England          September           575589
New England          October             549648
New England          November            461395
New England          December            533314
SouthEast US         January             379021
SouthEast US         February            618423
SouthEast US         March               655993
SouthEast US         April               610017
SouthEast US         May                 661094
SouthEast US         June                568572
SouthEast US         July                556992
SouthEast US         August              478765
SouthEast US         September           635211
SouthEast US         October             536841
SouthEast US         November            553866
SouthEast US         December            613700

36 rows selected.

As expected, this report prints the total sales for each region and month combination. However, in a more complex application, you may also want to have the subtotal for each region over all months, along with the total for all regions, or you may want the subtotal for each month over all regions, along with the total for all months. In short, you may need to generate subtotals and totals at more than one level.

12.1.1 Using UNION (The Old Way)

In data warehouse applications, you frequently need to generate summary information over various dimensions, and subtotal and total across those dimensions. Generating and retrieving this type of summary information is a core goal of almost all data warehouse applications.

By this time, you have realized that a simple GROUP BY query is not sufficient to generate the subtotals and totals described in this section. In order to illustrate the complexity of the problem, let's attempt to write a query that would return the following in a single output:

  • Sales for each month for every region

  • Subtotals over all months for every region

  • Total sales for all regions over all months

One way to generate multiple levels of summary (the only way prior to Oracle8i) is to write a UNION query. For example, the following UNION query will give us the desired three levels of subtotals:

SELECT R.NAME REGION, 
       TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY R.NAME, O.MONTH
UNION ALL
SELECT R.NAME REGION, NULL, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY R.NAME
UNION ALL
SELECT NULL, NULL, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID;

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
Mid-Atlantic         January             610697
Mid-Atlantic         February            428676
Mid-Atlantic         March               637031
Mid-Atlantic         April               541146
Mid-Atlantic         May                 592935
Mid-Atlantic         June                501485
Mid-Atlantic         July                606914
Mid-Atlantic         August              460520
Mid-Atlantic         September           392898
Mid-Atlantic         October             510117
Mid-Atlantic         November            532889
Mid-Atlantic         December            492458
New England          January             509215
New England          February            615746
New England          March               566483
New England          April               597622
New England          May                 566285
New England          June                503354
New England          July                559334
New England          August              547656
New England          September           575589
New England          October             549648
New England          November            461395
New England          December            533314
SouthEast US         January             379021
SouthEast US         February            618423
SouthEast US         March               655993
SouthEast US         April               610017
SouthEast US         May                 661094
SouthEast US         June                568572
SouthEast US         July                556992
SouthEast US         August              478765
SouthEast US         September           635211
SouthEast US         October             536841
SouthEast US         November            553866
SouthEast US         December            613700
Mid-Atlantic                            6307766
New England                             6585641
SouthEast US                            6868495
                                       19761902

40 rows selected.

This query produced 40 rows of output, 36 of which are the sales for each month for every region. The last 4 rows are the subtotals and the total. The three rows with region names and NULL values for the month are the subtotals for each region over all the months, and the last row with NULL values for both the region and month is the total sales for all the regions over all the months.

Now that you have the desired result, try to analyze the query a bit. You have a very small orders table with only 720 rows in this example. You wanted to have summary information over just two dimensions—region and month. You have 3 regions and 12 months. To get the desired summary information from this table, you have to write a query consisting of 3 SELECT statements combined together using UNION ALL. The EXPLAIN PLAN on this query is:

Query Plan
---------------------------------------
SELECT STATEMENT   Cost = 15
  UNION-ALL
    SORT GROUP BY
      HASH JOIN
        TABLE ACCESS FULL REGION
        TABLE ACCESS FULL ORDERS
    SORT GROUP BY
      HASH JOIN
        TABLE ACCESS FULL REGION
        TABLE ACCESS FULL ORDERS
    SORT AGGREGATE
      NESTED LOOPS
        TABLE ACCESS FULL ORDERS
        INDEX UNIQUE SCAN PK7

14 rows selected.

As indicated by the EXPLAIN PLAN output, Oracle needs to perform the following operations to get the results:

  • Three FULL TABLE scans on ORDERS

  • Two FULL TABLE scans on REGION

  • One INDEX scan on PK7 (Primary key of table REGION)

  • Two HASH JOINs

  • One NESTED LOOP JOIN

  • Two SORT GROUP BY operations

  • One SORT AGGREGATE operation

  • One UNION ALL

In any practical application the orders table will consist of hundreds of thousands of rows, and performing all these operations would be time-consuming. Even worse, if you have more dimensions for which to prepare summary information than the two shown in this example, you have to write an even more complex query. The bottom line is that such a query badly hurts performance.

12.1.2 Using ROLLUP (The New Way)

Oracle8i introduced several new features for generating multiple levels of summary information with one query. One such feature is a set of extensions to the GROUP BY clause. In Oracle8i, the GROUP BY clause comes with two extensions: ROLLUP and CUBE. Oracle9i introduces another extension: GROUPING SETS. We discuss ROLLUP in this section. CUBE and GROUPING SETS are discussed later in this chapter.

ROLLUP is an extension to the GROUP BY clause, and therefore can only appear in a query with a GROUP BY clause. The ROLLUP operation groups the selected rows based on the expressions in the GROUP BY clause, and prepares a summary row for each group. The syntax of ROLLUP is:

SELECT ...
FROM ...
GROUP BY ROLLUP (ordered list of grouping columns)

Using ROLLUP, you can generate the summary information discussed at the beginning of this section in a much easier way than in our UNION ALL query. For example:

SELECT R.NAME REGION, 
       TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY ROLLUP (R.NAME, O.MONTH);

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
Mid-Atlantic         January             610697
Mid-Atlantic         February            428676
Mid-Atlantic         March               637031
Mid-Atlantic         April               541146
Mid-Atlantic         May                 592935
Mid-Atlantic         June                501485
Mid-Atlantic         July                606914
Mid-Atlantic         August              460520
Mid-Atlantic         September           392898
Mid-Atlantic         October             510117
Mid-Atlantic         November            532889
Mid-Atlantic         December            492458
Mid-Atlantic                            6307766
New England          January             509215
New England          February            615746
New England          March               566483
New England          April               597622
New England          May                 566285
New England          June                503354
New England          July                559334
New England          August              547656
New England          September           575589
New England          October             549648
New England          November            461395
New England          December            533314
New England                             6585641
SouthEast US         January             379021
SouthEast US         February            618423
SouthEast US         March               655993
SouthEast US         April               610017
SouthEast US         May                 661094
SouthEast US         June                568572
SouthEast US         July                556992
SouthEast US         August              478765
SouthEast US         September           635211
SouthEast US         October             536841
SouthEast US         November            553866
SouthEast US         December            613700
SouthEast US                            6868495
                                       19761902

40 rows selected.

As you can see in this output, the ROLLUP operation produced subtotals across the specified dimensions and a grand total. The argument to the ROLLUP operation is an ordered list of grouping columns. Since the ROLLUP operation is used in conjunction with the GROUP BY clause, it first generates aggregate values based on the GROUP BY operation on the ordered list of columns. Then it generates higher level subtotals and finally a grand total. ROLLUP not only simplifies the query, it results in more efficient execution. The explain plan for this ROLLUP query is as follows:

Query Plan
--------------------------------------
SELECT STATEMENT   Cost = 7
  SORT GROUP BY ROLLUP
    HASH JOIN
      TABLE ACCESS FULL REGION
      TABLE ACCESS FULL ORDERS

Rather than the multiple table scans, joins, and other operations required by the UNION version of the query, the ROLLUP query needs just one full table scan on REGION, one full table scan on ORDERS, and one join to generate the required output.

If you want to generate subtotals for each month instead of for each region, all you need to do is change the order of columns in the ROLLUP operation, as in the following example:

SELECT R.NAME REGION, 
TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
GROUP BY ROLLUP (O.MONTH, R.NAME);

REGION               MONTH     SUM(O.TOT_SALES)
-------------------- --------- ----------------
Mid-Atlantic         January             610697
New England          January             509215
SouthEast US         January             379021
                     January            1498933
Mid-Atlantic         February            428676
New England          February            615746
SouthEast US         February            618423
                     February           1662845
Mid-Atlantic         March               637031
New England          March               566483
SouthEast US         March               655993
                     March              1859507
Mid-Atlantic         April               541146
New England          April               597622
SouthEast US         April               610017
                     April              1748785
Mid-Atlantic         May                 592935
New England          May                 566285
SouthEast US         May                 661094
                     May                1820314
Mid-Atlantic         June                501485
New England          June                503354
SouthEast US         June                568572
                     June               1573411
Mid-Atlantic         July                606914
New England          July                559334
SouthEast US         July                556992
                     July               1723240
Mid-Atlantic         August              460520
New England          August              547656
SouthEast US         August              478765
                     August             1486941
Mid-Atlantic         September           392898
New England          September           575589
SouthEast US         September           635211
                     September          1603698
Mid-Atlantic         October             510117
New England          October             549648
SouthEast US         October             536841
                     October            1596606
Mid-Atlantic         November            532889
New England          November            461395
SouthEast US         November            553866
                     November           1548150
Mid-Atlantic         December            492458
New England          December            533314
SouthEast US         December            613700
                     December           1639472
                                       19761902

49 rows selected.

Adding dimensions does not result in additional complexity. The following query rolls up subtotals for the region, the month, and the year for the first quarter:

SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY ROLLUP (O.YEAR, O.MONTH, R.NAME);

      YEAR MONTH     REGION               SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
      2000 January   Mid-Atlantic                  1221394
      2000 January   New England                   1018430
      2000 January   SouthEast US                   758042
      2000 January                                 2997866
      2000 February  Mid-Atlantic                   857352
      2000 February  New England                   1231492
      2000 February  SouthEast US                  1236846
      2000 February                                3325690
      2000 March     Mid-Atlantic                  1274062
      2000 March     New England                   1132966
      2000 March     SouthEast US                  1311986
      2000 March                                   3719014
      2000                                        10042570
      2001 January   Mid-Atlantic                   610697
      2001 January   New England                    509215
      2001 January   SouthEast US                   379021
      2001 January                                 1498933
      2001 February  Mid-Atlantic                   428676
      2001 February  New England                    615746
      2001 February  SouthEast US                   618423
      2001 February                                1662845
      2001 March     Mid-Atlantic                   637031
      2001 March     New England                    566483
      2001 March     SouthEast US                   655993
      2001 March                                   1859507
      2001                                         5021285
                                                  15063855

27 rows selected.

12.1.3 Generating Partial ROLLUPs

In a ROLLUP query with N dimensions, the grand total is considered the top level. The various subtotal rows of N-1 dimensions constitute the next lower level, the subtotal rows of (N-2) dimensions constitute yet another level down, and so on. In the most recent example, you have three dimensions (year, month, and region), and the total row is the top level. The subtotal rows for the year represent the next lower level, because these rows are subtotals across two dimensions (month and region). The subtotal rows for the year and month combination are one level lower, because these rows are subtotals across one dimension (region). The rest of the rows are the result of the regular GROUP BY operation (without ROLLUP), and form the lowest level.

If you want to exclude some subtotals and totals from the ROLLUP output, you can only move top to bottom, i.e., exclude the top-level total first, then progressively go down to the next level subtotals, and so on. To do this, you have to take out one or more columns from the ROLLUP operation, and put them in the GROUP BY clause. This is called a partial ROLLUP.

As an example of a partial ROLLUP, let's see what happens when you take out the first column, which is O.YEAR, from the previous ROLLUP operation and move it into the GROUP BY clause.

SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY O.YEAR ROLLUP (O.MONTH, R.NAME);

      YEAR MONTH     REGION               SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
      2000 January   Mid-Atlantic                  1221394
      2000 January   New England                   1018430
      2000 January   SouthEast US                   758042
      2000 January                                 2997866
      2000 February  Mid-Atlantic                   857352
      2000 February  New England                   1231492
      2000 February  SouthEast US                  1236846
      2000 February                                3325690
      2000 March     Mid-Atlantic                  1274062
      2000 March     New England                   1132966
      2000 March     SouthEast US                  1311986
      2000 March                                   3719014
      2000                                        10042570
      2001 January   Mid-Atlantic                   610697
      2001 January   New England                    509215
      2001 January   SouthEast US                   379021
      2001 January                                 1498933
      2001 February  Mid-Atlantic                   428676
      2001 February  New England                    615746
      2001 February  SouthEast US                   618423
      2001 February                                1662845
      2001 March     Mid-Atlantic                   637031
      2001 March     New England                    566483
      2001 March     SouthEast US                   655993
      2001 March                                   1859507
      2001                                         5021285

26 rows selected.

The query in this example excludes the grand-total row from the output. By taking out O.YEAR from the ROLLUP operation, you are asking the database not to roll up summary information over the years. Therefore, the database rolls up summary information on region and month. When you proceed to remove O.MONTH from the ROLLUP operation, the query will not generate the roll up summary for the month dimension, and only the region-level subtotals will be printed in the output. For example:

SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY O.YEAR, O.MONTH ROLLUP (R.NAME);

      YEAR MONTH     REGION               SUM(O.TOT_SALES)
---------- --------- -------------------- ----------------
      2000 January   Mid-Atlantic                  1221394
      2000 January   New England                   1018430
      2000 January   SouthEast US                   758042
      2000 January                                 2997866
      2000 February  Mid-Atlantic                   857352
      2000 February  New England                   1231492
      2000 February  SouthEast US                  1236846
      2000 February                                3325690
      2000 March     Mid-Atlantic                  1274062
      2000 March     New England                   1132966
      2000 March     SouthEast US                  1311986
      2000 March                                   3719014
      2001 January   Mid-Atlantic                   610697
      2001 January   New England                    509215
      2001 January   SouthEast US                   379021
      2001 January                                 1498933
      2001 February  Mid-Atlantic                   428676
      2001 February  New England                    615746
      2001 February  SouthEast US                   618423
      2001 February                                1662845
      2001 March     Mid-Atlantic                   637031
      2001 March     New England                    566483
      2001 March     SouthEast US                   655993
      2001 March                                   1859507

24 rows selected.
    Team LiB   Previous Section   Next Section