Team LiB   Previous Section   Next Section

12.3 The GROUPING Function

ROLLUP and CUBE produce extra rows in the output that contain subtotals and totals. These rows contain NULL values for one or more columns. An output containing NULLs and indicating subtotals doesn't make sense to an ordinary person who is unware of the behavior of ROLLUP and CUBE operations. Does your VP care about whether you used ROLLUP or CUBE or any other operation to get him the monthly total sales for each region? Obviously, he doesn't. That's exactly why you are reading this page and not your VP.

If you know your way around the NVL function, you would probably attempt to translate each NULL value from CUBE and ROLLUP to some descriptive value, as in the following example:

SELECT NVL(TO_CHAR(O.YEAR), 'All Years') YEAR,
NVL(TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month'), 'First Quarter') MONTH,
NVL(R.NAME, 'All Regions') REGION, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY ROLLUP (O.YEAR, O.MONTH, R.NAME);

YEAR             MONTH         REGION               SUM(O.TOT_SALES)
---------------- ------------- -------------------- ----------------
2000             January       Mid-Atlantic                  1221394
2000             January       New England                   1018430
2000             January       SouthEast US                   758042
2000             January       All Regions                   2997866
2000             February      Mid-Atlantic                   857352
2000             February      New England                   1231492
2000             February      SouthEast US                  1236846
2000             February      All Regions                   3325690
2000             March         Mid-Atlantic                  1274062
2000             March         New England                   1132966
2000             March         SouthEast US                  1311986
2000             March         All Regions                   3719014
2000             First Quarter All Regions                  10042570
2001             January       Mid-Atlantic                   610697
2001             January       New England                    509215
2001             January       SouthEast US                   379021
2001             January       All Regions                   1498933
2001             February      Mid-Atlantic                   428676
2001             February      New England                    615746
2001             February      SouthEast US                   618423
2001             February      All Regions                   1662845
2001             March         Mid-Atlantic                   637031
2001             March         New England                    566483
2001             March         SouthEast US                   655993
2001             March         All Regions                   1859507
2001             First Quarter All Regions                   5021285
All Years        First Quarter All Regions                  15063855

27 rows selected.

The NVL function works pretty well for this example. However, if the data itself contains some NULL values, it becomes impossible to distinguish whether a NULL value represents unavailable data or a subtotal row. The NVL function will cause a problem in such a case. The following data can be used to illustrate this problem:

SELECT * FROM CUST_ORDER;

 ORDER_NBR CUST_NBR SALES_EMP_ID SALE_PRICE ORDER_DT  EXPECTED_ STATUS
---------- -------- ------------ ---------- --------- --------- ----------
      1001      231         7354         99 22-JUL-01 23-JUL-01 DELIVERED
      1000      201         7354            19-JUL-01 24-JUL-01
      1002      255         7368            12-JUL-01 25-JUL-01
      1003      264         7368         56 16-JUL-01 26-JUL-01 DELIVERED
      1004      244         7368         34 18-JUL-01 27-JUL-01 PENDING
      1005      288         7368         99 22-JUL-01 24-JUL-01 DELIVERED
      1006      231         7354            22-JUL-01 28-JUL-01
      1007      255         7368         25 20-JUL-01 22-JUL-01 PENDING
      1008      255         7368         25 21-JUL-01 23-JUL-01 PENDING
      1009      231         7354         56 18-JUL-01 22-JUL-01 DELIVERED
      1012      231         7354         99 22-JUL-01 23-JUL-01 DELIVERED
      1011      201         7354            19-JUL-01 24-JUL-01
      1015      255         7368            12-JUL-01 25-JUL-01
      1017      264         7368         56 16-JUL-01 26-JUL-01 DELIVERED
      1019      244         7368         34 18-JUL-01 27-JUL-01 PENDING
      1021      288         7368         99 22-JUL-01 24-JUL-01 DELIVERED
      1023      231         7354            22-JUL-01 28-JUL-01
      1025      255         7368         25 20-JUL-01 22-JUL-01 PENDING
      1027      255         7368         25 21-JUL-01 23-JUL-01 PENDING
      1029      231         7354         56 18-JUL-01 22-JUL-01 DELIVERED

20 rows selected.

Note that the column STATUS contains NULL values. If you want the summary status of orders for each customer, and you executed the following query (note the application of NVL to the STATUS column), the output might surprise you.

SELECT NVL(TO_CHAR(CUST_NBR), 'All Customers') CUSTOMER, 
NVL(STATUS, 'All Status') STATUS,
COUNT(*) FROM CUST_ORDER
GROUP BY CUBE(CUST_NBR, STATUS);

CUSTOMER            STATUS                 COUNT(*)
------------------- -------------------- ----------
201                 All Status                    2
201                 All Status                    2
231                 DELIVERED                     4
231                 All Status                    2
231                 All Status                    6
244                 PENDING                       2
244                 All Status                    2
255                 PENDING                       4
255                 All Status                    2
255                 All Status                    6
264                 DELIVERED                     2
264                 All Status                    2
288                 DELIVERED                     2
288                 All Status                    2
All Customers       DELIVERED                     8
All Customers       PENDING                       6
All Customers       All Status                    6
All Customers       All Status                   20

18 rows selected.

This output doesn't make any sense. You stand a good chance of losing your job if you send this output to your VP. The problem is that any time the STATUS column legitimately contains a NULL value, the NVL function returns the string "All Status". Obviously, NVL isn't useful in this situation. However, don't worry—Oracle8i provides a solution to this problem through the GROUPING function.

The GROUPING function is used only in conjunction with either a ROLLUP or a CUBE operation. The GROUPING function takes a grouping column name as input and returns either 1 or 0. A 1 is returned if the value is NULL as the result of aggregation (ROLLUP or CUBE); otherwise, 0 is returned. The general syntax of the GROUPING function is:

SELECT ... [GROUPING(grouping_column_name)] ...
FROM ...
GROUP BY ... {ROLLUP | CUBE} (grouping_column_name)

The following example illustrates the use of GROUPING function in a simple way by returning the GROUPING function results for the three columns passed to ROLLUP:

SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES),
GROUPING(O.YEAR) Y, GROUPING(O.MONTH) M, GROUPING(R.NAME) R
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY ROLLUP (O.YEAR, O.MONTH, R.NAME);

      YEAR MONTH     REGION               SUM(O.TOT_SALES)    Y    M    R
---------- --------- -------------------- ---------------- ---- ---- ----
      2000 January   Mid-Atlantic                  1221394    0    0    0
      2000 January   New England                   1018430    0    0    0
      2000 January   SouthEast US                   758042    0    0    0
      2000 January                                 2997866    0    0    1
      2000 February  Mid-Atlantic                   857352    0    0    0
      2000 February  New England                   1231492    0    0    0
      2000 February  SouthEast US                  1236846    0    0    0
      2000 February                                3325690    0    0    1
      2000 March     Mid-Atlantic                  1274062    0    0    0
      2000 March     New England                   1132966    0    0    0
      2000 March     SouthEast US                  1311986    0    0    0
      2000 March                                   3719014    0    0    1
      2000                                        10042570    0    1    1
      2001 January   Mid-Atlantic                   610697    0    0    0
      2001 January   New England                    509215    0    0    0
      2001 January   SouthEast US                   379021    0    0    0
      2001 January                                 1498933    0    0    1
      2001 February  Mid-Atlantic                   428676    0    0    0
      2001 February  New England                    615746    0    0    0
      2001 February  SouthEast US                   618423    0    0    0
      2001 February                                1662845    0    0    1
      2001 March     Mid-Atlantic                   637031    0    0    0
      2001 March     New England                    566483    0    0    0
      2001 March     SouthEast US                   655993    0    0    0
      2001 March                                   1859507    0    0    1
      2001                                         5021285    0    1    1
                                                  15063855    1    1    1

27 rows selected.

Look at the Y, M, and R columns in this output. Row 4 is a region-level subtotal for a particular month and year, and therefore, the GROUPING function results in a value of 1 for the region and a value 0 for the month and year. Row 26 (the second to last) is a subtotal for all regions and months for a particular year, and therefore, the GROUPING function prints 1 for the month and the region and 0 for the year. Row 27 (the grand total) contains 1 for all the GROUPING columns.

With a combination of GROUPING and DECODE, you can produce more readable query output when using CUBE and ROLLUP, as in the following example:

SELECT DECODE(GROUPING(O.YEAR), 1, 'All Years', O.YEAR) Year,
DECODE(GROUPING(O.MONTH), 1, 'All Months', 
TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month')) Month,
DECODE(GROUPING(R.NAME), 1, 'All Regions', R.NAME) Region, SUM(O.TOT_SALES)
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY ROLLUP (O.YEAR, O.MONTH, R.NAME);

YEAR         MONTH      REGION               SUM(O.TOT_SALES)
------------ ---------- -------------------- ----------------
2000         January    Mid-Atlantic                  1221394
2000         January    New England                   1018430
2000         January    SouthEast US                   758042
2000         January    All Regions                   2997866
2000         February   Mid-Atlantic                   857352
2000         February   New England                   1231492
2000         February   SouthEast US                  1236846
2000         February   All Regions                   3325690
2000         March      Mid-Atlantic                  1274062
2000         March      New England                   1132966
2000         March      SouthEast US                  1311986
2000         March      All Regions                   3719014
2000         All Months All Regions                  10042570
2001         January    Mid-Atlantic                   610697
2001         January    New England                    509215
2001         January    SouthEast US                   379021
2001         January    All Regions                   1498933
2001         February   Mid-Atlantic                   428676
2001         February   New England                    615746
2001         February   SouthEast US                   618423
2001         February   All Regions                   1662845
2001         March      Mid-Atlantic                   637031
2001         March      New England                    566483
2001         March      SouthEast US                   655993
2001         March      All Regions                   1859507
2001         All Months All Regions                   5021285
All Years    All Months All Regions                  15063855

27 rows selected.

By using DECODE with GROUPING, you produced the same result that was produced by using NVL at the beginning of the section. However, the risk of mistreating a NULL data value as a summary row is eliminated by using GROUPING and DECODE. You will notice this in the following example, in which NULL data values in subtotal and total rows are treated differently by the GROUPING function than the NULL values in the summary rows.

SELECT DECODE(GROUPING(CUST_NBR), 1, 'All Customers', CUST_NBR) CUSTOMER,
DECODE(GROUPING(STATUS), 1, 'All Status', STATUS) STATUS, COUNT(*)
FROM CUST_ORDER
GROUP BY CUBE(CUST_NBR, STATUS);

CUSTOMER                                 STATUS                 COUNT(*)
---------------------------------------- -------------------- ----------
201                                                                    2
201                                      All Status                    2
231                                      DELIVERED                     4
231                                                                    2
231                                      All Status                    6
244                                      PENDING                       2
244                                      All Status                    2
255                                      PENDING                       4
255                                                                    2
255                                      All Status                    6
264                                      DELIVERED                     2
264                                      All Status                    2
288                                      DELIVERED                     2
288                                      All Status                    2
All Customers                            DELIVERED                     8
All Customers                            PENDING                       6
All Customers                                                          6
All Customers                            All Status                   20

18 rows selected. 

Oracle9i introduced two new functions that are related to GROUPING: GROUPING_ID and GROUP_ID, discussed later in Section 12.5. They are worth knowing about if you are using Oracle9i.

    Team LiB   Previous Section   Next Section