12.6 The GROUPING_ID and GROUP

12.6 The GROUPING_ID and GROUP_ID Functions

Earlier in this chapter, you saw how to use the GROUPING function to distinguish between the regular GROUP BY rows and the summary rows produced by the GROUP BY extensions. Oracle9i extends the concept of the GROUPING function and introduces two new functions that you can use with a GROUP BY clause:

GROUPING_ID
GROUP_ID

These functions can only be used with a GROUP BY clause. However, unlike the GROUPING function that can only be used with a GROUP BY extension, the GROUPING_ID and GROUP_ID functions can be used in a query, even without a GROUP BY extension.

Although it is legal to use these two functions without a GROUP BY extension, using GROUPING_ID and GROUP_ID without ROLLUP, CUBE, or GROUPING SETS doesn't produce any meaningful output, because GROUPING_ID and GROUP_ID are 0 for all regular GROUP BY rows.

The following sections discuss these two functions in detail.

12.6.1 GROUPING_ID

The syntax of the GROUPING_ID function is as follows:

SELECT ... , GROUPING_ID(ordered_list_of_grouping_columns)
FROM ...
GROUP BY ...

The GROUPING_ID function takes an ordered list of grouping columns as input, and computes the output by working through the following steps:

First, it generates the results of the GROUPING function as applied to each of the individual columns in the list. The result of this step is a set of ones and zeros.
It puts these ones and zeros in the same order as the order of the columns in its argument list to produce a bit vector.
Treating this bit vector (a series of ones and zeros) as a binary number, it converts the bit vector into a decimal (base 10) number.
The decimal number computed in Step 3 is returned as the GROUPING_ID function's output.

The following example illustrates this process and compares the results from GROUPING_ID with those from GROUPING:

SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES) Total,
GROUPING(O.YEAR) Y, GROUPING(O.MONTH) M, GROUPING(R.NAME) R,
GROUPING_ID (O.YEAR, O.MONTH, R.NAME) GID
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY CUBE (O.YEAR, O.MONTH, R.NAME);

YEAR MONTH     REGION              TOTAL   Y    M   R    GID
---- --------- -------------- ---------- --- ---- --- ------
2000 January   Mid-Atlantic      1221394   0    0   0      0
2000 January   New England       1018430   0    0   0      0
2000 January   SouthEast US       758042   0    0   0      0
2000 January                     2997866   0    0   1      1
2000 February  Mid-Atlantic       857352   0    0   0      0
2000 February  New England       1231492   0    0   0      0
2000 February  SouthEast US      1236846   0    0   0      0
2000 February                    3325690   0    0   1      1
2000 March     Mid-Atlantic      1274062   0    0   0      0
2000 March     New England       1132966   0    0   0      0
2000 March     SouthEast US      1311986   0    0   0      0
2000 March                       3719014   0    0   1      1
2000           Mid-Atlantic      3352808   0    1   0      2
2000           New England       3382888   0    1   0      2
2000           SouthEast US      3306874   0    1   0      2
2000                            10042570   0    1   1      3
2001 January   Mid-Atlantic       610697   0    0   0      0
2001 January   New England        509215   0    0   0      0
2001 January   SouthEast US       379021   0    0   0      0
2001 January                     1498933   0    0   1      1
2001 February  Mid-Atlantic       428676   0    0   0      0
2001 February  New England        615746   0    0   0      0
2001 February  SouthEast US       618423   0    0   0      0
2001 February                    1662845   0    0   1      1
2001 March     Mid-Atlantic       637031   0    0   0      0
2001 March     New England        566483   0    0   0      0
2001 March     SouthEast US       655993   0    0   0      0
2001 March                       1859507   0    0   1      1
2001           Mid-Atlantic      1676404   0    1   0      2
2001           New England       1691444   0    1   0      2
2001           SouthEast US      1653437   0    1   0      2
2001                             5021285   0    1   1      3
     January   Mid-Atlantic      1832091   1    0   0      4
     January   New England       1527645   1    0   0      4
     January   SouthEast US      1137063   1    0   0      4
     January                     4496799   1    0   1      5
     February  Mid-Atlantic      1286028   1    0   0      4
     February  New England       1847238   1    0   0      4
     February  SouthEast US      1855269   1    0   0      4
     February                    4988535   1    0   1      5
     March     Mid-Atlantic      1911093   1    0   0      4
     March     New England       1699449   1    0   0      4
     March     SouthEast US      1967979   1    0   0      4
     March                       5578521   1    0   1      5
               Mid-Atlantic      5029212   1    1   0      6
               New England       5074332   1    1   0      6
               SouthEast US      4960311   1    1   0      6
                                15063855   1    1   1      7

48 rows selected.

Note that the GROUPING_ID is the decimal equivalent of the bit vector generated by the individual GROUPING functions. In this output, the GROUPING_ID has values 0, 1, 2, 3, 4, 5, 6, and 7. Table 12-2 describes these aggregation levels.

Table 12-2. Result of GROUPING_ID(O.YEAR, O.MONTH, R.NAME)

Aggregation level

Bit vector

GROUPING_ID

Regular GROUP BY rows

0 0 0

0

Subtotal for Year-Month, aggregated at (Region)

0 0 1

1

Subtotal for Year-Region, aggregated at (Month)

0 1 0

2

Subtotal for Year, aggregated at (Month, Region)

0 1 1

3

Subtotal for Month-Region, aggregated at (Year)

1 0 0

4

Subtotal for Month, aggregated at (Year, Region)

1 0 1

5

Subtotal for Region, aggregated at (Year, Month)

1 1 0

6

Grand total for all levels, aggregated at (Year, Month, Region)

1 1 1

7

The GROUPING_ID function can be used effectively in a query to filter rows according to your requirement. Let's say you want only the summary rows to be displayed in the output, and not the regular GROUP BY rows. You can use the GROUPING_ID function in the HAVING clause to do this by restricting output to only those rows that contain totals and subtotals (i.e., for which GROUPING_ID > 0):

SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES) Total
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY CUBE (O.YEAR, O.MONTH, R.NAME)
HAVING GROUPING_ID (O.YEAR, O.MONTH, R.NAME) > 0;

      YEAR MONTH     REGION                    TOTAL
---------- --------- -------------------- ----------
      2000 January                           2997866
      2000 February                          3325690
      2000 March                             3719014
      2000           Mid-Atlantic            3352808
      2000           New England             3382888
      2000           SouthEast US            3306874
      2000                                  10042570
      2001 January                           1498933
      2001 February                          1662845
      2001 March                             1859507
      2001           Mid-Atlantic            1676404
      2001           New England             1691444
      2001           SouthEast US            1653437
      2001                                   5021285
           January   Mid-Atlantic            1832091
           January   New England             1527645
           January   SouthEast US            1137063
           January                           4496799
           February  Mid-Atlantic            1286028
           February  New England             1847238
           February  SouthEast US            1855269
           February                          4988535
           March     Mid-Atlantic            1911093
           March     New England             1699449
           March     SouthEast US            1967979
           March                             5578521
                     Mid-Atlantic            5029212
                     New England             5074332
                     SouthEast US            4960311
                                            15063855

30 rows selected.

As you can see, GROUPING_ID makes it easier to filter the output of aggregation operations. Without the GROUPING_ID function, you have to write a more complex query using the GROUPING function to achieve the same result. For example, the following query uses GROUPING rather than GROUPING_ID to display only totals and subtotals. Note the added complexity in the HAVING clause.

SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES) Total
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY CUBE (O.YEAR, O.MONTH, R.NAME)
HAVING GROUPING(O.YEAR) > 0
OR GROUPING(O.MONTH) > 0
OR GROUPING(R.NAME) > 0;

      YEAR MONTH     REGION                    TOTAL
---------- --------- -------------------- ----------
      2000 January                           2997866
      2000 February                          3325690
      2000 March                             3719014
      2000           Mid-Atlantic            3352808
      2000           New England             3382888
      2000           SouthEast US            3306874
      2000                                  10042570
      2001 January                           1498933
      2001 February                          1662845
      2001 March                             1859507
      2001           Mid-Atlantic            1676404
      2001           New England             1691444
      2001           SouthEast US            1653437
      2001                                   5021285
           January   Mid-Atlantic            1832091
           January   New England             1527645
           January   SouthEast US            1137063
           January                           4496799
           February  Mid-Atlantic            1286028
           February  New England             1847238
           February  SouthEast US            1855269
           February                          4988535
           March     Mid-Atlantic            1911093
           March     New England             1699449
           March     SouthEast US            1967979
           March                             5578521
                     Mid-Atlantic            5029212
                     New England             5074332
                     SouthEast US            4960311
                                            15063855

30 rows selected.

12.6.2 GROUP_ID

As you saw in previous sections, Oracle9i allows you to have repeating grouping columns and multiple grouping operations in a GROUP BY clause. Some combinations could result in duplicate rows in the output. The GROUP_ID distinguishes between otherwise duplicate result rows.

The syntax of the GROUP_ID function is:

SELECT ... , GROUP_ID(  )
FROM ...
GROUP BY ...

The GROUP_ID function takes no argument, and returns 0 through n - 1, where n is the occurrence count for duplicates. The first occurrence of a given row in the output of a query will have a GROUP_ID of 0, the second occurrence of a given row will have a GROUP_ID of 1, and so forth. The following example illustrates the use of the GROUP_ID function:

SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES) Total, GROUP_ID(  )
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY O.YEAR, ROLLUP (O.YEAR, O.MONTH, R.NAME);

      YEAR MONTH     REGION                    TOTAL GROUP_ID(  )
---------- --------- -------------------- ---------- ----------
      2000 January   Mid-Atlantic            1221394          0
      2000 January   New England             1018430          0
      2000 January   SouthEast US             758042          0
      2000 January                           2997866          0
      2000 February  Mid-Atlantic             857352          0
      2000 February  New England             1231492          0
      2000 February  SouthEast US            1236846          0
      2000 February                          3325690          0
      2000 March     Mid-Atlantic            1274062          0
      2000 March     New England             1132966          0
      2000 March     SouthEast US            1311986          0
      2000 March                             3719014          0
      2001 January   Mid-Atlantic             610697          0
      2001 January   New England              509215          0
      2001 January   SouthEast US             379021          0
      2001 January                           1498933          0
      2001 February  Mid-Atlantic             428676          0
      2001 February  New England              615746          0
      2001 February  SouthEast US             618423          0
      2001 February                          1662845          0
      2001 March     Mid-Atlantic             637031          0
      2001 March     New England              566483          0
      2001 March     SouthEast US             655993          0
      2001 March                             1859507          0
      2000                                  10042570          0
      2001                                   5021285          0
      2000                                  10042570          1
      2001                                   5021285          1

28 rows selected.

Note the value 1 returned by the GROUP_ID function for the last two rows. These rows are indeed duplicates of the previous two rows. If you don't want to see the duplicates in your result set, restrict your query's results to GROUP_ID 0:

SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH,
R.NAME REGION, SUM(O.TOT_SALES) Total, GROUP_ID(  )
FROM ORDERS O, REGION R
WHERE R.REGION_ID = O.REGION_ID
AND O.MONTH BETWEEN 1 AND 3
GROUP BY O.YEAR, ROLLUP (O.YEAR, O.MONTH, R.NAME)
HAVING GROUP_ID(  ) = 0;

      YEAR MONTH     REGION                    TOTAL GROUP_ID(  )
---------- --------- -------------------- ---------- ----------
      2000 January   Mid-Atlantic            1221394          0
      2000 January   New England             1018430          0
      2000 January   SouthEast US             758042          0
      2000 January                           2997866          0
      2000 February  Mid-Atlantic             857352          0
      2000 February  New England             1231492          0
      2000 February  SouthEast US            1236846          0
      2000 February                          3325690          0
      2000 March     Mid-Atlantic            1274062          0
      2000 March     New England             1132966          0
      2000 March     SouthEast US            1311986          0
      2000 March                             3719014          0
      2001 January   Mid-Atlantic             610697          0
      2001 January   New England              509215          0
      2001 January   SouthEast US             379021          0
      2001 January                           1498933          0
      2001 February  Mid-Atlantic             428676          0
      2001 February  New England              615746          0
      2001 February  SouthEast US             618423          0
      2001 February                          1662845          0
      2001 March     Mid-Atlantic             637031          0
      2001 March     New England              566483          0
      2001 March     SouthEast US             655993          0
      2001 March                             1859507          0
      2000                                  10042570          0
      2001                                   5021285          0

26 rows selected.

This version of the query uses HAVING GROUP_ID = 0 to eliminate the two duplicate totals from the result set. GROUP_ID is only meaningful in the HAVING clause, because it applies to summarized data. You can't use GROUP_ID in a WHERE clause, and it wouldn't make sense to try.