12.6 The GROUPING_ID and GROUP_ID FunctionsEarlier in this chapter, you saw how to use the GROUPING function to distinguish between the regular GROUP BY rows and the summary rows produced by the GROUP BY extensions. Oracle9i extends the concept of the GROUPING function and introduces two new functions that you can use with a GROUP BY clause:
These functions can only be used with a GROUP BY clause. However, unlike the GROUPING function that can only be used with a GROUP BY extension, the GROUPING_ID and GROUP_ID functions can be used in a query, even without a GROUP BY extension.
The following sections discuss these two functions in detail. 12.6.1 GROUPING_IDThe syntax of the GROUPING_ID function is as follows: SELECT ... , GROUPING_ID(ordered_list_of_grouping_columns) FROM ... GROUP BY ... The GROUPING_ID function takes an ordered list of grouping columns as input, and computes the output by working through the following steps:
The following example illustrates this process and compares the results from GROUPING_ID with those from GROUPING: SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, R.NAME REGION, SUM(O.TOT_SALES) Total, GROUPING(O.YEAR) Y, GROUPING(O.MONTH) M, GROUPING(R.NAME) R, GROUPING_ID (O.YEAR, O.MONTH, R.NAME) GID FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID AND O.MONTH BETWEEN 1 AND 3 GROUP BY CUBE (O.YEAR, O.MONTH, R.NAME); YEAR MONTH REGION TOTAL Y M R GID ---- --------- -------------- ---------- --- ---- --- ------ 2000 January Mid-Atlantic 1221394 0 0 0 0 2000 January New England 1018430 0 0 0 0 2000 January SouthEast US 758042 0 0 0 0 2000 January 2997866 0 0 1 1 2000 February Mid-Atlantic 857352 0 0 0 0 2000 February New England 1231492 0 0 0 0 2000 February SouthEast US 1236846 0 0 0 0 2000 February 3325690 0 0 1 1 2000 March Mid-Atlantic 1274062 0 0 0 0 2000 March New England 1132966 0 0 0 0 2000 March SouthEast US 1311986 0 0 0 0 2000 March 3719014 0 0 1 1 2000 Mid-Atlantic 3352808 0 1 0 2 2000 New England 3382888 0 1 0 2 2000 SouthEast US 3306874 0 1 0 2 2000 10042570 0 1 1 3 2001 January Mid-Atlantic 610697 0 0 0 0 2001 January New England 509215 0 0 0 0 2001 January SouthEast US 379021 0 0 0 0 2001 January 1498933 0 0 1 1 2001 February Mid-Atlantic 428676 0 0 0 0 2001 February New England 615746 0 0 0 0 2001 February SouthEast US 618423 0 0 0 0 2001 February 1662845 0 0 1 1 2001 March Mid-Atlantic 637031 0 0 0 0 2001 March New England 566483 0 0 0 0 2001 March SouthEast US 655993 0 0 0 0 2001 March 1859507 0 0 1 1 2001 Mid-Atlantic 1676404 0 1 0 2 2001 New England 1691444 0 1 0 2 2001 SouthEast US 1653437 0 1 0 2 2001 5021285 0 1 1 3 January Mid-Atlantic 1832091 1 0 0 4 January New England 1527645 1 0 0 4 January SouthEast US 1137063 1 0 0 4 January 4496799 1 0 1 5 February Mid-Atlantic 1286028 1 0 0 4 February New England 1847238 1 0 0 4 February SouthEast US 1855269 1 0 0 4 February 4988535 1 0 1 5 March Mid-Atlantic 1911093 1 0 0 4 March New England 1699449 1 0 0 4 March SouthEast US 1967979 1 0 0 4 March 5578521 1 0 1 5 Mid-Atlantic 5029212 1 1 0 6 New England 5074332 1 1 0 6 SouthEast US 4960311 1 1 0 6 15063855 1 1 1 7 48 rows selected. Note that the GROUPING_ID is the decimal equivalent of the bit vector generated by the individual GROUPING functions. In this output, the GROUPING_ID has values 0, 1, 2, 3, 4, 5, 6, and 7. Table 12-2 describes these aggregation levels.
The GROUPING_ID function can be used effectively in a query to filter rows according to your requirement. Let's say you want only the summary rows to be displayed in the output, and not the regular GROUP BY rows. You can use the GROUPING_ID function in the HAVING clause to do this by restricting output to only those rows that contain totals and subtotals (i.e., for which GROUPING_ID > 0): SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, R.NAME REGION, SUM(O.TOT_SALES) Total FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID AND O.MONTH BETWEEN 1 AND 3 GROUP BY CUBE (O.YEAR, O.MONTH, R.NAME) HAVING GROUPING_ID (O.YEAR, O.MONTH, R.NAME) > 0; YEAR MONTH REGION TOTAL ---------- --------- -------------------- ---------- 2000 January 2997866 2000 February 3325690 2000 March 3719014 2000 Mid-Atlantic 3352808 2000 New England 3382888 2000 SouthEast US 3306874 2000 10042570 2001 January 1498933 2001 February 1662845 2001 March 1859507 2001 Mid-Atlantic 1676404 2001 New England 1691444 2001 SouthEast US 1653437 2001 5021285 January Mid-Atlantic 1832091 January New England 1527645 January SouthEast US 1137063 January 4496799 February Mid-Atlantic 1286028 February New England 1847238 February SouthEast US 1855269 February 4988535 March Mid-Atlantic 1911093 March New England 1699449 March SouthEast US 1967979 March 5578521 Mid-Atlantic 5029212 New England 5074332 SouthEast US 4960311 15063855 30 rows selected. As you can see, GROUPING_ID makes it easier to filter the output of aggregation operations. Without the GROUPING_ID function, you have to write a more complex query using the GROUPING function to achieve the same result. For example, the following query uses GROUPING rather than GROUPING_ID to display only totals and subtotals. Note the added complexity in the HAVING clause. SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, R.NAME REGION, SUM(O.TOT_SALES) Total FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID AND O.MONTH BETWEEN 1 AND 3 GROUP BY CUBE (O.YEAR, O.MONTH, R.NAME) HAVING GROUPING(O.YEAR) > 0 OR GROUPING(O.MONTH) > 0 OR GROUPING(R.NAME) > 0; YEAR MONTH REGION TOTAL ---------- --------- -------------------- ---------- 2000 January 2997866 2000 February 3325690 2000 March 3719014 2000 Mid-Atlantic 3352808 2000 New England 3382888 2000 SouthEast US 3306874 2000 10042570 2001 January 1498933 2001 February 1662845 2001 March 1859507 2001 Mid-Atlantic 1676404 2001 New England 1691444 2001 SouthEast US 1653437 2001 5021285 January Mid-Atlantic 1832091 January New England 1527645 January SouthEast US 1137063 January 4496799 February Mid-Atlantic 1286028 February New England 1847238 February SouthEast US 1855269 February 4988535 March Mid-Atlantic 1911093 March New England 1699449 March SouthEast US 1967979 March 5578521 Mid-Atlantic 5029212 New England 5074332 SouthEast US 4960311 15063855 30 rows selected. 12.6.2 GROUP_IDAs you saw in previous sections, Oracle9i allows you to have repeating grouping columns and multiple grouping operations in a GROUP BY clause. Some combinations could result in duplicate rows in the output. The GROUP_ID distinguishes between otherwise duplicate result rows. The syntax of the GROUP_ID function is: SELECT ... , GROUP_ID( ) FROM ... GROUP BY ... The GROUP_ID function takes no argument, and returns 0 through n - 1, where n is the occurrence count for duplicates. The first occurrence of a given row in the output of a query will have a GROUP_ID of 0, the second occurrence of a given row will have a GROUP_ID of 1, and so forth. The following example illustrates the use of the GROUP_ID function: SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, R.NAME REGION, SUM(O.TOT_SALES) Total, GROUP_ID( ) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID AND O.MONTH BETWEEN 1 AND 3 GROUP BY O.YEAR, ROLLUP (O.YEAR, O.MONTH, R.NAME); YEAR MONTH REGION TOTAL GROUP_ID( ) ---------- --------- -------------------- ---------- ---------- 2000 January Mid-Atlantic 1221394 0 2000 January New England 1018430 0 2000 January SouthEast US 758042 0 2000 January 2997866 0 2000 February Mid-Atlantic 857352 0 2000 February New England 1231492 0 2000 February SouthEast US 1236846 0 2000 February 3325690 0 2000 March Mid-Atlantic 1274062 0 2000 March New England 1132966 0 2000 March SouthEast US 1311986 0 2000 March 3719014 0 2001 January Mid-Atlantic 610697 0 2001 January New England 509215 0 2001 January SouthEast US 379021 0 2001 January 1498933 0 2001 February Mid-Atlantic 428676 0 2001 February New England 615746 0 2001 February SouthEast US 618423 0 2001 February 1662845 0 2001 March Mid-Atlantic 637031 0 2001 March New England 566483 0 2001 March SouthEast US 655993 0 2001 March 1859507 0 2000 10042570 0 2001 5021285 0 2000 10042570 1 2001 5021285 1 28 rows selected. Note the value 1 returned by the GROUP_ID function for the last two rows. These rows are indeed duplicates of the previous two rows. If you don't want to see the duplicates in your result set, restrict your query's results to GROUP_ID 0: SELECT O.YEAR, TO_CHAR(TO_DATE(O.MONTH, 'MM'), 'Month') MONTH, R.NAME REGION, SUM(O.TOT_SALES) Total, GROUP_ID( ) FROM ORDERS O, REGION R WHERE R.REGION_ID = O.REGION_ID AND O.MONTH BETWEEN 1 AND 3 GROUP BY O.YEAR, ROLLUP (O.YEAR, O.MONTH, R.NAME) HAVING GROUP_ID( ) = 0; YEAR MONTH REGION TOTAL GROUP_ID( ) ---------- --------- -------------------- ---------- ---------- 2000 January Mid-Atlantic 1221394 0 2000 January New England 1018430 0 2000 January SouthEast US 758042 0 2000 January 2997866 0 2000 February Mid-Atlantic 857352 0 2000 February New England 1231492 0 2000 February SouthEast US 1236846 0 2000 February 3325690 0 2000 March Mid-Atlantic 1274062 0 2000 March New England 1132966 0 2000 March SouthEast US 1311986 0 2000 March 3719014 0 2001 January Mid-Atlantic 610697 0 2001 January New England 509215 0 2001 January SouthEast US 379021 0 2001 January 1498933 0 2001 February Mid-Atlantic 428676 0 2001 February New England 615746 0 2001 February SouthEast US 618423 0 2001 February 1662845 0 2001 March Mid-Atlantic 637031 0 2001 March New England 566483 0 2001 March SouthEast US 655993 0 2001 March 1859507 0 2000 10042570 0 2001 5021285 0 26 rows selected. This version of the query uses HAVING GROUP_ID = 0 to eliminate the two duplicate totals from the result set. GROUP_ID is only meaningful in the HAVING clause, because it applies to summarized data. You can't use GROUP_ID in a WHERE clause, and it wouldn't make sense to try. |