4.1 Aggregate FunctionsIn essence, an aggregate function summarizes the results of an expression over a number of rows, returning a single value. The general syntax for most of the aggregate functions is as follows: aggregate_function([DISTINCT | ALL] expression) The syntax elements are:
Let's look at a simple example. The following SQL uses the MAX function to find the maximum salary of all employees: SELECT MAX(SALARY) FROM EMPLOYEE; MAX(SALARY) ----------- 5000 In subsequent sections, we use a series of slightly more involved examples that illustrate various aspects of aggregate function behavior. For those examples, we use the following CUST_ORDER table: DESC CUST_ORDER Name Null? Type -------------------------------- -------- -------------- ORDER_NBR NOT NULL NUMBER(7) CUST_NBR NOT NULL NUMBER(5) SALES_EMP_ID NOT NULL NUMBER(5) SALE_PRICE NUMBER(9,2) ORDER_DT NOT NULL DATE EXPECTED_SHIP_DT NOT NULL DATE CANCELLED_DT DATE SHIP_DT DATE STATUS VARCHAR2(20) SELECT ORDER_NBR, CUST_NBR, SALES_EMP_ID, SALE_PRICE, ORDER_DT, EXPECTED_SHIP_DT FROM CUST_ORDER; ORDER_NBR CUST_NBR SALES_EMP_ID SALE_PRICE ORDER_DT EXPECTED_ ---------- -------- ------------ ---------- --------- --------- 1001 231 7354 99 22-JUL-01 23-JUL-01 1000 201 7354 19-JUL-01 24-JUL-01 1002 255 7368 12-JUL-01 25-JUL-01 1003 264 7368 56 16-JUL-01 26-JUL-01 1004 244 7368 34 18-JUL-01 27-JUL-01 1005 288 7368 99 22-JUL-01 24-JUL-01 1006 231 7354 22-JUL-01 28-JUL-01 1007 255 7368 25 20-JUL-01 22-JUL-01 1008 255 7368 25 21-JUL-01 23-JUL-01 1009 231 7354 56 18-JUL-01 22-JUL-01 1012 231 7354 99 22-JUL-01 23-JUL-01 1011 201 7354 19-JUL-01 24-JUL-01 1015 255 7368 12-JUL-01 25-JUL-01 1017 264 7368 56 16-JUL-01 26-JUL-01 1019 244 7368 34 18-JUL-01 27-JUL-01 1021 288 7368 99 22-JUL-01 24-JUL-01 1023 231 7354 22-JUL-01 28-JUL-01 1025 255 7368 25 20-JUL-01 22-JUL-01 1027 255 7368 25 21-JUL-01 23-JUL-01 1029 231 7354 56 18-JUL-01 22-JUL-01 20 rows selected. 4.1.1 NULLs and Aggregate FunctionsNotice that the column SALE_PRICE in the CUST_ORDER table is nullable, and that it contains NULL values for some rows. To examine the effect of NULLs in an aggregate function, we execute the following SQL: SELECT COUNT(*), COUNT(SALE_PRICE) FROM CUST_ORDER; COUNT(*) COUNT(SALE_PRICE) -------- ----------------- 20 14 Notice the difference in the output of COUNT(*) and COUNT(SALE_PRICE). This is because COUNT(SALE_PRICE) ignores NULLs, whereas COUNT(*) doesn't. The reason COUNT(*) doesn't ignore NULLs is because it counts rows, not COLUMN values. The concept of NULL doesn't apply to a row as a whole. Other than COUNT(*), there is only one other aggregate function that doesn't ignore NULLs, and that is GROUPING. All other aggregate functions ignore NULLs. We will discuss GROUPING in Chapter 12. For now, let's examine the effect of NULLs when they are ignored. SUM, MAX, MIN, AVG, etc. all ignore NULLs. Therefore, if we are trying to find a value such as the average sale price in the CUST_ORDER table, the average will be of the 14 rows that have a value for that column. The following example shows the count of all rows, the total of all sale prices, and the average of all sale prices: SELECT COUNT(*), SUM(SALE_PRICE), AVG(SALE_PRICE) FROM CUST_ORDER; COUNT(*) SUM(SALE_PRICE) AVG(SALE_PRICE) --------------- --------------- --------------- 20 788 56.2857143 Note that AVG(SALE_PRICE) is not equal to SUM(SALE_PRICE) / COUNT(*). If it were, the result of AVG(SALE_PRICE) would have been 788 / 20 = 39.4. But, since the AVG function ignores NULLS, it divides the total sale price by 14, and not by 20 (788 / 14 = 56.2857143). There may be situations where we want an average to be taken over all the rows in a table, not just the rows with non-NULL values for the column in question. In these situations we have to use the NVL function within the AVG function call to assign 0 (or some other useful value) to the column in place of any NULL values. (DECODE or the new COALESCE function can be used in place of NVL. See Chapter 9 for details.) Here's an example: SELECT AVG(NVL(SALE_PRICE,0)) FROM CUST_ORDER; AVG(NVL(SALE_PRICE,0)) ---------------------- 39.4 Notice that the use of NVL causes all 20 rows to be considered for average computation, and the rows with NULL values for SALE_PRICE are assumed to have a 0 value for that column. 4.1.2 Use of DISTINCT and ALLMost aggregate functions allow the use of DISTINCT or ALL along with the expression argument. DISTINCT allows us to disregard duplicate expression values, while ALL causes duplicate expression values to be included in the result. Notice that the column CUST_NBR has duplicate values. Observe the result of the following SQL: SELECT COUNT(CUST_NBR), COUNT(DISTINCT CUST_NBR), COUNT(ALL CUST_NBR) FROM CUST_ORDER; COUNT(CUST_NBR) COUNT(DISTINCTCUST_NBR) COUNT(ALLCUST_NBR) --------------- ----------------------- ------------------ 20 6 20 There are six distinct values in the CUST_NBR column. Therefore, COUNT(DISTINCT CUST_NBR) returns 6, whereas COUNT(CUST_NBR) and COUNT(ALL CUST_NBR) both return 20. ALL is the default, which means that if we don't specify either DISTINCT or ALL before the expression argument in an aggregate function, the function will consider all the rows that have a non-NULL value for the expression. An important thing to note here is that ALL doesn't cause an aggregate function to consider NULL values. For example, COUNT(ALL SALE_PRICE) in the following example still returns 14, and not 20. SELECT COUNT(ALL SALE_PRICE) FROM CUST_ORDER; COUNT(ALLSALE_PRICE) -------------------- 14 Since ALL is the default, we can explicitly use ALL with every aggregate function. However, the aggregate functions that take more than one argument as input don't allow the use of DISTINCT. These include CORR, COVAR_POP, COVAR_SAMP, and all the linear regression functions. In addition, some functions that take only one argument as input don't allow the use of DISTINCT. This category includes STTDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, and GROUPING. If we try to use DISTINCT with an aggregate function that doesn't allow it, we will get an error. For example: SELECT STDDEV_POP(DISTINCT SALE_PRICE) FROM CUST_ORDER; SELECT STDDEV_POP(DISTINCT SALE_PRICE) * ERROR at line 1: ORA-30482: DISTINCT option not allowed for this function However, using ALL with such a function doesn't cause any error. For example: SELECT STDDEV_POP(ALL SALE_PRICE) FROM CUST_ORDER; STDDEV_POP(ALLSALE_PRICE) ------------------------- 29.5282639 |