13.3 Windowing FunctionsThe ranking functions described thus far are quite useful when comparing items within a fixed window of time, such as "last year" or "second quarter." But what if we want to perform computations using a window that slides as we progress through the data set? Oracle's windowing functions allow aggregates to be calculated for each row in a result set based on a specified window. The aggregation window can be defined in one of three ways:
To get started, we generate a window that fills the entire partition, and then we see how the window can be detached from one or both ends of the partition so that it floats with the current row. All of the examples will be based on the following query, which calculates total monthly sales for the Mid-Atlantic region: SELECT month, SUM(tot_sales) monthly_sales FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY 1month; MONTH MONTHLY_SALES ---------- ------------- 1 610697 2 428676 3 637031 4 541146 5 592935 6 501485 7 606914 8 460520 9 392898 10 510117 11 532889 12 492458 First, we will sum the monthly sales for the entire result set by specifying an "unbounded" window. Note the ROWS BETWEEN clause in the following example: SELECT month, SUM(tot_sales) monthly_sales, SUM(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) total_sales FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES TOTAL_SALES ---------- ------------- ----------- 1 610697 6307766 2 428676 6307766 3 637031 6307766 4 541146 6307766 5 592935 6307766 6 501485 6307766 7 606914 6307766 8 460520 6307766 9 392898 6307766 10 510117 6307766 11 532889 6307766 12 492458 6307766 Each time the function executes, it sums the monthly sales from months 1 through 12; thus, the same calculation is being performed 12 times. This is a rather inefficient way to generate the yearly sales total (see Section 13.4 later in this chapter for a better method), but it should give you an idea of the syntax for building an aggregation window. In the next query, we will create a window that spans from the top of the partition to the current row. The function identifies the month that has the maximum sales, up to and including the current month: SELECT month, SUM(tot_sales) monthly_sales, MAX(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) max_preceeding FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES MAX_PRECEEDING ---------- ------------- -------------- 1 610697 610697 2 428676 610697 3 637031 637031 4 541146 637031 5 592935 637031 6 501485 637031 7 606914 637031 8 460520 637031 9 392898 637031 10 510117 637031 11 532889 637031 12 492458 637031 Unlike the first query, which has a window size fixed at 12 rows, this query's aggregation window grows from a single row for month 1 to 12 rows for month 12. The keywords CURRENT ROW are used to indicate that the window should end at the current row being inspected by the function. If we replace MAX in the previous query with SUM, we can calculate a running total: SELECT month, SUM(tot_sales) monthly_sales, SUM(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) max_preceeding FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES MAX_PRECEEDING ---------- ------------- -------------- 1 610697 610697 2 428676 1039373 3 637031 1676404 4 541146 2217550 5 592935 2810485 6 501485 3311970 7 606914 3918884 8 460520 4379404 9 392898 4772302 10 510117 5282419 11 532889 5815308 12 492458 6307766 We have now seen examples using windows that are fixed at one or both ends. In the next query, we will define a window that floats freely with each row: SELECT month, SUM(tot_sales) monthly_sales, AVG(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES ROLLING_AVG ---------- ------------- ----------- 1 610697 519686.5 2 428676 558801.333 3 637031 535617.667 4 541146 590370.667 5 592935 545188.667 6 501485 567111.333 7 606914 522973 8 460520 486777.333 9 392898 454511.667 10 510117 478634.667 11 532889 511821.333 12 492458 512673.5 For each of the 12 rows, the function calculates the average sales of the current month, the previous month, and the following month. The value of the ROLLING_AVG column is therefore the average sales within a three month floating window centered on the current month.[2]
13.3.1 FIRST_VALUE and LAST_VALUEOracle provides two additional aggregate functions, called FIRST_VALUE and LAST_VALUE, that can be used with windowing functions to identify the values of the first and last values in the window. In the case of the 3-month rolling average query shown previously, we could display the values of all three months along with the average of the three, as in: SELECT month, FIRST_VALUE(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) prev_month, SUM(tot_sales) monthly_sales, LAST_VALUE(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) next_month, AVG(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH PREV_MONTH MONTHLY_SALES NEXT_MONTH ROLLING_AVG ---------- ---------- ------------- ---------- ----------- 1 610697 610697 428676 519686.5 2 610697 428676 637031 558801.333 3 428676 637031 541146 535617.667 4 637031 541146 592935 590370.667 5 541146 592935 501485 545188.667 6 592935 501485 606914 567111.333 7 501485 606914 460520 522973 8 606914 460520 392898 486777.333 9 460520 392898 510117 454511.667 10 392898 510117 532889 478634.667 11 510117 532889 492458 511821.333 12 532889 492458 492458 512673.5 These functions are useful for queries that compare each value to the first or last value in the period, such as: "How did each month's sales compare to the first month?" 13.3.2 LAG/LEAD FunctionsWhile not technically windowing functions, the LAG and LEAD functions are included here because they allow rows to be referenced by their position relative to the current row, much like the PRECEDING and FOLLOWING clauses within windowing functions. LAG and LEAD are useful for comparing one row of a result set with another row of the same result set. For example, the query "Compute the total sales per month for the Mid-Atlantic region, including the percent change from the previous month" requires data from both the current and preceding rows in order to calculate the answer. This is, in effect, a two row window, but the offset from the current row can be specified as one or more rows, making LAG and LEAD act like specialized windowing functions where only the outer edges of the window are utilized. Here is the SQL that uses the LAG function to generate the data needed to answer the question posed in the previous paragraph: SELECT month, SUM(tot_sales) monthly_sales, LAG(SUM(tot_sales), 1) OVER (ORDER BY month) prev_month_sales FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES PREV_MONTH_SALES ---------- ------------- ---------------- 1 610697 2 428676 610697 3 637031 428676 4 541146 637031 5 592935 541146 6 501485 592935 7 606914 501485 8 460520 606914 9 392898 460520 10 510117 392898 11 532889 510117 12 492458 532889 As we might expect, the LAG value for month 1 is NULL, since there is no preceding month. This would also be the case for the LEAD value for month 12. Take this into account when performing calculations that utilize the results of the LAG or LEAD functions. The next query utilizes the output from the previous query to generate the percentage difference from month to month. Note how the prev_month_sales column is wrapped in the NVL function so that month 1 won't generate a NULL value for the percentage change: SELECT months.month month, months.monthly_sales monthly_sales, ROUND((months.monthly_sales -- NVL(months.prev_month_sales, months.monthly_sales)) / NVL(months.prev_month_sales, months.monthly_sales), 3) * 100 percent_change FROM (SELECT month, SUM(tot_sales) monthly_sales, LAG(SUM(tot_sales), 1) OVER (ORDER BY month) prev_month_sales FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month) months ORDER BY month; MONTH MONTHLY_SALES PERCENT_CHANGE ---------- ------------- -------------- 1 610697 0 2 428676 -29.8 3 637031 48.6 4 541146 -15.1 5 592935 9.6 6 501485 -15.4 7 606914 21 8 460520 -24.1 9 392898 -14.7 10 510117 29.8 11 532889 4.5 12 492458 -7.6 |