Team LiB   Previous Section   Next Section

13.3 Windowing Functions

The ranking functions described thus far are quite useful when comparing items within a fixed window of time, such as "last year" or "second quarter." But what if we want to perform computations using a window that slides as we progress through the data set? Oracle's windowing functions allow aggregates to be calculated for each row in a result set based on a specified window. The aggregation window can be defined in one of three ways:

  • By specifying a set of rows: "From the current row to the end of the partition."

  • By specifying a time interval: "For the 30 days preceeding the transaction date."

  • By specifying a range of values: "All rows having a transaction amount within 5% of the current row's transaction amount."

To get started, we generate a window that fills the entire partition, and then we see how the window can be detached from one or both ends of the partition so that it floats with the current row. All of the examples will be based on the following query, which calculates total monthly sales for the Mid-Atlantic region:

SELECT month, SUM(tot_sales) monthly_sales
FROM orders
WHERE year = 2001 
  AND region_id = 6
GROUP BY month
ORDER BY 1month;

     MONTH MONTHLY_SALES
---------- -------------
         1        610697
         2        428676
         3        637031
         4        541146
         5        592935
         6        501485
         7        606914
         8        460520
         9        392898
        10        510117
        11        532889
        12        492458

First, we will sum the monthly sales for the entire result set by specifying an "unbounded" window. Note the ROWS BETWEEN clause in the following example:

SELECT month, SUM(tot_sales) monthly_sales,
  SUM(SUM(tot_sales)) OVER (ORDER BY month 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) total_sales
FROM orders
WHERE year = 2001 
  AND region_id = 6
GROUP BY month
ORDER BY month;

     MONTH MONTHLY_SALES TOTAL_SALES
---------- ------------- -----------
         1        610697     6307766
         2        428676     6307766
         3        637031     6307766
         4        541146     6307766
         5        592935     6307766
         6        501485     6307766
         7        606914     6307766
         8        460520     6307766
         9        392898     6307766
        10        510117     6307766
        11        532889     6307766
        12        492458     6307766

Each time the function executes, it sums the monthly sales from months 1 through 12; thus, the same calculation is being performed 12 times. This is a rather inefficient way to generate the yearly sales total (see Section 13.4 later in this chapter for a better method), but it should give you an idea of the syntax for building an aggregation window. In the next query, we will create a window that spans from the top of the partition to the current row. The function identifies the month that has the maximum sales, up to and including the current month:

SELECT month, SUM(tot_sales) monthly_sales,
  MAX(SUM(tot_sales)) OVER (ORDER BY month
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) max_preceeding
FROM orders
WHERE year = 2001 
  AND region_id = 6
GROUP BY month
ORDER BY month;

     MONTH MONTHLY_SALES MAX_PRECEEDING
---------- ------------- --------------
         1        610697         610697
         2        428676         610697
         3        637031         637031
         4        541146         637031
         5        592935         637031
         6        501485         637031
         7        606914         637031
         8        460520         637031
         9        392898         637031
        10        510117         637031
        11        532889         637031
        12        492458         637031

Unlike the first query, which has a window size fixed at 12 rows, this query's aggregation window grows from a single row for month 1 to 12 rows for month 12. The keywords CURRENT ROW are used to indicate that the window should end at the current row being inspected by the function. If we replace MAX in the previous query with SUM, we can calculate a running total:

SELECT month, SUM(tot_sales) monthly_sales,
  SUM(SUM(tot_sales)) OVER (ORDER BY month
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) max_preceeding
FROM orders
WHERE year = 2001 
  AND region_id = 6
GROUP BY month
ORDER BY month;

     MONTH MONTHLY_SALES MAX_PRECEEDING
---------- ------------- --------------
         1        610697         610697
         2        428676        1039373
         3        637031        1676404
         4        541146        2217550
         5        592935        2810485
         6        501485        3311970
         7        606914        3918884
         8        460520        4379404
         9        392898        4772302
        10        510117        5282419
        11        532889        5815308
        12        492458        6307766

We have now seen examples using windows that are fixed at one or both ends. In the next query, we will define a window that floats freely with each row:

SELECT month, SUM(tot_sales) monthly_sales,
  AVG(SUM(tot_sales)) OVER (ORDER BY month 
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg
FROM orders
WHERE year = 2001 
  AND region_id = 6
GROUP BY month
ORDER BY month;

     MONTH MONTHLY_SALES ROLLING_AVG
---------- ------------- -----------
         1        610697    519686.5
         2        428676  558801.333
         3        637031  535617.667
         4        541146  590370.667
         5        592935  545188.667
         6        501485  567111.333
         7        606914      522973
         8        460520  486777.333
         9        392898  454511.667
        10        510117  478634.667
        11        532889  511821.333
        12        492458    512673.5

For each of the 12 rows, the function calculates the average sales of the current month, the previous month, and the following month. The value of the ROLLING_AVG column is therefore the average sales within a three month floating window centered on the current month.[2]

[2] Months 1 and 12 are calculated using a 2-month window, since there is no previous month for month 1 or following month for month 12.

13.3.1 FIRST_VALUE and LAST_VALUE

Oracle provides two additional aggregate functions, called FIRST_VALUE and LAST_VALUE, that can be used with windowing functions to identify the values of the first and last values in the window. In the case of the 3-month rolling average query shown previously, we could display the values of all three months along with the average of the three, as in:

SELECT month,
  FIRST_VALUE(SUM(tot_sales)) OVER (ORDER BY month 
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) prev_month,
  SUM(tot_sales) monthly_sales,
  LAST_VALUE(SUM(tot_sales)) OVER (ORDER BY month 
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) next_month,
  AVG(SUM(tot_sales)) OVER (ORDER BY month 
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg
FROM orders
WHERE year = 2001 
  AND region_id = 6
GROUP BY month
ORDER BY month;

     MONTH PREV_MONTH MONTHLY_SALES NEXT_MONTH ROLLING_AVG
---------- ---------- ------------- ---------- -----------
         1     610697        610697     428676    519686.5
         2     610697        428676     637031  558801.333
         3     428676        637031     541146  535617.667
         4     637031        541146     592935  590370.667
         5     541146        592935     501485  545188.667
         6     592935        501485     606914  567111.333
         7     501485        606914     460520      522973
         8     606914        460520     392898  486777.333
         9     460520        392898     510117  454511.667
        10     392898        510117     532889  478634.667
        11     510117        532889     492458  511821.333
        12     532889        492458     492458    512673.5

These functions are useful for queries that compare each value to the first or last value in the period, such as: "How did each month's sales compare to the first month?"

13.3.2 LAG/LEAD Functions

While not technically windowing functions, the LAG and LEAD functions are included here because they allow rows to be referenced by their position relative to the current row, much like the PRECEDING and FOLLOWING clauses within windowing functions. LAG and LEAD are useful for comparing one row of a result set with another row of the same result set. For example, the query "Compute the total sales per month for the Mid-Atlantic region, including the percent change from the previous month" requires data from both the current and preceding rows in order to calculate the answer. This is, in effect, a two row window, but the offset from the current row can be specified as one or more rows, making LAG and LEAD act like specialized windowing functions where only the outer edges of the window are utilized.

Here is the SQL that uses the LAG function to generate the data needed to answer the question posed in the previous paragraph:

SELECT month, SUM(tot_sales) monthly_sales,
  LAG(SUM(tot_sales), 1) OVER (ORDER BY month) prev_month_sales
FROM orders
WHERE year = 2001
  AND region_id = 6
GROUP BY month
ORDER BY month;

     MONTH MONTHLY_SALES PREV_MONTH_SALES
---------- ------------- ----------------
         1        610697
         2        428676           610697
         3        637031           428676
         4        541146           637031
         5        592935           541146
         6        501485           592935
         7        606914           501485
         8        460520           606914
         9        392898           460520
        10        510117           392898
        11        532889           510117
        12        492458           532889

As we might expect, the LAG value for month 1 is NULL, since there is no preceding month. This would also be the case for the LEAD value for month 12. Take this into account when performing calculations that utilize the results of the LAG or LEAD functions.

The next query utilizes the output from the previous query to generate the percentage difference from month to month. Note how the prev_month_sales column is wrapped in the NVL function so that month 1 won't generate a NULL value for the percentage change:

SELECT months.month month, months.monthly_sales monthly_sales,
  ROUND((months.monthly_sales -- NVL(months.prev_month_sales,
    months.monthly_sales)) /
    NVL(months.prev_month_sales, months.monthly_sales),
    3) * 100 percent_change
FROM
 (SELECT month, SUM(tot_sales) monthly_sales,
    LAG(SUM(tot_sales), 1) OVER (ORDER BY month) prev_month_sales
  FROM orders
  WHERE year = 2001
    AND region_id = 6
  GROUP BY month) months
ORDER BY month;

     MONTH MONTHLY_SALES PERCENT_CHANGE
---------- ------------- --------------
         1        610697              0
         2        428676          -29.8
         3        637031           48.6
         4        541146          -15.1
         5        592935            9.6
         6        501485          -15.4
         7        606914             21
         8        460520          -24.1
         9        392898          -14.7
        10        510117           29.8
        11        532889            4.5
        12        492458           -7.6
    Team LiB   Previous Section   Next Section