Team LiB   Previous Section   Next Section

13.2 Ranking Functions

Determining the performance of a particular business entity compared to its peers is central to a wide variety of business decisions. Examples include:

  • Identifying assets with the highest utilization.

  • Determining the worst-selling products by region.

  • Finding the best-performing salespeople.

Prior to the release of Oracle8i, programmers could use the ORDER BY clause to sort a result set on one or more columns, but any further processing to calculate rankings or percentiles had to be performed using a procedural language. Beginning with Oracle8i, however, developers can take advantage of several new functions to either generate rankings for each row in a result set or to group rows into buckets for percentile calculations.

13.2.1 RANK, DENSE_RANK, and ROW_NUMBER

The RANK, DENSE_RANK, and ROW_NUMBER functions generate an integer value from 1 to N for each row, where N is less than or equal to the number of rows in the result set. The differences in the values returned by these functions revolves around how each one handles ties:

  • ROW_NUMBER returns a unique number for each row starting with 1. For rows that have duplicate values, numbers are arbitrarily assigned.

  • DENSE_RANK assigns a unique number for each row starting with 1, except for rows that have duplicate values, in which case the same ranking is assigned.

  • RANK assigns a unique number for each row starting with 1, except for rows that have duplicate values, in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.

To illustrate the differences, we generate rankings for each customer according to their total yearly sales. Here is the query to generate the sales data for the year 2001:

SELECT region_id, cust_nbr, SUM(tot_sales) cust_sales
FROM orders
WHERE year = 2001
GROUP BY region_id, cust_nbr
ORDER BY region_id, cust_nbr;

REGION_ID    CUST_NBR CUST_SALES
---------- ---------- ----------
         5          1    1151162
         5          2    1224992
         5          3    1161286
         5          4    1878275
         5          5    1169926
         6          6    1788836
         6          7     971585
         6          8    1141638
         6          9    1208959
         6         10    1196748
         7         11    1190421
         7         12    1182275
         7         13    1310434
         7         14    1929774
         7         15    1255591
         8         16    1068467
         8         17    1944281
         8         18    1253840
         8         19    1174421
         8         20    1412006
         9         21    1020541
         9         22    1036146
         9         23    1224992
         9         24    1224992
         9         25    2232703
        10         26    1808949
        10         27    1322747
        10         28     986964
        10         29     903383
        10         30    1216858

Notice that three of the customers (2, 23, and 24) have the same value for total sales ($1,224,992). In the next query, we will add three function calls to generate rankings for each customer across all regions, and we will order the result set by the ROW_NUMBER function to make the difference in rankings easier to observe:

SELECT region_id, cust_nbr, 
  SUM(tot_sales) cust_sales,
  RANK(  ) OVER (ORDER BY SUM(tot_sales) DESC) sales_rank,
  DENSE_RANK(  ) OVER (ORDER BY SUM(tot_sales) DESC) sales_dense_rank,
  ROW_NUMBER(  ) OVER (ORDER BY SUM(tot_sales) DESC) sales_number
FROM orders
WHERE year = 2001
GROUP BY region_id, cust_nbr
ORDER BY 6;

REGION_ID   CUST_NBR  CUST_SALES SALES_RANK SALES_DENSE_RANK SALES_NUMBER
---------- ---------- ---------- ---------- ---------------- ------------
         9         25    2232703          1                1            1
         8         17    1944281          2                2            2
         7         14    1929774          3                3            3
         5          4    1878275          4                4            4
        10         26    1808949          5                5            5
         6          6    1788836          6                6            6
         8         20    1412006          7                7            7
        10         27    1322747          8                8            8
         7         13    1310434          9                9            9
         7         15    1255591         10               10           10
         8         18    1253840         11               11           11
         5          2    1224992         12               12           12
         9         23    1224992         12               12           13
         9         24    1224992         12               12           14
        10         30    1216858         15               13           15
         6          9    1208959         16               14           16
         6         10    1196748         17               15           17
         7         11    1190421         18               16           18
         7         12    1182275         19               17           19
         8         19    1174421         20               18           20
         5          5    1169926         21               19           21
         5          3    1161286         22               20           22
         5          1    1151162         23               21           23
         6          8    1141638         24               22           24
         8         16    1068467         25               23           25
         9         22    1036146         26               24           26
         9         21    1020541         27               25           27
        10         28     986964         28               26           28
         6          7     971585         29               27           29
        10         29     903383         30               28           30

Don't be confused by the ORDER BY clause at the end of the query and the ORDER BY clauses within each function call; the functions use their ORDER BY clause internally to sort the results for the purpose of applying a ranking. Thus, each of the three functions applies its ranking algorithm to the sum of each customer's sales in descending order. The final ORDER BY clause specifies the results of the ROW_NUMBER function as the sort key for the final result set, but we could have picked any of the six columns as our sort key.

Both the RANK and DENSE_RANK functions assign the rank of 12 to the 3 rows with total sales of $1,224,992, while the ROW_NUMBER function assigns the ranks 12, 13, and 14 to the same rows. The difference between the RANK and DENSE_RANK functions manifests itself in the ranking assigned to the next-lowest sales total; the RANK function leaves a gap in the ranking sequence and assigns a rank of 15 to customer number 30, while the DENSE_RANK function continues the sequence with a ranking of 13.

Deciding which of the three functions to use depends on the desired outcome. If we want to identify the top 13 customers from this result set, we would use:

  • ROW_NUMBER if we want exactly 13 rows without regard to ties. In this case, one of the customers who might otherwise be included in the list will be excluded from the final set.

  • RANK if we want at least 13 rows but don't want to include rows that would have been excluded had there been no ties. In this case, we would retrieve 14 rows.

  • DENSE_RANK if we want all customers with a ranking of 13 or less, including all duplicates. In this case, we would retrieve 15 rows.

While the previous query generates rankings across the entire result set, it is also possible to generate independent sets of rankings across multiple partitions of the result set. The following query generates rankings for customer sales within each region rather than across all regions. Note the addition of the PARTITION BY clause:

SELECT region_id, cust_nbr, SUM(tot_sales) cust_sales,
  RANK(  ) OVER (PARTITION BY region_id
    ORDER BY SUM(tot_sales) DESC) sales_rank,
  DENSE_RANK(  ) OVER (PARTITION BY region_id
    ORDER BY SUM(tot_sales) DESC) sales_dense_rank,
  ROW_NUMBER(  ) OVER (PARTITION BY region_id
    ORDER BY SUM(tot_sales) DESC) sales_number
FROM orders
WHERE year = 2001
GROUP BY region_id, cust_nbr
ORDER BY 1,6;

REGION_ID    CUST_NBR CUST_SALES SALES_RANK SALES_DENSE_RANK SALES_NUMBER
---------- ---------- ---------- ---------- ---------------- ------------
         5          4    1878275          1                1            1
         5          2    1224992          2                2            2
         5          5    1169926          3                3            3
         5          3    1161286          4                4            4
         5          1    1151162          5                5            5
         6          6    1788836          1                1            1
         6          9    1208959          2                2            2
         6         10    1196748          3                3            3
         6          8    1141638          4                4            4
         6          7     971585          5                5            5
         7         14    1929774          1                1            1
         7         13    1310434          2                2            2
         7         15    1255591          3                3            3
         7         11    1190421          4                4            4
         7         12    1182275          5                5            5
         8         17    1944281          1                1            1
         8         20    1412006          2                2            2
         8         18    1253840          3                3            3
         8         19    1174421          4                4            4
         8         16    1068467          5                5            5
         9         25    2232703          1                1            1
         9         23    1224992          2                2            2
         9         24    1224992          2                2            3
         9         22    1036146          4                3            4
         9         21    1020541          5                4            5
        10         26    1808949          1                1            1
        10         27    1322747          2                2            2
        10         30    1216858          3                3            3
        10         28     986964          4                4            4
        10         29     903383          5                5            5

Each customer receives a ranking between one and five depending on their relation to other customers in the same region. Of the three customers with duplicate total sales, two of them are in region 9; as before, the RANK and DENSE_RANK functions generate identical rankings for both customers.

The PARTITION BY clause used in ranking functions is used to divide a result set into pieces so that rankings can be applied within each subset. This is completely different from the PARTITION BY RANGE/HASH/LIST clauses introduced in Chapter 10 for breaking a table or index into multiple pieces.

13.2.1.1 Handling NULLs

All ranking functions allow the caller to specify where in the ranking order NULL values should appear. This is accomplished by appending either NULLS FIRST or NULLS LAST after the ORDER BY clause of the function, as in:

SELECT region_id, cust_nbr, SUM(tot_sales) cust_sales,
  RANK(  ) OVER (ORDER BY SUM(tot_sales) DESC NULLS LAST) sales_rank
FROM orders
WHERE year = 2001
GROUP BY region_id, cust_nbr;

If omitted, NULL values will either appear last in ascending rankings or first in descending rankings.

13.2.1.2 Top/Bottom-N queries

One of the most common uses of a ranked data set is to identify the top-N or bottom-N performers. Since we can't call analytic functions from the WHERE or HAVING clauses, we are forced to generate the rankings for all the rows and then use an outer query to filter out the unwanted rankings. For example, the following query uses an inline view to identify the top-5 salespersons for 2001:

SELECT s.name, sp.sp_sales total_sales
FROM salesperson s,
 (SELECT salesperson_id, SUM(tot_sales) sp_sales,
    RANK(  ) OVER (ORDER BY SUM(tot_sales) DESC) sales_rank
  FROM orders
  WHERE year = 2001
  GROUP BY salesperson_id) sp
WHERE sp.sales_rank <= 5
  AND sp.salesperson_id = s.salesperson_id
ORDER BY sp.sales_rank;

NAME                                              TOTAL_SALES
------------------------------------------------- -----------
Jeff Blake                                            1927580
Sam Houseman                                          1814327
Mark Russell                                          1784596
John Boorman                                          1768813
Carl Isaacs                                           1761814
13.2.1.3 FIRST/LAST

While there is no function for returning only the top or bottom-N from a ranked result set, Oracle provides functionality for identifying the first (top 1) or last (bottom 1) records in a ranked set. This is useful for queries such as the following: "Find the regions with the best and worst total sales last year." Unlike the top-5 salespeople example from the previous section, this query needs an additional piece of information—the size of the result set—in order to answer the question.

Oracle 9i provides the ability to answer such queries efficiently using functions that rank the result set based on a specified ordering, identify the row with the top or bottom ranking, and report on any column available in the result set. These functions are composed of three parts:

  1. An ORDER BY clause that specifies how to rank the result set.

  2. The keywords FIRST and LAST to specify whether to use the top or bottom-ranked row.

  3. An aggregate function (i.e., MIN, MAX, AVG, COUNT) used as a tiebreaker in case more than one row of the result set tie for the FIRST or LAST spot in the ranking.

The following query uses the MIN aggregate function to find the regions that rank FIRST and LAST by total sales:

SELECT
  MIN(region_id)
    KEEP (DENSE_RANK FIRST ORDER BY SUM(tot_sales) DESC) best_region,
  MIN(region_id)
    KEEP (DENSE_RANK LAST ORDER BY SUM(tot_sales) DESC) worst_region
FROM orders
WHERE year = 2001
GROUP BY region_id;

BEST_REGION WORST_REGION
----------- ------------
          7           10

The use of the MIN function in the previous query is a bit confusing: it is only used if more than one region ties for either first or last place in the ranking. If there were a tie, the row with the minimum value for region_id would be chosen. To find out if a tie actually exists, we could call each function twice using MIN for the first and MAX for the second, and see if they return the same results:

SELECT
  MIN(region_id)
    KEEP (DENSE_RANK FIRST ORDER BY SUM(tot_sales) DESC) min_best_region,
  MAX(region_id)
    KEEP (DENSE_RANK FIRST ORDER BY SUM(tot_sales) DESC) max_best_region,
  MIN(region_id)
    KEEP (DENSE_RANK LAST ORDER BY SUM(tot_sales) DESC) min_worst_region,
  MAX(region_id)
    KEEP (DENSE_RANK LAST ORDER BY SUM(tot_sales) DESC) max_worst_region
FROM orders
WHERE year = 2001
GROUP BY region_id;

MIN_BEST_REGION MAX_BEST_REGION MIN_WORST_REGION MAX_WORST_REGION
--------------- --------------- ---------------- ----------------
              7               7               10               10

In this case, there are no ties for either first or last place. Depending on the type of data you are working with, using an aggregate function as a tiebreaker can be somewhat arbitrary.

13.2.2 NTILE

Another way rankings are commonly used is to generate buckets into which sets of rankings are grouped. For example, we may want to find those customers whose total sales ranked in the top 25%. The following query uses the NTILE function to group the customers into four buckets (or quartiles):

SELECT region_id, cust_nbr, SUM(tot_sales) cust_sales,
  NTILE(4) OVER (ORDER BY SUM(tot_sales) DESC) sales_quartile
FROM orders
WHERE year = 2001
GROUP BY region_id, cust_nbr
ORDER BY 4,3 DESC;

REGION_ID    CUST_NBR CUST_SALES SALES_QUARTILE
---------- ---------- ---------- --------------
         9         25    2232703              1
         8         17    1944281              1
         7         14    1929774              1
         5          4    1878275              1
        10         26    1808949              1
         6          6    1788836              1
         8         20    1412006              1
        10         27    1322747              1
         7         13    1310434              2
         7         15    1255591              2
         8         18    1253840              2
         5          2    1224992              2
         9         23    1224992              2
         9         24    1224992              2
        10         30    1216858              2
         6          9    1208959              2
         6         10    1196748              3
         7         11    1190421              3
         7         12    1182275              3
         8         19    1174421              3
         5          5    1169926              3
         5          3    1161286              3
         5          1    1151162              3
         6          8    1141638              4
         8         16    1068467              4
         9         22    1036146              4
         9         21    1020541              4
        10         28     986964              4
         6          7     971585              4
        10         29     903383              4

The sales_quartile column in this query specifies NTILE(4) in order to create four buckets. The NTILE function finds each row's place in the ranking, and then assigns each row to a bucket such that every bucket contains the same number of rows. If the number of rows is not evenly divisible by the number of buckets, then the extra rows are distributed so that the number of rows per bucket differs by one at most. In the previous example, there are four buckets allocated for 30 rows, with buckets one and two containg eight rows each, and buckets three and four containing seven rows each. This approach is referred to as equiheight buckets because each bucket contains (optimally) the same number of rows.

Just like in the top-N query discussed earlier, we will need to wrap the query in an inline view if we want to filter on the NTILE result:

SELECT r.name region, c.name customer, cs.cust_sales
FROM customer c, region r,
 (SELECT region_id, cust_nbr, SUM(tot_sales) cust_sales,
    NTILE(4) OVER (ORDER BY SUM(tot_sales) DESC) sales_quartile
  FROM orders
  WHERE year = 2001
  GROUP BY region_id, cust_nbr) cs
WHERE cs.sales_quartile = 1
  AND cs.cust_nbr = c.cust_nbr
  AND cs.region_id = r.region_id
ORDER BY cs.cust_sales DESC;

REGION               CUSTOMER                       CUST_SALES
-------------------- ------------------------------ ----------
NorthWest US         Worcester Technologies            2232703
SouthWest US         Evans Supply Corp.                1944281
SouthEast US         Madden Industries                 1929774
New England          Flowtech Inc.                     1878275
Central US           Alpha Technologies                1808949
Mid-Atlantic         Spartan Industries                1788836
SouthWest US         Malden Labs                       1412006
Central US           Phillips Labs                     1322747

The outer query filters on sales_quartile = 1, which removes all rows not in the top 25% of sales, and then joins the region and customer dimensions to generate the final results.

13.2.3 WIDTH_BUCKET

Similar to the NTILE function, the WIDTH_BUCKET function groups rows of the result set into buckets. Unlike NTILE, however, the WIDTH_BUCKET function attempts to create equiwidth buckets, meaning that the range of values is evenly distributed across the buckets. If your data were distributed across a bell curve, therefore, you could expect the buckets representing the low and high ranges of the bell curve to contain few records, whereas the buckets representing the middle ranges would contain many records.

New in Oracle9i, WIDTH_BUCKET can operate on numeric or date types, and takes the following four parameters:

  1. The expression that generates the buckets.

  2. The value used as the start of the range for bucket #1.

  3. The value used as the end of the range for bucket #N.

  4. The number of buckets to create (N).

WIDTH_BUCKET uses the values of the second, third, and fourth parameters to generate N buckets containing comparable ranges. If the expression yields values that fall outside the range specified by the second and third parameters, the WIDTH_BUCKET function will generate two additional buckets, numbered 0 and N+1, into which the outliers are placed. If we want to work with the entire result set, we need to make sure our values for the second and third parameters completely enclose the range of values in the result set. However, if we only wish to work with a subset of the data, we can specify values for the second and third parameters that enclose the desired range, and any rows falling outside the range will be placeds into buckets 0 and N+1.

To illustrate, we will use the NTILE example from earlier to generate three buckets for the total sales per customer:

SELECT region_id, cust_nbr,
  SUM(tot_sales) cust_sales,
  WIDTH_BUCKET(SUM(tot_sales), 1, 3000000, 3) sales_buckets
FROM orders
WHERE year = 2001
GROUP BY region_id, cust_nbr
ORDER BY 3;

REGION_ID   CUST_NBR CUST_SALES SALES_BUCKETS
---------- ---------- ---------- -------------
        10         29     903383             1
         6          7     971585             1
        10         28     986964             1
         9         21    1020541             2
         9         22    1036146             2
         8         16    1068467             2
         6          8    1141638             2
         5          1    1151162             2
         5          3    1161286             2
         5          5    1169926             2
         8         19    1174421             2
         7         12    1182275             2
         7         11    1190421             2
         6         10    1196748             2
         6          9    1208959             2
        10         30    1216858             2
         5          2    1224992             2
         9         24    1224992             2
         9         23    1224992             2
         8         18    1253840             2
         7         15    1255591             2
         7         13    1310434             2
        10         27    1322747             2
         8         20    1412006             2
         6          6    1788836             2
        10         26    1808949             2
         5          4    1878275             2
         7         14    1929774             2
         8         17    1944281             2
         9         25    2232703             3

Based on these parameters, the WIDTH_BUCKET function generates three buckets; the first bucket starts at 1, and the third bucket has an upper range of 3,000,000. Since there are three buckets, the ranges for each bucket will be 1 to 1,000,000, 1,000,0001 to 2,000,000, and 2,000,0001 to 3,000,000. When the rows are placed in the appropriate bucket, we find that three rows fall into bucket #1, a single row falls in bucket #3, and the remaining 26 rows fall into the second bucket.

The values 1 and 3,000,000 were chosen to guarantee that all rows in the result set would be placed into one of the three buckets. If we want to generate buckets only for rows that have aggregate sales between $1,000,000 and $2,000,000, the WIDTH_BUCKET function will place the remaining rows in the 0th and 4th buckets:

SELECT region_id, cust_nbr,
  SUM(tot_sales) cust_sales,
  WIDTH_BUCKET(SUM(tot_sales), 1000000, 2000000, 3) sales_buckets
FROM orders
WHERE year = 2001
GROUP BY region_id, cust_nbr
ORDER BY 3;

REGION_ID   CUST_NBR CUST_SALES SALES_BUCKETS
---------- ---------- ---------- -------------
        10         29     903383             0
         6          7     971585             0
        10         28     986964             0
         9         21    1020541             1
         9         22    1036146             1
         8         16    1068467             1
         6          8    1141638             1
         5          1    1151162             1
         5          3    1161286             1
         5          5    1169926             1
         8         19    1174421             1
         7         12    1182275             1
         7         11    1190421             1
         6         10    1196748             1
         6          9    1208959             1
        10         30    1216858             1
         5          2    1224992             1
         9         24    1224992             1
         9         23    1224992             1
         8         18    1253840             1
         7         15    1255591             1
         7         13    1310434             1
        10         27    1322747             1
         8         20    1412006             2
         6          6    1788836             3
        10         26    1808949             3
         5          4    1878275             3
         7         14    1929774             3
         8         17    1944281             3
         9         25    2232703             4

Keep in mind that the WIDTH_BUCKET function does not remove rows from the result set that do not lie within the specified range; rather, they are placed into special buckets that your query can either utilize or ignore as needed.

13.2.4 CUME_DIST and PERCENT_RANK

The final two ranking functions, CUME_DIST and PERCENT_RANK, use the rank of a particular row to calculate additional information. The CUME_DIST function (short for Cumulative Distribution) calculates the ratio of the number of rows that have a lesser or equal ranking to the total number of rows in the partition. The PERCENT_RANK function calculates the ratio of a row's ranking to the number of rows in the partition using the formula:

(RRP -- 1) / (NRP -- 1)

where:

RRP

Stands for "rank of row in partition."

NRP

Stands for "number of rows in partition."

Both calculations utilize DENSE_RANK for their rankings and can be specified to be in ascending or descending order. The following query demonstrates the use of these two functions (both specifying descending order) with our customer yearly sales query:

SELECT region_id, cust_nbr, 
  SUM(tot_sales) cust_sales,
  CUME_DIST(  ) OVER (ORDER BY SUM(tot_sales) DESC) sales_cume_dist,
  PERCENT_RANK(  ) OVER (ORDER BY SUM(tot_sales) DESC) sales_percent_rank
FROM orders
WHERE year = 2001
GROUP BY region_id, cust_nbr
ORDER BY 3 DESC;

REGION_ID   CUST_NBR  CUST_SALES SALES_CUME_DIST SALES_PERCENT_RANK
---------- ---------- ---------- --------------- ------------------
         9         25    2232703      .033333333                  0
         8         17    1944281      .066666667         .034482759
         7         14    1929774              .1         .068965517
         5          4    1878275      .133333333         .103448276
        10         26    1808949      .166666667         .137931034
         6          6    1788836              .2         .172413793
         8         20    1412006      .233333333         .206896552
        10         27    1322747      .266666667         .24137931
         7         13    1310434              .3         .275862069
         7         15    1255591      .333333333         .310344828
         8         18    1253840      .366666667         .344827586
         5          2    1224992      .466666667         .379310345
         9         23    1224992      .466666667         .379310345
         9         24    1224992      .466666667         .379310345
        10         30    1216858              .5         .482758621
         6          9    1208959      .533333333         .517241379
         6         10    1196748      .566666667         .551724138
         7         11    1190421              .6         .586206897
         7         12    1182275      .633333333         .620689655
         8         19    1174421      .666666667         .655172414
         5          5    1169926              .7         .689655172
         5          3    1161286      .733333333         .724137931
         5          1    1151162      .766666667         .75862069
         6          8    1141638              .8         .793103448
         8         16    1068467      .833333333         .827586207
         9         22    1036146      .866666667         .862068966
         9         21    1020541              .9         .896551724
        10         28     986964      .933333333         .931034483
         6          7     971585      .966666667         .965517241
        10         29     903383               1                  1

Let's walk through a couple of calculations for customer number 1 in the previous result set. With total sales of $1,151,162, customer number 1 ranks 23rd in the set of 30 customers in descending order of sales. Since there are a total of 30 rows, the CUME_DIST is equal to 23/30, or .766666667. The PERCENT_RANK function yields (23 - 1) / (30 - 1) = .75862069. It should come as no surprise that both functions return identical values for the rows that have identical sales totals, since the calculations are based on rank, which is identical for all three rows.

13.2.5 Hypothetical Functions

For some types of analysis, determining what might have happened is more revealing than knowing what really happened. With the Oracle9i release, Oracle provides special versions of RANK, DENSE_RANK, CUME_DIST, and PERCENT_RANK that allow rankings and distributions to be calculated for hypothetical data, allowing the user to see what would have happened if a specific value (or set of values) was included in a data set.

In order to illustrate this concept, we will first rank our customers by total sales for 2001, and then we will see where a hypothetical sales figure would fall in the ranking. Here is the query that generates the rankings and distributions:

SELECT cust_nbr, SUM(tot_sales) cust_sales,
  RANK(  ) OVER (ORDER BY SUM(tot_sales) DESC) rank,
  DENSE_RANK(  ) OVER (ORDER BY SUM(tot_sales) DESC) dense_rank,
  CUME_DIST(  ) OVER (ORDER BY SUM(tot_sales) DESC) cume_dist,
  PERCENT_RANK(  ) OVER (ORDER BY SUM(tot_sales) DESC) percent_rank
FROM orders
WHERE year = 2001
GROUP BY cust_nbr
ORDER BY 3;

  CUST_NBR CUST_SALES       RANK DENSE_RANK  CUME_DIST PERCENT_RANK
---------- ---------- ---------- ---------- ---------- ------------
        25    2232703          1          1 .033333333            0
        17    1944281          2          2 .066666667   .034482759
        14    1929774          3          3         .1   .068965517
         4    1878275          4          4 .133333333   .103448276
        26    1808949          5          5 .166666667   .137931034
         6    1788836          6          6         .2   .172413793
        20    1412006          7          7 .233333333   .206896552
        27    1322747          8          8 .266666667    .24137931
        13    1310434          9          9         .3   .275862069
        15    1255591         10         10 .333333333   .310344828
        18    1253840         11         11 .366666667   .344827586
         2    1224992         12         12 .466666667   .379310345
        23    1224992         12         12 .466666667   .379310345
        24    1224992         12         12 .466666667   .379310345
        30    1216858         15         13         .5   .482758621
         9    1208959         16         14 .533333333   .517241379
        10    1196748         17         15 .566666667   .551724138
        11    1190421         18         16         .6   .586206897
        12    1182275         19         17 .633333333   .620689655
        19    1174421         20         18 .666666667   .655172414
         5    1169926         21         19         .7   .689655172
         3    1161286         22         20 .733333333   .724137931
         1    1151162         23         21 .766666667    .75862069
         8    1141638         24         22         .8   .793103448
        16    1068467         25         23 .833333333   .827586207
        22    1036146         26         24 .866666667   .862068966
        21    1020541         27         25         .9   .896551724
        28     986964         28         26 .933333333   .931034483
         7     971585         29         27 .966666667   .965517241
        29     903383         30         28          1            1

Now let's see where a customer with an even million dollars of sales would have ranked:

SELECT 
  RANK(1000000) WITHIN GROUP 
    (ORDER BY SUM(tot_sales) DESC) hyp_rank,
  DENSE_RANK(1000000) WITHIN GROUP 
    (ORDER BY SUM(tot_sales) DESC) hyp_dense_rank,
  CUME_DIST(1000000) WITHIN GROUP 
    (ORDER BY SUM(tot_sales) DESC) hyp_cume_dist,
  PERCENT_RANK(1000000) WITHIN GROUP 
    (ORDER BY SUM(tot_sales) DESC) hyp_percent_rank
FROM orders
WHERE year = 2001
GROUP BY cust_nbr;

  HYP_RANK HYP_DENSE_RANK HYP_CUME_DIST HYP_PERCENT_RANK
---------- -------------- ------------- ----------------
        28             26    .903225806               .9

The WITHIN GROUP clause has the effect of injecting a fictitious row into the result set before determining the rankings. One possible use of this functionality would be to see how actual sales compare to sales targets.

    Team LiB   Previous Section   Next Section