Team LiB   Previous Section   Next Section

8.4 Calculating a Mode

8.4.1 Problem

You want to calculate a modal average of the bulb-life results in your database. Recall from the discussion earlier in this chapter that the mode represents the most frequently occurring value in a sample.

8.4.2 Solution

SQL Server is not equipped with a mode function, so calculating the mode is a bit more difficult than calculating the mean. As the following solution shows, you can calculate the mode using a creative combination of COUNT and TOP:

SELECT TOP 1 COUNT(*) frequency, Hours mode 
FROM BulbLife
WHERE TestPit=1
GROUP BY hours
ORDER BY COUNT(*) DESC

frequency   mode        
----------- ----------- 
2           1085

8.4.3 Discussion

Although it appears strange at first, how the query works becomes clear once you think about the basic definition for mode. A mode is the value that occurs most frequently in an observed sample. You can begin by writing a query to group values together:

SELECT Hours
FROM BulbLife
WHERE TestPit=1
GROUP BY hours

hours       
----------- 
1043
1057
1077
1085
1093
1099
1109
1114
1129

Next, add the COUNT function to the query to include a count of each distinct value with the query's results:

SELECT COUNT(*) frequency, Hours
FROM BulbLife
WHERE TestPit=1
GROUP BY Hours

frequency   Hours       
----------- ----------- 
1           1043
1           1057
1           1077
2           1085
1           1093
1           1099
1           1109
1           1114
1           1129

Finally, use an ORDER BY clause to put the results in descending order by frequency, so that the most frequently occurring value is listed first. Then, use the TOP 1 syntax in your SELECT clause to limit the results to the first row. The hours value in that first row will be the mode.

What happens when you have more than one mode in the observed sample and you need to report all such values? In our hypothetical bulb-life data, the mode for Test Pit #1 is 1085, while the mode for Test Pit #2 is 1043. For both modes, the occurrence count is 2. If you want the mode for all light bulbs, regardless of test pit, then both values should be returned. The following query shows one way to deal with this:

SELECT COUNT(*) frequency, Hours mode FROM BulbLife
GROUP BY Hours
HAVING COUNT(*)>= ALL(
   SELECT COUNT(*)
   FROM BulbLife
   GROUP BY Hours)

frequency   mode       
----------- ----------- 
2           1043
2           1085

The subquery that you see in this example returns a list of all occurrence counts for all distinct values in the BulbLife table. It follows, logically, that one of those counts will represent a maximum. The HAVING clause in the outer query specifies that the occurrence count be greater than or equal to all values returned by the subquery, which, in effect, restricts the results to only those rows with an occurrence count that equals the maximum occurrence count.

Be aware that mode is a weak statistic. The mode can be useful if you know the distribution and the nature of the sample, but it can easily be altered by adding a few cases with extreme values. For example, let's say that we get 2 additional bulbs with a duration of 1129 hours. The mode is then 1129, which is misleading information, since all other bulbs lasted for much shorter periods.

    Team LiB   Previous Section   Next Section