8.4 Calculating a Mode
8.4.1 Problem
You want to calculate a modal average of the
bulb-life
results in your database. Recall from the discussion earlier in this
chapter that the mode represents the most frequently occurring value
in a sample.
8.4.2 Solution
SQL Server is not equipped with a mode function, so calculating the
mode is a bit more difficult than calculating the mean. As the
following solution shows, you can calculate the
mode
using a creative combination of COUNT and TOP:
SELECT TOP 1 COUNT(*) frequency, Hours mode
FROM BulbLife
WHERE TestPit=1
GROUP BY hours
ORDER BY COUNT(*) DESC
frequency mode
----------- -----------
2 1085
8.4.3 Discussion
Although it appears strange at first, how the query works becomes
clear once you think about the basic definition for mode. A mode is
the value that occurs most frequently in an observed sample. You can
begin by writing a query to group values together:
SELECT Hours
FROM BulbLife
WHERE TestPit=1
GROUP BY hours
hours
-----------
1043
1057
1077
1085
1093
1099
1109
1114
1129
Next, add the COUNT function to the query to include a count of each
distinct value with the query's results:
SELECT COUNT(*) frequency, Hours
FROM BulbLife
WHERE TestPit=1
GROUP BY Hours
frequency Hours
----------- -----------
1 1043
1 1057
1 1077
2 1085
1 1093
1 1099
1 1109
1 1114
1 1129
Finally, use an ORDER BY clause to put the results in descending
order by frequency, so that the most frequently occurring value is
listed first. Then, use the TOP 1 syntax in your SELECT clause to
limit the results to the first row. The hours value in that first row
will be the mode.
What happens when you have more than one mode in the observed sample
and you need to report all such values? In our hypothetical bulb-life
data, the mode for Test Pit #1 is 1085, while the mode for Test Pit
#2 is 1043. For both modes, the occurrence count is 2. If you want
the mode for all light bulbs, regardless of test pit, then both
values should be returned. The following query shows one way to deal
with this:
SELECT COUNT(*) frequency, Hours mode FROM BulbLife
GROUP BY Hours
HAVING COUNT(*)>= ALL(
SELECT COUNT(*)
FROM BulbLife
GROUP BY Hours)
frequency mode
----------- -----------
2 1043
2 1085
The subquery that you see in this example returns a list of all
occurrence counts for all distinct values in the BulbLife table. It
follows, logically, that one of those counts will represent a
maximum. The HAVING clause in the outer query specifies that the
occurrence count be greater than or equal to all values returned by
the subquery, which, in effect, restricts the results to only those
rows with an occurrence count that equals the maximum occurrence
count.
Be aware that mode is a weak statistic. The mode can be useful if you
know the distribution and the nature of the sample, but it can easily
be altered by adding a few cases with extreme values. For example,
let's say that we get 2 additional bulbs with a
duration of 1129 hours. The mode is then 1129, which is misleading
information, since all other bulbs lasted for much shorter periods.
|