8.7 Building Confidence Intervals
8.7.1 Problem
You want to check whether the statistics calculated from a sample
are reasonably representative of the population's statistics. In our
example, assume that a light bulb's declared lifetime is 1100 hours.
Based on a sample of lifetime tests, can you say with 95% confidence
that the quality of production differs significantly from the
declared value? To answer this question, you need to determine whether
the confidence interval around the mean of the sample includes
the declared lifetime. If the declared lifetime falls outside the
confidence interval, then the sample is not consistent with the
declared value, and we can assume that our declared lifetime
for the light bulbs is probably wrong. Either the quality has dropped
and the bulbs are burning out more quickly, or quality has risen,
causing the bulbs to last longer than we claim.
8.7.2 Solution
The solution is to execute a query that implements the calculations
described earlier for computing a confidence interval. Recall that
the confidence interval is the sample mean plus or minus a certain
amount. Thus, the following solution query computes two values, the
lower and upper bounds of the interval:
SELECT
   AVG(Hours)-STDEV(Hours)/SQRT(COUNT(*))*MAX(p) in1,
   AVG(Hours)+STDEV(Hours)/SQRT(COUNT(*))*MAX(p) in2
FROM BulbLife, T_distribution
WHERE df=(
   SELECT
      CASE WHEN COUNT(*)<=29
         THEN COUNT(*)-1
         ELSE -1 END
   FROM BulbLife)

in1      in2
-------- --------
1077.11  1104.89
Based on the given sample, we cannot say that the quality of
production has significantly changed, because the declared value of
1100 hours is within the computed confidence interval for the sample.
8.7.3 Discussion
The solution query calculates the mean of the sample, then subtracts
from it and adds to it the standard error multiplied by the
t-distribution coefficient from the T_distribution table. In our
sample, the number of degrees of freedom is the number of cases in
the sample less 1. The CASE expression ensures that the appropriate
index is used in the T_distribution table. If the number of values is
30 or more, the CASE expression returns a -1. In the T_distribution
table, the coefficient for an infinite number of degrees of freedom
is identified with a -1 degree of freedom value. Because the WHERE
clause restricts T_distribution to the row matching that df value,
MAX(p) simply returns that row's coefficient in a form that can be
combined with the other aggregate functions.
Expressions in the SELECT clause of the solution query calculate the
standard error (the standard deviation divided by the square root of
the sample size), multiply it by the coefficient from the
T_distribution table, and then calculate the interval around the
mean.
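Written as a formula, the two expressions in the SELECT clause
correspond to the bounds below. The symbols are our own shorthand,
not names used by the query: x-bar stands for AVG(Hours), s for
STDEV(Hours), n for COUNT(*), and t for the coefficient p read from
the T_distribution table.

```latex
\text{in1} = \bar{x} - t \cdot \frac{s}{\sqrt{n}}, \qquad
\text{in2} = \bar{x} + t \cdot \frac{s}{\sqrt{n}}
```

Reading the solution's output back through this formula, the midpoint
of 1077.11 and 1104.89 is 1091.00, so the sample mean is 1091 hours
and the margin on either side is 13.89 hours.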
This example is interesting because it shows you how to refer to a
table containing coefficients. You could retrieve the coefficient
separately using another query, store it in a local variable, and
then use it in a second query to compute the confidence interval, but
that's less efficient than the technique shown here,
where all the work is done using just one query.
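For comparison, the two-step alternative might look like the
following sketch. The variable names @n and @t are illustrative, and
the FLOAT type for the coefficient is an assumption, since the
definition of the p column is not shown in this recipe.

```sql
-- Sketch only: @n, @t, and the FLOAT type are assumptions.
DECLARE @n INT, @t FLOAT;

-- Step 1: count the sample and look up the matching coefficient.
SELECT @n = COUNT(*) FROM BulbLife;

SELECT @t = p
FROM T_distribution
WHERE df = CASE WHEN @n <= 29 THEN @n - 1 ELSE -1 END;

-- Step 2: use the stored coefficient to compute the interval.
SELECT
   AVG(Hours) - STDEV(Hours) / SQRT(COUNT(*)) * @t in1,
   AVG(Hours) + STDEV(Hours) / SQRT(COUNT(*)) * @t in2
FROM BulbLife;
```

This version reads BulbLife twice and takes three statements, whereas
the solution query produces the same two bounds in one statement.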