Team LiB   Previous Section   Next Section

8.8 Calculating Correlation

8.8.1 Problem

You want to calculate the correlation between two samples. For example, you want to calculate how similar the light-bulb sales patterns are for two different years.

8.8.2 Solution

The query in the following example uses the formula for calculating correlation coefficients shown earlier in this chapter. It does this for the years 1997 and 1998.

SELECT 
   (COUNT(*)*SUM(x.Sales*y.Sales)-SUM(x.Sales)*SUM(y.Sales))/(
   SQRT(COUNT(*)*SUM(SQUARE(x.Sales))-SQUARE(SUM(x.Sales)))*
   SQRT(COUNT(*)*SUM(SQUARE(y.Sales))-SQUARE(SUM(y.Sales)))) 
   correlation
FROM BulbSales x JOIN BulbSales y ON x.month=y.month
WHERE x.Year=1997 AND y.Year=1998 

correlation                                           
----------------------------------------------------- 
0.79

The correlation calculated is 0.79, which means that the sales patterns between the two years are highly correlated or, in other words, very similar.

8.8.3 Discussion

The solution query implements the formula shown earlier in the chapter for calculating a correlation coefficient. The solution query shown here is an example of how you implement a complex formula directly in SQL. To aid you in making such translations from mathematical formulas to Transact-SQL expressions, Table 8-4 shows a number of mathematical symbols together with their corresponding Transact-SQL functions.

Table 8-4. Mathematical symbols related to Transact-SQL functions

Symbol

Formula

Description

|a|

ABS(a)

Absolute value of a

ea

EXP(a)

Exponential value of a

a2

SQUARE(a)

Square of a

an

POWER(a,n)

nth power of a

n

COUNT(*)

Sample size

figs/sqrta.gif

SQRT(a)

Square root of a

figs/suma.gif

SUM(a)

Sum of all cases in a sample

figs/avga1.gif or figs/avga2.gif

AVG(a)

Average of all cases in a sample — the mean

s

STDDEV(x)

Standard deviation of a sample

figs/stddevpx.gif

STDDEVP(x)

Standard deviation of a population

s2

VAR(x)

Variance of a sample

figs/varpx.gif

VARP(X)

Variance of a population

To write the query shown in this recipe, take the formula for correlation coefficient shown earlier in this chapter, and use the Table 8-4 to translate the mathematical symbols in that formula into SQL functions for your query.

    Team LiB   Previous Section   Next Section