8.8 Calculating Correlation8.8.1 ProblemYou want to calculate the correlation between two samples. For example, you want to calculate how similar the light-bulb sales patterns are for two different years. 8.8.2 SolutionThe query in the following example uses the formula for calculating correlation coefficients shown earlier in this chapter. It does this for the years 1997 and 1998. SELECT (COUNT(*)*SUM(x.Sales*y.Sales)-SUM(x.Sales)*SUM(y.Sales))/( SQRT(COUNT(*)*SUM(SQUARE(x.Sales))-SQUARE(SUM(x.Sales)))* SQRT(COUNT(*)*SUM(SQUARE(y.Sales))-SQUARE(SUM(y.Sales)))) correlation FROM BulbSales x JOIN BulbSales y ON x.month=y.month WHERE x.Year=1997 AND y.Year=1998 correlation ----------------------------------------------------- 0.79 The correlation calculated is 0.79, which means that the sales patterns between the two years are highly correlated or, in other words, very similar. 8.8.3 DiscussionThe solution query implements the formula shown earlier in the chapter for calculating a correlation coefficient. The solution query shown here is an example of how you implement a complex formula directly in SQL. To aid you in making such translations from mathematical formulas to Transact-SQL expressions, Table 8-4 shows a number of mathematical symbols together with their corresponding Transact-SQL functions.
To write the query shown in this recipe, take the formula for correlation coefficient shown earlier in this chapter, and use the Table 8-4 to translate the mathematical symbols in that formula into SQL functions for your query. |