Pearson Correlation

What it’s for:

The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two normally distributed variables (parametric test)*. A perfect linear relationship between the two variables would result in r = 1 (or r = -1 for a negative correlation), and no linear relationship between the two variables results in r = 0. The Pearson correlation coefficient is also known as the simple correlation coefficient, or the Pearson product-moment correlation coefficient.

The correlation coefficient has an additional property which is useful to researchers. By squaring r, we can calculate the coefficient of determination. This r² tells us what proportion of the variation in Y is explained by X, or vice-versa, and allows us to compare the strengths of different correlations. In terms of explained variation, a correlation of r = 0.7 (r² = 0.49) is about twice as strong as a correlation of r = 0.5 (r² = 0.25).
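The comparison above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example.

```python
# Coefficient of determination for the two correlations compared above.
r_strong = 0.7
r_weak = 0.5

r2_strong = r_strong ** 2  # proportion of variation explained when r = 0.7
r2_weak = r_weak ** 2      # proportion of variation explained when r = 0.5
ratio = r2_strong / r2_weak

print(f"r = {r_strong}: r^2 = {r2_strong:.2f}")
print(f"r = {r_weak}: r^2 = {r2_weak:.2f}")
print(f"ratio of explained variation = {ratio:.2f}")
```

The ratio is a little under 2, which is where the "about twice as strong" statement comes from.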

Assumptions/Cautions:

Data must be collected in pairs (i.e. both variables must be measured at the same time and place), and each pair of data must be independent.
Data must be normally distributed for each variable (parametric test)*.
Looks only for a linear relationship between the two variables.
Does not imply a cause and effect relationship.

How to use it:

1) Calculate the sum of your X and Y values. (It makes no difference which variable you call X, and which variable you call Y.)

2) Square all your X and Y values, and then calculate the sum of these squares.

3) For each pair of data, multiply X * Y. Calculate the sum of these XY values.

4) Now you can calculate the test statistic r as shown in the box at right (Zar 1996). The formula looks complicated, but you have already done most of the work in steps 1-3.

5) Calculate degrees of freedom as n-2 (Zar 1996).

6) Estimate the p-value using a computer program or a table of critical values. The significance of correlation coefficients can be estimated by converting r to t, using the formula in the box at right (Zar 1996).
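Steps 1 to 6 can be sketched in Python as follows. The formula box is not reproduced in this text, so the sketch assumes the standard sum-based form of r that steps 1-3 feed into, and the usual conversion t = r * sqrt((n - 2) / (1 - r²)). The function names and the sample data are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the sums described in steps 1-3 (assumed sum-based formula)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must be collected in pairs")
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 pairs so that n - 2 > 0")

    # Step 1: sums of X and Y
    sum_x = sum(xs)
    sum_y = sum(ys)
    # Step 2: sums of squared X and Y
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Step 3: sum of the products X * Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Step 4: combine the sums into r
    numerator = sum_xy - sum_x * sum_y / n
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / math.sqrt(ss_x * ss_y)


def r_to_t(r, n):
    """Steps 5-6: degrees of freedom (n - 2) and the t statistic for r."""
    df = n - 2
    if abs(r) >= 1:
        # A perfect correlation gives an infinite t statistic.
        return math.copysign(math.inf, r), df
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Made-up example data: five independent (X, Y) pairs.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
t, df = r_to_t(r, len(x))
print(f"r = {r:.4f}, t = {t:.4f}, df = {df}")
```

For this made-up data set, r is about 0.77 and t is about 2.12 with 3 degrees of freedom. A standard t table gives a two-tailed critical value of 3.182 for 3 degrees of freedom at the 0.05 level, so this correlation would not be judged significant despite the fairly large r.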

MS Excel Tips:

You can calculate the Pearson correlation coefficient directly in Excel by using the built-in CORREL or PEARSON functions, or by looking under TOOLS > DATA ANALYSIS > Correlation. However, these options report only r, not the p-value associated with it. You will still need to use a table of critical values to estimate this.

* Actually it’s a bit more complicated. For each value of X, the values of Y should be normally distributed. For each value of Y, the X values should be normally distributed (bivariate normal distribution). But this is difficult to test directly without a very large sample size.
