Linear Regression

What it’s for:

Linear regression is a parametric test for a relationship between two variables, in which one variable (Y, the dependent variable) depends on the other (X, the independent variable). The relationship need not be one of simple cause and effect, but we should be confident that while X might influence Y, Y does not influence X (Zar 1996). The relationship may be either positive (as X increases, Y also increases) or negative (as X increases, Y decreases).

Regression analysis can also be used to find the slope and intercept of the best-fitting line through the data points. These calculations are not included in the Science Toolkit.

Assumptions/Cautions:

Data must be collected in X-Y pairs.
Each pair of data must be independent of other pairs.
For each value of X, the values of Y are normally distributed (parametric test).
Tests only for a linear relationship between X and Y.
Assumes Y is dependent on X and not vice-versa.
Assumes X values are measured with little or no error.

How to use it:

1) Regression analysis is based on a number of sums, and it is easiest to begin by carrying out all of these sums on your data, as described below.

Sum up all X values.
Sum up all Y values.
Square all X values, and then sum the squared values.
Square all Y values, and then sum the squared values.
Multiply each X-Y pair together, and then sum the products.
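As a sketch of step 1, the five sums could be computed in Python. The x and y values below are made up purely for illustration; substitute your own paired measurements:

```python
# Hypothetical paired data (X, Y) -- illustrative values only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)                                      # number of X-Y pairs
sum_x = sum(x)                                  # sum of all X values
sum_y = sum(y)                                  # sum of all Y values
sum_x2 = sum(xi ** 2 for xi in x)               # sum of squared X values
sum_y2 = sum(yi ** 2 for yi in y)               # sum of squared Y values
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of X*Y products
```

These six quantities (including n) are all you need for the remaining steps.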

2) Calculate the regression sum of squares as shown in the box at right. Notice that the complicated equation becomes fairly simple once the sums are completed. (All formulae from Zar 1996).
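Since the equation box is not reproduced in this text version, here is a sketch of the standard regression sum of squares formula (following Zar 1996), written in terms of the sums from step 1. The numeric values are hypothetical, from the illustrative data x = 1..5 paired with made-up y values:

```python
# Hypothetical sums from step 1 (illustrative data only).
n = 5
sum_x, sum_y = 15.0, 30.1
sum_xy, sum_x2 = 110.2, 55.0

# Regression SS = (sum XY - (sum X)(sum Y)/n)^2 / (sum X^2 - (sum X)^2/n)
cross_products = sum_xy - sum_x * sum_y / n   # corrected sum of cross-products
ss_x = sum_x2 - sum_x ** 2 / n                # corrected sum of squares for X
regression_ss = cross_products ** 2 / ss_x
```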

3) Calculate the residual sum of squares as shown in the box below. Again, once the sums are calculated, the equation becomes manageable.
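As with step 2, the residual sum of squares can be sketched from the sums: it is the total sum of squares for Y minus the regression sum of squares. The numbers below are hypothetical, carried over from the same illustrative data:

```python
# Hypothetical values (illustrative data only).
n = 5
sum_y, sum_y2 = 30.1, 220.91
regression_ss = 39.601   # hypothetical result of step 2

total_ss = sum_y2 - sum_y ** 2 / n       # total sum of squares for Y
residual_ss = total_ss - regression_ss   # variation not explained by the line
```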

4) Calculate degrees of freedom. In linear regression, the regression degrees of freedom (also called the numerator degrees of freedom) is always 1. The residual degrees of freedom (also called the denominator degrees of freedom) is n-2.

5) Calculate regression and residual mean squares (MS), by dividing the appropriate sum of squares (SS) by degrees of freedom. Notice that the regression MS will always be equal to the regression SS, since we are dividing by 1.

6) Finally, we can calculate F, our test statistic, by dividing the regression MS by the residual MS.
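Steps 4 through 6 can be sketched together; the sums of squares below are hypothetical values from the illustrative data used earlier:

```python
# Hypothetical sums of squares (illustrative data only).
regression_ss, residual_ss = 39.601, 0.107
n = 5

regression_df = 1        # regression (numerator) df is always 1
residual_df = n - 2      # residual (denominator) df

regression_ms = regression_ss / regression_df   # equals regression SS
residual_ms = residual_ss / residual_df

F = regression_ms / residual_ms   # the test statistic
```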

7) Now estimate the p-value using a computer program or table of one-tailed critical values for the F distribution. Note that the regression degrees of freedom is the numerator degrees of freedom, and the residual degrees of freedom is the denominator degrees of freedom.
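If you have SciPy available, the one-tailed p-value can be obtained from the F distribution's survival function instead of a printed table. A sketch, using the hypothetical F statistic and degrees of freedom from the illustrative data:

```python
from scipy.stats import f  # SciPy's F distribution

F_stat = 1110.31    # hypothetical test statistic from step 6
regression_df = 1   # numerator df (always 1 for simple linear regression)
residual_df = 3     # denominator df (n - 2, with n = 5 here)

# Survival function gives P(F >= F_stat) under the null hypothesis.
p_value = f.sf(F_stat, regression_df, residual_df)
```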

8) Draw a conclusion based on the p-value from step 7. See also Types of Error.

MS Excel Tips:

You can perform linear regression analysis directly in MS Excel using the regression tool (Tools > Data Analysis > Regression). The numerator degrees of freedom is reported by Excel as Regression df, and the denominator degrees of freedom as Residual df. The p-value is reported as Significance F. Excel also provides the slope (reported with the name of your X variable) and intercept of the regression line.
