Inferential Statistics

Introduction:

Inferential statistics give us a rigorous and objective way to determine whether data are consistent with a hypothesis. Since disproving things is easier than proving them, statistical tests are set up to disprove a null hypothesis (H0) rather than to prove the alternative hypothesis (HA).
We can't even disprove our null hypothesis with 100% certainty. All a statistical test can show is that our results are improbable under the null hypothesis. If that probability is small enough (one chance in 20 is the common criterion), we conclude that H0 is wrong. This provides some support for the alternative hypothesis, but of course it doesn't constitute proof.

Why do we need stats?

A well-designed experiment will control all the confounding variables that might affect the result, but there's one factor that can't be controlled — random chance. Imagine we are designing a simple experiment to test whether a coin is fair (balanced perfectly so that heads and tails are equally likely). We set up a carefully controlled experiment to make sure that the way we flip the coin, and the surface it lands on, don't affect the results. Now we flip the coin 20 times and get 15 heads and only 5 tails. If the coin were fair, we would expect about 10 heads and 10 tails. But random chance affects the outcome of any coin flip, so by chance alone we might get a few more heads than tails even with a fair coin. We need some way of deciding how likely our results would be if random chance alone were at work, and inferential statistics allow us to do that. It turns out that the probability of getting at least 15 heads in 20 tosses of a fair coin is about one in 48 (0.0207, to be more precise).
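As a sketch of where that tail probability comes from, we can sum the exact binomial probabilities P(X = k) = C(20, k) / 2^20 for every outcome of 15 or more heads (the group sizes and threshold here simply restate the coin example above):

```python
from math import comb

# Exact probability of at least 15 heads in 20 flips of a fair coin:
# sum the binomial probabilities P(X = k) = C(20, k) / 2**20 for k = 15..20.
n, threshold = 20, 15
p_at_least_15 = sum(comb(n, k) for k in range(threshold, n + 1)) / 2**n

print(round(p_at_least_15, 4))  # → 0.0207
```

Because the result is below the conventional one-in-20 (0.05) criterion, we would reject the null hypothesis that the coin is fair.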
Researchers face similar problems all the time, because random measuring errors and natural variation influence every set of measurements we take. If we're comparing the means of two treatment groups, we need to know how much confidence to have in each mean, and how different the means are. It comes down to comparing variation within each group (the less variation within a treatment group, the more confidence we have in its mean) with variation between groups (the larger the difference between the means, the easier it is to detect). This topic was also discussed under Replication. More replicates within each treatment group will give us better estimates of the means, and we will be better able to see whether there is a difference between groups (we call this increasing the power of our test).
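The within-group versus between-group comparison is exactly what a two-sample t statistic captures. The following is a minimal sketch using made-up measurements (the data and group names are purely illustrative, not from the text), for the simple case of equal group sizes:

```python
from statistics import mean, stdev

# Illustrative (made-up) measurements for two treatment groups.
control   = [10.1, 9.8, 10.4, 10.0, 9.7]
treatment = [11.2, 10.9, 11.5, 11.1, 10.8]

def t_statistic(a, b):
    """Two-sample t statistic (equal group sizes, pooled variance).

    The numerator is the variation *between* groups (difference of means);
    the denominator reflects the variation *within* each group, scaled by
    the number of replicates. More replicates shrink the denominator,
    which is why replication increases the power of the test.
    """
    n = len(a)
    pooled_var = (stdev(a) ** 2 + stdev(b) ** 2) / 2
    return (mean(a) - mean(b)) / (pooled_var * 2 / n) ** 0.5

print(round(t_statistic(treatment, control), 2))  # → 6.35
```

A large t value means the difference between the means is big relative to the scatter within the groups, and so is unlikely to be due to chance alone.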
Use of inferential statistics is discussed under two topic headings. The first looks at What test to use, and the second examines Types of errors. A formal introduction to the mathematics of statistical tests is beyond the scope of the Science Toolkit, but for a brief introduction to how they work, look here.