Two Ways to Be Wrong:
Think about a simple experiment testing a null hypothesis (let’s say, that people have no preference for different coloured M&Ms). In reality, the null hypothesis (H0) is either true or false. We design our experiment and collect some data about preferences, and eventually reach a conclusion. We may conclude that H0 is true or false, depending on the data and how we interpret them. This gives us two possible ways to be right and two possible ways to be wrong (see table).
                          Reality: H0 false      Reality: H0 true
We conclude H0 false      We are correct         Type I error
We conclude H0 true       Type II error          We are correct
Being right is fine however you accomplish it, but we need to be concerned about the two possible types of error we can make. Ideally we want to avoid making any error, but remember that our experimental results don’t give us certainties, only probabilities. If we conclude that H0 is false, and it’s really true, we are making a Type I error. If we conclude that H0 is true, and it is really false, we are making a Type II error. So whatever conclusion we reach, there is some chance of making an error. Worse, if we try to reduce the chance of making a Type I error, we will increase our chance of making a Type II error: a stricter cutoff makes it harder to reject H0 at all, including when H0 really is false. So what do we do?
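The trade-off is easier to see with a quick simulation. The sketch below is purely illustrative and not part of the original example: it assumes data whose true mean is 0.4, tested against H0: mean = 0 with a one-sample t-test, and estimates beta at three different alpha cutoffs. As alpha tightens from 0.10 to 0.01, the estimated beta climbs.

```python
# A hypothetical sketch (not from the original text) of the alpha/beta
# trade-off. We repeatedly sample data whose true mean is 0.4, test
# H0: mean = 0 with a one-sample t-test, and count how often we fail
# to reject this false H0 (a Type II error) at each alpha cutoff.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n, trials, true_mean = 30, 5_000, 0.4  # assumed values

for alpha in (0.10, 0.05, 0.01):
    type2_errors = 0
    for _ in range(trials):
        sample = rng.normal(true_mean, 1.0, n)
        _, p = stats.ttest_1samp(sample, 0.0)
        if p >= alpha:  # we accept a false H0: a Type II error
            type2_errors += 1
    print(f"alpha = {alpha:.2f}  estimated beta = {type2_errors / trials:.3f}")
```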
Type I Error Gets Priority:
In general scientists are conservative; they try to avoid jumping to conclusions. This means they would rather make a Type II error (accepting a false H0) than make a Type I error (rejecting a true H0). This is more intuitive if you think of it in terms of the alternative hypothesis: we would rather reject a true HA (in our example, that people prefer some colours of M&M over others) than jump to a wrong conclusion by accepting a false HA.
Stats Tests Reflect Priority:
Most simple statistical tests are set up around the probability of making a Type I error. The largest such probability we are willing to tolerate is called alpha, and it is fixed before the test is run; most scientists arbitrarily use one chance in 20 (alpha = 0.05) as the cutoff. The p-value generated by the test tells us the probability of getting results at least as extreme as ours if the null hypothesis is true, and we reject H0 only when that probability falls below alpha. Results with a low p-value are called significant because they tell us that something more interesting than just random chance appears to be at work.
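To make this concrete for the M&M example, here is a minimal sketch of one common way to get such a p-value: a chi-square goodness-of-fit test against the equal counts expected if there is no preference. The colour counts are invented for illustration.

```python
# A minimal sketch of the M&M example. The colour counts are invented
# for illustration; scipy's chi-square goodness-of-fit test compares
# them against the equal counts expected if H0 (no preference) is true.
from scipy import stats

observed = [25, 14, 19, 22, 8, 12]  # hypothetical picks of 6 colours
expected = [sum(observed) / len(observed)] * len(observed)

chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
# If p < 0.05 we reject H0 and call the preference significant.
```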
Of course sometimes we will accept H0 as true when it is really false. Our basic statistical test doesn’t tell us the probability of making this type of error (formally called beta). Separate power analyses are available for estimating beta; they work by estimating the probability of avoiding a Type II error (1 - beta), which is called the power of the test.
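As a sketch of what such a power calculation looks like in practice (the effect size, group size, and alpha below are assumed values, not from the original), statsmodels can compute the power of a standard two-sample t-test directly:

```python
# A sketch with assumed values: the power (1 - beta) of a two-sample
# t-test, for a medium effect size (Cohen's d = 0.5), 30 subjects per
# group, and alpha = 0.05.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"power = {power:.3f}, so beta = {1 - power:.3f}")
```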
Two Simple Analogies:
Most of us find it confusing to keep Type I and Type II errors straight, but a simple analogy can help. Imagine that you as a scientist are the prosecutor in a murder trial. You are trying to put forward evidence to convince the jury that the accused is guilty; that is your alternative hypothesis. But the jury won’t convict unless you prove your case beyond a reasonable doubt. (Scientists define that reasonable doubt as one chance in 20, something the courts don’t do!) The jury will only jump to the conclusion of guilt (the alternative hypothesis) when there is strong evidence against innocence (the null hypothesis). Juries would rather set a guilty man free than convict an innocent man. In the same way, science would rather miss out on a new idea that is true than accept a new idea which is false. (Keep in mind that in science there’s no law against testing the same hypothesis twice, so in some sense the jury is always out on every hypothesis.)
Don’t think like a lawyer? Well, imagine you are testing people for an infectious disease. You can conclude that they have the disease, or that they do not. Either diagnosis can be correct or incorrect, so there are two ways to make an error. You could conclude that a healthy person is actually sick; using a null hypothesis of no sickness, this is a Type I error (a false positive). You could also conclude that a sick person is healthy; that is a Type II error (a false negative).
Power Can Be Critical:
As the second analogy above illustrates, there are some circumstances in which scientists’ bias toward avoiding Type I errors doesn’t work as well. If we are testing a new drug which could save lives, we need to be just as concerned about concluding the drug is useless when in reality it works (a Type II error) as about concluding the drug works when in reality it’s useless (a Type I error). In these circumstances power tests must be carried out along with basic statistical tests.
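For instance, a power analysis can be run before a trial to choose a sample size large enough to keep beta low. A minimal sketch, with assumed values (a medium effect size and a conventional target power of 0.8), solving the same power relationship for sample size:

```python
# A sketch with assumed values: solving for the sample size needed to
# give an 80% chance (power = 0.8) of detecting a medium effect
# (Cohen's d = 0.5) with a two-sample t-test at alpha = 0.05.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"need about {n_per_group:.0f} subjects per group")
```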