Bonferroni's Adjustment and the Hypothesis Testing Framework

Consider the hypothesis testing framework in which we generically test:

H0: no difference in the population vs. H1: H0 is false

We then assume H0 is true until the data make this assumption unlikely. Our threshold of unlikeliness is called the alpha level of our test. We will reject H0 when the p-value, P(data at least as extreme as observed | H0 true), falls below alpha. The alpha threshold may be set to any level, but by convention it is usually set to .05 or .01. Let us adopt an alpha = .05 threshold for the current discussion.

In setting alpha to .05, we declare a willingness to falsely reject 1 in 20 correct null hypotheses, so over the long run we expect, on average, 1 false rejection for every 20 correct null hypotheses tested. When we perform multiple tests, therefore, the likelihood of at least one false rejection grows. To demonstrate this effect, let us assume we are testing multiple, correct null hypotheses and that the tests are independent. With alpha = .05, we will correctly retain each correct null hypothesis with probability .95. In testing 3 correct null hypotheses, the probability that we make all three correct decisions is .95^3 = .857. The probability of making at least one false rejection is, therefore, 1 - .95^3 = .143.
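
The 1 - .95^3 figure can be checked by simulation. The following is a minimal sketch, not a definitive implementation: it assumes the tests are independent and relies on the fact that, under a correct null hypothesis, the p-value of a continuous test is uniformly distributed on [0, 1]. The function name simulate_fwer, the trial count, and the seed are illustrative choices, not part of the original discussion.

```python
import random


def simulate_fwer(num_tests, alpha=0.05, trials=100_000, seed=0):
    """Estimate P(at least one false rejection) over independent tests of correct nulls."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    families_with_false_rejection = 0
    for _ in range(trials):
        # Assumption: each p-value is uniform on [0, 1] because H0 is true.
        if any(rng.random() < alpha for _ in range(num_tests)):
            families_with_false_rejection += 1
    return families_with_false_rejection / trials


estimate = simulate_fwer(3)
exact = 1 - (1 - 0.05) ** 3
print(f"simulated: {estimate:.3f}   exact: {exact:.3f}")
```

The simulated proportion should land close to the exact value of .143, with the gap shrinking as the number of trials grows.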

In testing 3 groups, there are 3 possible pairwise comparisons (Group 1 vs. Group 2, Group 1 vs. Group 3, and Group 2 vs. Group 3). In testing 4 groups there are 6 possible comparisons. In testing 5 groups there are 10 such comparisons, and in testing 6 groups there are 15 possible comparisons. In general, k groups yield k(k - 1)/2 pairwise comparisons. In testing each comparison at alpha = .05 we see:

No. of groups   No. of possible comparisons   Family-wise error rate*
3               3                             1 - .95^3  = .143
4               6                             1 - .95^6  = .265
5               10                            1 - .95^10 = .401
6               15                            1 - .95^15 = .537

* Probability of at least one false rejection if alpha = .05 for each comparison.
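
The tabulated values can be reproduced with a few lines of Python. This is a sketch under the same independence assumption used above. The helper name familywise_error_rate is introduced here for illustration, and math.comb(k, 2) supplies the number of pairwise comparisons among k groups.

```python
from math import comb


def familywise_error_rate(num_comparisons, alpha=0.05):
    """P(at least one false rejection) across independent tests of correct nulls."""
    return 1 - (1 - alpha) ** num_comparisons


for groups in range(3, 7):
    comparisons = comb(groups, 2)  # k(k - 1)/2 pairwise comparisons
    fwer = familywise_error_rate(comparisons)
    print(f"{groups} groups, {comparisons:2d} comparisons, family-wise error rate = {fwer:.3f}")
```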

But if we adjust the alpha level for each comparison downward to alphaBonf = alpha / (no. of comparisons), we can hold the family-wise error rate at or just below alpha. This is called Bonferroni's adjustment. For example, in testing 3 groups we test each comparison at alphaBonf = .05 / 3 = .0167. The probability of each correct decision is now 1 - .0167 = .9833, and the family-wise error rate over 3 comparisons is 1 - .9833^3 ~= .05. We have thereby maintained a reasonable type I (alpha) error rate, and can have confidence in any discovered significant differences.
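
The adjustment itself is a one-line calculation. The sketch below is illustrative only: the function names are hypothetical, the family-wise figure again assumes independent tests, and the three p-values at the end are made up to show how the adjusted threshold would be applied.

```python
def bonferroni_alpha(alpha, num_comparisons):
    """Per-comparison threshold: alpha divided by the number of comparisons."""
    return alpha / num_comparisons


def familywise_error_rate(per_test_alpha, num_comparisons):
    """P(at least one false rejection) across independent tests of correct nulls."""
    return 1 - (1 - per_test_alpha) ** num_comparisons


def bonferroni_reject(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


for m in (3, 6, 10, 15):
    adjusted = bonferroni_alpha(0.05, m)
    fwer = familywise_error_rate(adjusted, m)
    print(f"{m:2d} comparisons: alphaBonf = {adjusted:.4f}, family-wise error rate = {fwer:.4f}")

# Hypothetical p-values for 3 pairwise comparisons (invented for illustration).
example_p_values = [0.010, 0.030, 0.200]
print(bonferroni_reject(example_p_values))
```

With these made-up p-values, only the first comparison (.010 < .0167) is declared significant. The second (.030) would have been rejected at an unadjusted alpha of .05 but is retained after the adjustment.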