
This chapter considers the comparison of a continuous outcome between two independent groups.

*Illustrative data:* `WCGS.ZIP` (Selvin, 1991, p. 41). To illustrate techniques, we consider cholesterol levels (mg/dl) in Type A and Type
B men. Data are:

Type A: 233, 291, 312, 250, 246, 197, 268, 224, 239, 239, 254, 276, 234, 181, 248, 252, 202, 218, 212, 325

Type B: 344, 185, 263, 246, 224, 212, 188, 250, 148, 169, 226, 175, 242, 252, 153, 183, 137, 202, 194, 213

Data are stored as a numeric dependent (outcome) variable, `CHOL`, and a dichotomous independent (group) variable,
`BEHAVIOR`. The first three records and the last record of this data set look like this:

`REC CHOL BEHAVIOR
--- ---- --------
1 233 A
2 291 A
3 312 A
etc.
40 213 B`

Descriptive statistics for the two groups are computed with a two-variable `MEANS` command, applied as follows:

`EPI6> READ <DATASET>
EPI6> MEANS <DV> <IV>`

where `<DV>` represents the name of the dependent variable and `<IV>` represents the name of the independent variable.

For the illustrative data set, the following commands are issued:

`EPI6> READ WCGS
EPI6> MEANS CHOL BEHAVIOR`

Five sections of output are produced (a frequency table, summary statistics, ANOVA table, Bartlett's test, Kruskal-Wallis test). Summary statistics are printed below the frequency table. For the illustrative data, the summary statistics are:

` MEANS of CHOL for each category of BEHAVIOR`

`BEHAVIOR Obs Total Mean Variance Std Dev
A 20 4901 245.050 1342.366 36.638
B 20 4206 210.300 2336.747 48.340
Difference 34.750`

`BEHAVIOR Minimum 25%ile Median 75%ile Maximum Mode
A 181.000 221.000 242.500 261.000 325.000 239.000
B 137.000 179.000 207.000 244.000 344.000 137.000`

Thus, *n*_{1} = 20 and *n*_{2} = 20 (listed under `Obs`), and the Type A men in the sample have a higher mean cholesterol level than the Type B men (245.1 vs.
210.3 mg/dl). In addition, the Type A group shows less variability than the Type B group (standard deviations: 36.6 vs. 48.3 mg/dl).
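As a cross-check on the summary statistics, the same quantities can be computed outside of *Epi Info*. The following is a minimal sketch in Python using only the standard library; the variable and function names (`type_a`, `type_b`, `describe`) are illustrative choices and are not part of *Epi Info*.

```python
from statistics import mean, stdev, variance

# Cholesterol levels (mg/dl) copied from the illustrative data in this chapter.
type_a = [233, 291, 312, 250, 246, 197, 268, 224, 239, 239,
          254, 276, 234, 181, 248, 252, 202, 218, 212, 325]
type_b = [344, 185, 263, 246, 224, 212, 188, 250, 148, 169,
          226, 175, 242, 252, 153, 183, 137, 202, 194, 213]


def describe(values):
    """Return n, total, mean, sample variance, and sample SD for one group."""
    return {
        "n": len(values),
        "total": sum(values),
        "mean": mean(values),
        "variance": variance(values),  # uses the n - 1 denominator
        "sd": stdev(values),
    }


summary_a = describe(type_a)
summary_b = describe(type_b)
mean_difference = summary_a["mean"] - summary_b["mean"]

for label, s in (("A", summary_a), ("B", summary_b)):
    print(f"{label}: n={s['n']} total={s['total']} "
          f"mean={s['mean']:.3f} var={s['variance']:.3f} sd={s['sd']:.3f}")
print(f"Difference: {mean_difference:.3f}")
```

The means and standard deviations printed by this sketch should agree with the values quoted in the text.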

The observed mean difference (34.750 in this instance) is the point estimate
of the expected mean difference µ_{1} - µ_{2}.
To calculate a 95% confidence interval for µ_{1} - µ_{2}, first
calculate (by hand) the standard error of the mean difference as follows:

*se* = SQRT[(`MSW`)(1/*n*_{1} + 1/*n*_{2})]

where `MSW` is the Mean Square Within as reported in *Epi Info*'s ANOVA table:

` ANOVA
Variation SS df MS F statistic p-value t-value
Between 12075.625 1 12075.625 6.564 0.013853 2.562113
Within 69903.150 38 1839.557
Total 81978.775 39`

For the illustrative data, *se* = SQRT[(1839.557)(1/20 + 1/20)] = 13.56 mg/dl.
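The hand calculation can also be reproduced directly from the raw data. The sketch below (Python, standard library only) computes the within-group sum of squares, the Mean Square Within, and the standard error; the helper name `sum_of_squares` is an illustrative choice, not an *Epi Info* function.

```python
from math import sqrt
from statistics import mean

# Cholesterol levels (mg/dl) copied from the illustrative data in this chapter.
type_a = [233, 291, 312, 250, 246, 197, 268, 224, 239, 239,
          254, 276, 234, 181, 248, 252, 202, 218, 212, 325]
type_b = [344, 185, 263, 246, 224, 212, 188, 250, 148, 169,
          226, 175, 242, 252, 153, 183, 137, 202, 194, 213]


def sum_of_squares(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


n1, n2 = len(type_a), len(type_b)
ss_within = sum_of_squares(type_a) + sum_of_squares(type_b)
df_within = n1 + n2 - 2
msw = ss_within / df_within              # Mean Square Within (pooled variance)
se = sqrt(msw * (1 / n1 + 1 / n2))       # standard error of the mean difference

print(f"SS within = {ss_within:.3f}, df = {df_within}")
print(f"MSW = {msw:.3f}, se = {se:.2f} mg/dl")
```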

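A 95% confidence interval for µ_{1} - µ_{2} is given by:

(mean difference) ± (*t*_{*df*, .975})(*se*)

where (mean difference) = mean_{1} - mean_{2}, *t*_{*df*, .975} is the 97.5th percentile of the *t* distribution with *df* = *n*_{1} + *n*_{2} - 2 degrees of freedom, and *se* is the standard error of the mean difference calculated above.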
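To make the interval concrete, here is a small sketch in Python that applies the usual equal-variance form, (mean difference) ± *t* × *se*, to the illustrative data. The critical value 2.024 is an assumption taken from a standard *t* table for 38 degrees of freedom (97.5th percentile); it does not come from the *Epi Info* output.

```python
from math import sqrt

# Summary values reported earlier in this chapter.
mean_difference = 34.750   # mean of Type A minus mean of Type B (mg/dl)
msw = 1839.557             # Mean Square Within from the ANOVA table
n1 = n2 = 20

# Assumed critical value: 97.5th percentile of t with n1 + n2 - 2 = 38 df,
# read from a standard t table.
t_crit = 2.024

se = sqrt(msw * (1 / n1 + 1 / n2))
margin = t_crit * se
lower = mean_difference - margin
upper = mean_difference + margin

print(f"95% CI for the mean difference: ({lower:.1f}, {upper:.1f}) mg/dl")
```

Under these assumptions the interval runs from roughly 7.3 to 62.2 mg/dl.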

*Epi Info* calculates the equal variance independent *t* test for *H*_{0}: µ_{1} = µ_{2} in its ANOVA table:

`Variation SS df MS F statistic p-value t-value
Between 12075.625 1 12075.625 6.564 0.013853 2.562113
Within 69903.150 38 1839.557
Total 81978.775 39`

Thus, data demonstrate *t*_{stat} = 2.56 with 38 degrees of freedom (*p* = .014).
Most investigators would consider this "significant" evidence against *H*_{0}.
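The *t* statistic in the ANOVA table is the mean difference divided by its standard error, and with two groups the *F* statistic is the square of *t*. A minimal Python sketch of that arithmetic, using the summary values reported in this chapter (variable names are illustrative):

```python
from math import sqrt

# Summary values reported earlier in this chapter.
mean_difference = 34.750   # mg/dl
msw = 1839.557             # Mean Square Within
n1 = n2 = 20

se = sqrt(msw * (1 / n1 + 1 / n2))
t_stat = mean_difference / se      # equal-variance independent t statistic
df = n1 + n2 - 2                   # degrees of freedom
f_stat = t_stat ** 2               # with two groups, F equals t squared

print(f"t = {t_stat:.3f} with {df} df; F = {f_stat:.3f}")
```

The *p* value is not computed here because the Python standard library has no *t* distribution; the value reported by *Epi Info* can be used instead.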

The above confidence interval and test statistic assume that (1) data are free of bias (information bias, selection bias, and confounding),
(2) groups and individuals within groups are independent, (3) the sampling distribution of the mean difference is
normal, and (4) variances in the two populations are equal (*homoscedasticity*). Although violations of assumptions (3) and
(4) may affect results, numerous studies have shown that these methods tolerate considerable departures from normality and equal variance
while still providing stable results. The methods are most robust to violations of these last two assumptions when sample sizes are equal (*n*_{1} = *n*_{2}),
samples are large (*n* > 30), and a
two-sided test is used (Zar, 1996, p. 128). *Furthermore*, statistical tests need not be realistic in order to be useful.

Statistical models are sometimes misunderstood in epidemiology. Statistical models are never true. The question of whether a model is true is irrelevant. A more appropriate question is whether we obtain the correct scientific conclusion if we pretend that the process under study behaves according to a particular statistical model. (Zeger, 1991)

The Mann-Whitney test and the Kruskal-Wallis test (for two samples) are non-parametric analogues of the independent *t* test. *Epi Info* computes
the Kruskal-Wallis test as part of its `MEANS` command. Here are the results for the illustrative data:

`Mann-Whitney or Wilcoxon Two-Sample Test (Kruskal-Wallis test for two groups)
Kruskal-Wallis H (equivalent to Chi square) = 6.333
Degrees of freedom = 1
p value = 0.011853
`

Thus, χ^{2}_{stat} = 6.33 with 1 degree of freedom (*p* = 0.012).
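For readers who want to see how the Kruskal-Wallis statistic arises, the sketch below ranks the pooled data (averaging ranks for ties) and applies the standard *H* formula with the usual tie correction. This is a textbook version written in Python; whether *Epi Info* uses exactly these steps internally is an assumption, and the function names are illustrative.

```python
from collections import Counter, defaultdict

# Cholesterol levels (mg/dl) copied from the illustrative data in this chapter.
type_a = [233, 291, 312, 250, 246, 197, 268, 224, 239, 239,
          254, 276, 234, 181, 248, 252, 202, 218, 212, 325]
type_b = [344, 185, 263, 246, 224, 212, 188, 250, 148, 169,
          226, 175, 242, 252, 153, 183, 137, 202, 194, 213]


def midranks(values):
    """Map each distinct value to its average rank in the pooled sample."""
    positions = defaultdict(list)
    for position, value in enumerate(sorted(values), start=1):
        positions[value].append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = midranks(pooled)
    rank_term = sum(
        sum(ranks[x] for x in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


h_stat = kruskal_wallis_h(type_a, type_b)
print(f"Kruskal-Wallis H = {h_stat:.3f} with 1 degree of freedom")
```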

Comment: The Kruskal-Wallis procedure is slightly less powerful than the independent *t* test when data come from normally distributed populations. The loss of efficiency is surprisingly small when the test is used in non-normal populations.

With two samples, Bartlett's test addresses *H*_{0}: σ²_{1} = σ²_{2}, where σ²_{*i*} represents the variance in population *i*. For the illustrative data, *Epi Info* reports:

` Bartlett's test for homogeneity of variance
Bartlett's chi square = 1.404 deg freedom = 1 p-value = 0.236005`

Thus, χ^{2} = 1.40, df = 1, *p* = .24. This provides little or no support for rejecting *H*_{0}.
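Bartlett's statistic compares the log of the pooled variance with the logs of the individual group variances. The sketch below implements the standard textbook formula in Python; it is offered as an illustration under that assumption rather than as a description of *Epi Info*'s internal code, and the function name is hypothetical.

```python
from math import log
from statistics import variance

# Cholesterol levels (mg/dl) copied from the illustrative data in this chapter.
type_a = [233, 291, 312, 250, 246, 197, 268, 224, 239, 239,
          254, 276, 234, 181, 248, 252, 202, 218, 212, 325]
type_b = [344, 185, 263, 246, 224, 212, 188, 250, 148, 169,
          226, 175, 242, 252, 153, 183, 137, 202, 194, 213]


def bartlett_chi_square(*groups):
    """Return (chi-square, degrees of freedom) for Bartlett's test."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [variance(g) for g in groups]
    df_total = sum(dfs)
    pooled = sum(df * v for df, v in zip(dfs, variances)) / df_total
    numerator = df_total * log(pooled) - sum(
        df * log(v) for df, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / df for df in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction, k - 1


chi_square, df = bartlett_chi_square(type_a, type_b)
print(f"Bartlett's chi-square = {chi_square:.3f}, df = {df}")
```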

Comment: Bartlett's test is reliable only when used in normal populations. When the population distribution is platykurtic, the true *p* value is less than the calculated *p* value, i.e., the test is conservative (Maurais & Ouimet, 1986). When the population distribution is leptokurtic, the true *p* value is greater than the calculated *p* value, i.e., the test is liberal. Because *t* tests are relatively reliable in the face of unequal variance, many statisticians question the use of Bartlett's test as a preliminary test to the independent *t* test. Consider:

It has been shown that in the commonly occurring case in which group sizes are equal, or not very different, the [independent *t* test] is affected surprisingly little by variance inequalities. Since this test is also known to be very insensitive to non-normality it would be best to accept the fact that it can be used safely under most practical conditions. To make the preliminary test on variances is rather like putting to sea in a row boat to find out whether conditions are sufficiently calm for an ocean liner to leave port! (Box, 1953)

To achieve 80% power for α = 0.05 (two-sided), *each* group should have:

*n* = (16 σ² / *d*²) + 1

where *d* = a "difference worth detecting" and σ = a good estimate of the within-group standard deviation (e.g., *s*_{p}). Suppose we want to
detect a difference of 25 units and assume the standard deviation of the outcome variable is 45. Then the required sample size per
group is *n* = (16)(45²) / (25²) + 1 = 52.84, which rounds up to 53.
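The sample size rule of thumb is easy to script. A minimal sketch in Python, where the function name `n_per_group` is an illustrative choice and the formula is the 80% power, two-sided 0.05 approximation given in this section:

```python
from math import ceil


def n_per_group(sigma, difference):
    """Approximate per-group n for 80% power at two-sided alpha = 0.05."""
    return 16 * sigma ** 2 / difference ** 2 + 1


# Example from the text: detect a difference of 25 units with SD = 45.
raw_n = n_per_group(sigma=45, difference=25)
required_n = ceil(raw_n)   # round up to a whole participant

print(f"n per group = {raw_n:.2f}, rounded up to {required_n}")
```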

Power is the probability of achieving a "significant" result under a given set of assumptions, assuming *H*_{0} is false. For example, we
might ask, "What is the probability of achieving statistical significance at α = .05 (two-sided) assuming µ_{1} = 50, µ_{2} = 40, σ = 45, and *n*_{1}
= *n*_{2} = 20?" The answer is ".10," meaning the test has only a 10% chance of rejecting the false null hypothesis. Try using the
Web power calculator located at http://www.health.ucalgary.ca/~rollin/stats/ssize/n2.html to calculate power for the type of problem
presented in this chapter.
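If the Web calculator is unavailable, power can be approximated with a normal (large-sample) calculation. The sketch below is such an approximation in Python, not the method used by the calculator; it treats the standard deviation as known, so its answer will differ slightly from an exact calculation based on the *t* distribution. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mu1, mu2, sigma, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test."""
    z = NormalDist()
    se = sigma * sqrt(2 / n_per_group)       # SE of the mean difference
    shift = abs(mu1 - mu2) / se              # true difference in SE units
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


power = approx_power(mu1=50, mu2=40, sigma=45, n_per_group=20)
print(f"Approximate power = {power:.2f}")
```

For the example in the text this approximation gives a value close to the .10 quoted above.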

**(1) TWOGRPS.ZIP.** *Scores from Two Groups*. Two groups demonstrate the following scores on a psychological profile test:

Group 1: 86, 99, 96, 95, 72, 73, 95, 125, 97, 95

Group 2: 110, 126, 89, 106, 98, 105, 93, 127, 130, 92

Computerize these data remembering to create separate variables for `SCORE` and `GROUP` and, then, compute the descriptive and
inferential statistics described in this chapter. Report on your findings using plain language.

**(2) FEV.ZIP** (Rosner, 1990, p. 40; Tager et al., 1985). Data are from a respiratory health survey of children and adolescents. Codes in
the file are as follows:

| Variable | Type | Len | Description |
| --- | --- | --- | --- |
| ID | Integer | 5 | Identification number |
| AGE | Integer | 2 | Age of participant at beginning of the study (years) |
| FEV | Real (#.####) | 6 | Forced expiratory volume (liters/second) |
| HEIGHT | Real (##.#) | 4 | Height (inches) |
| SEX | Integer | 1 | Sex: 0 = female, 1 = male |
| SMOKE | Integer | 1 | Current smoking status: 0 = non-smoker, 1 = smoker |

Compare the smokers and non-smokers in this file with respect to their age.