Background | Test for Interaction | Mantel-Haenszel Methods

When Interaction and Confounding Are Minimal | Strategy for Analysis | Exercises

This chapter considers the analysis of a binary outcome (disease D) and binary exposure (exposure E) with data stratified according to an extraneous cofactor (cofactor C). Two phenomena -- confounding and statistical interaction -- are considered.

Confounding (from the Latin *confundere*: to mix together) is a distortion of an association between E and D brought about by a cofactor C. Confounding occurs when E is associated with C *and* C is an independent risk factor for D. In addition, C is *not* intermediate in the causal pathway.

For example, smoking (C) confounds the relation between alcohol consumption (E) and lung cancer (D) because alcohol users are more likely to smoke than non-users, and smoking is an independent risk factor for lung cancer. Thus, the effects of smoking get mixed in with the effect of alcohol consumption: smoking confounds the association between alcohol consumption and lung cancer.

One way to address confounding is to subset data into relatively homogeneous subgroups ("strata") according to the confounding cofactor. Not surprisingly, data can show one thing in aggregate form and another once disaggregated.

Measures of association in the aggregate are called **crude measures of association** since relations are unadjusted. Let us precede symbols for measures of association with a *c* when referring to crude measures of association. For example, *cRR* will represent the crude risk ratio (i.e., the risk ratio based on all data combined in a single 2-by-2 table).

Subscripts will denote **strata-specific measures of association**. For example, *RR*_{1} will represent the risk ratio in stratum 1, *RR*_{2} will represent the risk ratio in stratum 2, and so on.

Suppose, in the aggregate, we see the following crude data:

`       D+     D-   Total
E+     200    800    1000
E-      50    950    1000
Total  250   1750    2000`

Therefore, *p*_{1} = 200 / 1000 = .20, *p*_{2} = 50 / 1000 = .05, and *cRR* = .20 / .05 = 4.0.

Now suppose we stratify by confounding factor C. In stratum 1 (positive for C) we find:

`       D+     D-   Total
E+     194    606     800
E-      24     76     100
Total  218    682     900`

In this stratum, *p*_{1,1} = 194 / 800 = .2425, *p*_{2,1} = 24 / 100 = .24, and *RR*_{1} = .2425 / .24 ≈ 1.0.

In stratum 2 (negative for C) we find:

`       D+     D-   Total
E+       6    194     200
E-      26    874     900
Total   32   1068    1100`

In this stratum, *p*_{1,2} = 6 / 200 = .03, *p*_{2,2} = 26 / 900 = .0289, and *RR*_{2} = .03 / .0289 ≈ 1.0.

Therefore, the strong positive association seen in the aggregate disappears in the subgroups. This indicates that C confounded the association between E and D in the aggregate.
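The arithmetic above can be checked with a short script. The following is a minimal sketch in Python (not part of the original Epi Info workflow); the dictionary layout and the function names `risk_ratio` and `collapse` are illustrative choices, and the counts are the hypothetical ones from the tables above. Besides the crude and strata-specific risk ratios, it also prints the two ingredients of confounding: how strongly E is associated with C, and how strongly C predicts D among the unexposed.

```python
# Hypothetical counts from the text, stored as (cases, total) per exposure group.
strata = {
    "C+": {"exposed": (194, 800), "unexposed": (24, 100)},
    "C-": {"exposed": (6, 200), "unexposed": (26, 900)},
}


def risk_ratio(exposed, unexposed):
    """Risk ratio from two (cases, total) pairs."""
    (a, n1), (c, n0) = exposed, unexposed
    return (a / n1) / (c / n0)


def collapse(strata, group):
    """Sum cases and totals for one exposure group across all strata."""
    cases = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return cases, total


exposed_all = collapse(strata, "exposed")
unexposed_all = collapse(strata, "unexposed")

crude_rr = risk_ratio(exposed_all, unexposed_all)
stratum_rr = {
    name: risk_ratio(s["exposed"], s["unexposed"]) for name, s in strata.items()
}

# Ingredient 1: E is associated with C (share of each exposure group that is C+).
c_pos_among_exposed = strata["C+"]["exposed"][1] / exposed_all[1]
c_pos_among_unexposed = strata["C+"]["unexposed"][1] / unexposed_all[1]

# Ingredient 2: C predicts D even among the unexposed.
risk_unexposed = {
    name: s["unexposed"][0] / s["unexposed"][1] for name, s in strata.items()
}

print(f"crude RR = {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"RR in stratum {name} = {rr:.2f}")
print(f"C+ among exposed = {c_pos_among_exposed:.2f}, "
      f"among unexposed = {c_pos_among_unexposed:.2f}")
print(f"risk of D in unexposed: {risk_unexposed}")
```

The crude risk ratio is far from both strata-specific values, which are close to each other; that pattern, together with the two associations printed at the end, is what the text describes as confounding.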

The term "interaction" has two distinct meanings in epidemiology. Biological interaction is the interdependent operation of two or more factors in a cause. There is always biological interaction in epidemiologic data. Statistical interaction occurs when the *statistical model* being used does *not* explain the joint effects of two or more independent variables. Biological interaction and statistical interaction are two distinct phenomena that should not be confused. Here, we consider statistical interaction *only*.

**Statistical interaction** is synonymous with **effect-measure heterogeneity**. In epidemiology, this occurs when the value of the effect measure being used (e.g., risk ratio) differs among subgroups. A numerical example will serve to illuminate.

Once again we may start with the **crude (unstratified) data**:

`       D+     D-   Total
E+     200    800    1000
E-      50    950    1000
Total  250   1750    2000`

Again, *p*_{1} = 200 / 1000 = .20, *p*_{2} = 50 / 1000 = .05, and *cRR* = .20 / .05 = 4.0.

Suppose, on stratification, we find:

**Stratum 1 (negative for C)**

`       D+     D-   Total
E+      12    188     200
E-      48    752     800
Total   60    940    1000`

Therefore, *p*_{1,1} = 12 / 200 = .06, *p*_{2,1} = 48 / 800 = .06, and *RR*_{1} = .06 / .06 = 1.0.

**Stratum 2 (positive for C)**

`       D+     D-   Total
E+     188    612     800
E-       2    198     200
Total  190    810    1000`

Therefore, *p*_{1,2} = 188 / 800 = .2350, *p*_{2,2} = 2 / 200 = .01, and *RR*_{2} = .235 / .01 = 23.5.

Because the risk ratio is heterogeneous in the two strata, we say there is a statistical interaction between E and C as relates to D.
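As a quick check on this second example, the sketch below (Python, with hypothetical variable names) rebuilds the crude table by summing the two strata and compares the three risk ratios. Unlike the confounding example, the strata-specific values here differ sharply from each other, and the crude value falls between them.

```python
# Each table is (exposed cases, exposed total, unexposed cases, unexposed total),
# using the hypothetical counts from the two strata above.
stratum_1 = (12, 200, 48, 800)   # negative for C
stratum_2 = (188, 800, 2, 200)   # positive for C


def risk_ratio(a, n1, c, n0):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (c / n0)


# Summing the strata cell by cell recovers the crude (unstratified) table.
crude_table = tuple(x + y for x, y in zip(stratum_1, stratum_2))

rr_1 = risk_ratio(*stratum_1)
rr_2 = risk_ratio(*stratum_2)
crude_rr = risk_ratio(*crude_table)

print(f"crude table = {crude_table}")
print(f"RR1 = {rr_1:.1f}, RR2 = {rr_2:.1f}, cRR = {crude_rr:.1f}")
```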

The above demonstrations suggest a strategy for dealing with extraneous factors. In essence, data are explored through stratification.

To illustrate methods in this chapter, let us consider a data set that demonstrates both interaction and confounding. Data were collected as part of a University of California at Berkeley study to assess whether men were being given preferential treatment over women in admission to graduate programs (Bickel & O'Connell, 1975; Freedman et al., 1991, pp. 16-19). Assuming that the men and women who applied for admission to the graduate programs were equally well-qualified, one would expect equal acceptance rates by gender. However, it initially appeared as if men were being admitted in greater proportions than women. Hence, the investigation.

The experience of applicants to the six largest majors at the school is stored in ` SEXBIAS.ZIP`. This data set contains 4526 records
and the following variables:

| Variable | Type | Len | Description |
|---|---|---|---|
| MAJOR | Alpha | 9 | Department major: A, B, C, D, E, and F |
| SEX | Alpha | 9 | 1 = Male, 2 = Female |
| ACCEPT | Yes/no | 1 | Application accepted: +/- |

Crude analysis (`TABLES SEX ACCEPT`) derives:

` ACCEPT`

`SEX        |   +     -     | Total`

`-----------+---------------+------`

` 1 | 1198 1493 | 2691 Acceptance rate, men = 1198 /2691 = 0.445`

` 2 | 557 1278 | 1835 Acceptance rate, women = 557 / 1835 = 0.304`

`-----------+---------------+------ RR = 0.445 / 0.304 = 1.46`

` Total | 1755 2771 | 4526 P < 0.00001`

Therefore, men appear to have a higher acceptance rate than women (supporting evidence of preferential treatment). However, what if men tended to apply to majors with higher acceptance rates than the majors to which women applied? Then the cofactor `MAJOR` would confound the observed relation. To investigate this possibility, data are stratified by `MAJOR`.

Table stratification is accomplished with the command:

`EPI6> TABLES <E> <D> <C>`

For the illustrative example, the following command is issued:

`EPI6> TABLES SEX ACCEPT MAJOR`

This produces separate tables for each of the 6 majors. *Annotated* output is shown below:

` MAJOR =A`

` ACCEPT`

`SEX | + - | Total`

`-----------+-------------+------`

` 1 | 512 313 | 825 Acceptance rate, men = 512 / 825 = 0.621`

` 2 | 89 19 | 108 Acceptance rate, women = 89 / 108 = 0.824`

`-----------+-------------+------ RR = 0.621 / 0.824 = 0.75`

` Total | 601 332 | 933 p = 0.000033`

` MAJOR =B`

` ACCEPT`

`SEX | + - | Total`

`-----------+-------------+------`

` 1 | 353 207 | 560 Acceptance rate, men = 353 / 560 = 0.630`

` 2 | 17 8 | 25 Acceptance rate, women = 17 / 25 = 0.680`

`-----------+-------------+------ RR = 0.630 / 0.680 = 0.93`

` Total | 370 215 | 585 p = 0.61`

` MAJOR =C`

` ACCEPT`

`SEX | + - | Total`

`-----------+-------------+------`

` 1 | 120 205 | 325 Acceptance rate, men = 120 / 325 = 0.369`

` 2 | 202 391 | 593 Acceptance rate, women = 202 / 593 = 0.341`

`-----------+-------------+------ RR = 0.369 / 0.341 = 1.08`

` Total | 322 596 | 918 p = 0.39`

` MAJOR =D`

` ACCEPT`

`SEX | + - | Total`

`-----------+-------------+------`

` 1 | 138 279 | 417 Acceptance rate, men = 138 / 417 = 0.331`

` 2 | 131 244 | 375 Acceptance rate, women = 131 / 375 = 0.349`

`-----------+-------------+------ RR = 0.331 / 0.349 = 0.95`

` Total | 269 523 | 792 p = 0.59`

` MAJOR =E`

` ACCEPT`

`SEX | + - | Total`

`-----------+-------------+------`

` 1 | 53 138 | 191 Acceptance rate, men = 53 / 191 = 0.277`

` 2 | 94 299 | 393 Acceptance rate, women = 94 / 393 = 0.239`

`-----------+-------------+------ RR = 0.277 / 0.239 = 1.16`

` Total | 147 437 | 584 p = 0.32`

` MAJOR =F`

` ACCEPT`

`SEX | + - | Total`

`-----------+-------------+------`

` 1 | 22 351 | 373 Acceptance rate, men = 22 / 373 = 0.059`

` 2 | 24 317 | 341 Acceptance rate, women = 24 / 341 = 0.070`

`-----------+-------------+------ RR = 0.059 / 0.070 = 0.84`

` Total | 46 668 | 714 p = 0.54`

Therefore, only Major A demonstrates a significant difference in acceptance rates by sex, and this difference favors women (82.4% vs. 62.1%). Notice that the initial crude analysis hid this pattern (a.k.a. Simpson's paradox). It is now evident that the choice of `MAJOR` confounds the relation between `SEX` and `ACCEPT`, *and* there is an interaction between `SEX` and `MAJOR`.
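To see why `MAJOR` acts as a confounder, it helps to line up, for each major, how selective it is and what share of male and female applicants applied to it. The sketch below is a Python illustration (not an Epi Info feature); the dictionaries simply restate the counts from the annotated output, and the name `summary` is an arbitrary choice.

```python
# (accepted, applicants) per major, copied from the annotated output above.
men = {"A": (512, 825), "B": (353, 560), "C": (120, 325),
       "D": (138, 417), "E": (53, 191), "F": (22, 373)}
women = {"A": (89, 108), "B": (17, 25), "C": (202, 593),
         "D": (131, 375), "E": (94, 393), "F": (24, 341)}

total_men = sum(n for _, n in men.values())
total_women = sum(n for _, n in women.values())

summary = {}
for major in men:
    (a, n1), (c, n0) = men[major], women[major]
    summary[major] = {
        "accept_rate": (a + c) / (n1 + n0),  # overall selectivity of the major
        "share_of_men": n1 / total_men,      # fraction of all male applicants
        "share_of_women": n0 / total_women,  # fraction of all female applicants
        "rr": (a / n1) / (c / n0),           # men vs. women within the major
    }

for major, row in summary.items():
    print(f"{major}: accept = {row['accept_rate']:.2f}, "
          f"men applying = {row['share_of_men']:.2f}, "
          f"women applying = {row['share_of_women']:.2f}, "
          f"RR = {row['rr']:.2f}")
```

Read this way, the two conditions for confounding are visible directly: sex is associated with major (men applied disproportionately to majors A and B), and major strongly predicts acceptance.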

A chi-square test for interaction may be used to help determine whether effect-measure heterogeneity is present. Because this test applies to both risk ratios and odds ratios (and other measures of association), let *MA* refer to the measure of association parameter being studied. The null and alternative hypotheses are:

*H*_{0}: *MA*_{1} = *MA*_{2} = . . . = *MA*_{S} (no interaction)

*H*_{1}: at least one of the strata-specific measures of association differs from the others (interaction)

The method of calculating the chi-square interaction statistic in *Epi Info* is unspecified, but it is assumed to be a general Wald statistic (see *Epidemiology Kept Simple* Formula 15.1). Under the null hypothesis, this chi-square interaction statistic has *S* - 1 degrees of freedom, where *S* represents the number of strata being tested.

**Illustrative example.** For the `SEXBIAS` data stratified by `MAJOR`, the output includes:

`Chi Square for evaluation of interaction 18.10`

`P value 0.00282859`

Since there are 6 strata, *df* = 5. This result, along with the divergent incidence (risk) ratio in stratum 1 (Major A), suggests that statistical interaction is present.
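Because the text only assumes that Epi Info uses a general Wald statistic, the following Python sketch should be read as one plausible reconstruction rather than as the program's documented algorithm. It weights each stratum's log risk ratio by the inverse of its large-sample variance (1/a - 1/n1 + 1/c - 1/n0) and sums the weighted squared deviations from the pooled log risk ratio. The function name `wald_heterogeneity_rr` and the tuple layout are illustrative.

```python
import math

# (men accepted, men applied, women accepted, women applied) for majors A-F.
tables = [
    (512, 825, 89, 108),
    (353, 560, 17, 25),
    (120, 325, 202, 593),
    (138, 417, 131, 375),
    (53, 191, 94, 393),
    (22, 373, 24, 341),
]


def wald_heterogeneity_rr(tables):
    """Wald-type chi-square for homogeneity of risk ratios across strata.

    Assumption: inverse-variance weights on the log scale; Epi Info's
    exact method is not specified in the source.
    """
    log_rr, weights = [], []
    for a, n1, c, n0 in tables:
        log_rr.append(math.log((a / n1) / (c / n0)))
        variance = 1 / a - 1 / n1 + 1 / c - 1 / n0
        weights.append(1 / variance)
    pooled = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)
    chi2 = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rr))
    return chi2, len(tables) - 1


chi2_int, df = wald_heterogeneity_rr(tables)
print(f"chi-square for interaction = {chi2_int:.2f} on {df} df")
```

Under these assumptions the statistic comes out very close to the 18.10 printed by Epi Info, which lends some support to the Wald interpretation.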

It is often advantageous to summarize the relation being studied with a single, unconfounded measure of association and an accompanying test. This can be accomplished by pooling unconfounded strata-specific measures of association to form a summary measure of association.

The Mantel-Haenszel method of pooling is calculated as a weighted average of strata-specific estimates with weights proportional to *N*_{1} × *N*_{2} / *N*, where *N*_{1} and *N*_{2} are the sizes of the two groups being compared within the stratum and *N* represents the total number of people in the stratum (Cochran 1954; Mantel & Haenszel 1959). This assumes the measures of association are uniform among strata. This homogeneity assumption allows us to combine strata-specific measures of association to form a single summary measure that has been adjusted for confounding. Any non-uniformity among strata is suppressed through summarization. The pooled measure of association may be viewed as a statistical convenience whose purpose is to draw correct conclusions about the effect of the exposure.

**Illustrative Example (SEXBIAS.REC).** By suppressing the non-uniformity of the incidence (risk) ratios among majors, the following summary is obtained:

` SUMMARY RISK RATIO (RR)`

`Crude RR without stratification 1.47`

`Summary RR of (ACCEPT=+) for (SEX=1) 0.94
95% confidence limits for RR 0.87 < RR < 1.03`

Comments:

(1) The crude *RR* estimate of 1.47 indicates higher acceptance for men, whereas the summary estimate of 0.94 indicates slightly higher acceptance rates in women. Thus, the summary *RR* is an unconfounded estimate of the effect of gender on acceptance to graduate school at UC Berkeley.

(2) The 95% confidence interval for the summary *RR* is calculated using the method in Robins et al., 1986.
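The summary figures above can be reproduced outside Epi Info. The sketch below, in Python, uses the standard Mantel-Haenszel risk ratio (sum of a × n0 / N divided by sum of c × n1 / N) together with the Greenland-Robins large-sample variance for its logarithm. That this is exactly what Epi Info computes is an assumption; the function name `mantel_haenszel_rr` and the tuple layout are illustrative.

```python
import math

# (men accepted, men applied, women accepted, women applied) for majors A-F.
tables = [
    (512, 825, 89, 108),
    (353, 560, 17, 25),
    (120, 325, 202, 593),
    (138, 417, 131, 375),
    (53, 191, 94, 393),
    (22, 373, 24, 341),
]


def mantel_haenszel_rr(tables, z=1.96):
    """Mantel-Haenszel summary risk ratio with an approximate 95% interval."""
    num = den = var_num = 0.0
    for a, n1, c, n0 in tables:
        n = n1 + n0
        num += a * n0 / n
        den += c * n1 / n
        # Stratum contribution to the Greenland-Robins variance numerator.
        var_num += (n1 * n0 * (a + c) - a * c * n) / n ** 2
    rr = num / den
    se_log = math.sqrt(var_num / (num * den))
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


def crude_rr(tables):
    """Risk ratio after collapsing all strata into one 2-by-2 table."""
    a, n1, c, n0 = (sum(col) for col in zip(*tables))
    return (a / n1) / (c / n0)


rr_mh, lower, upper = mantel_haenszel_rr(tables)
print(f"crude RR = {crude_rr(tables):.2f}")
print(f"summary RR = {rr_mh:.2f} ({lower:.2f}, {upper:.2f})")
```

With the counts above, the output agrees with the Epi Info listing to two decimal places.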

A test of *H*_{0}: *aMA* = 1 (where *aMA* represents the parameter for the Mantel-Haenszel adjusted measure of association) is performed
with a Mantel-Haenszel chi-square statistic. Under the null hypothesis, this test statistic has a chi-square sampling distribution with 1
degree of freedom.

**Illustrative Example (SEXBIAS.REC).** The null hypothesis *H*_{0}: *aRR* = 1 is tested with the following output:

` ** Summary of 6 Tables With Non-Zero margins **`

` N = 4526`

`Mantel-Haenszel Summary Chi Square 1.43`

`P value 0.23226346`

Comment: The *p* value of .23 fails to provide evidence against *H*_{0}. We conclude no significant difference in acceptance rates by gender once major is taken into account.
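For readers who want to see where the 1.43 comes from, here is a minimal Python sketch of the Mantel-Haenszel chi-square. It compares the observed number of accepted men in each stratum with its expectation under the null hypothesis, using the hypergeometric variance, and applies a 0.5 continuity correction. The use of the correction is an assumption, adopted because it matches the printed output; `mantel_haenszel_chi2` is an illustrative name.

```python
import math

# (men accepted, men applied, women accepted, women applied) for majors A-F.
tables = [
    (512, 825, 89, 108),
    (353, 560, 17, 25),
    (120, 325, 202, 593),
    (138, 417, 131, 375),
    (53, 191, 94, 393),
    (22, 373, 24, 341),
]


def mantel_haenszel_chi2(tables, continuity=True):
    """Mantel-Haenszel summary chi-square (1 df) and its p value."""
    diff = variance = 0.0
    for a, n1, c, n0 in tables:
        n = n1 + n0
        m1 = a + c          # total accepted in the stratum
        m0 = n - m1         # total rejected in the stratum
        diff += a - n1 * m1 / n
        variance += n1 * n0 * m1 * m0 / (n ** 2 * (n - 1))
    numerator = abs(diff) - (0.5 if continuity else 0.0)
    chi2 = numerator ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2_mh, p_mh = mantel_haenszel_chi2(tables)
print(f"M-H chi-square = {chi2_mh:.2f}, p = {p_mh:.3f}")
```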

In the absence of interaction and confounding, stratification and adjustments are unnecessary. In such instances, crude measures of association offer the benefit of better precision (compared with M-H summary measures of association).

**Illustrative Example.** Data from a case-control study of esophageal cancer and tobacco consumption (Breslow & Day, 1980; Tuyns et al., 1977) are read and stratified by `ALCHIGH` with the commands:

`EPI6> READ BDNEW`

`EPI6> TABLES TOBHIGH CASE ALCHIGH`

Key output includes:

`Stratum 1 (ALCHIGH = 1)`

` CASE`

`TOBHIGH | 1 2 | Total`

`-----------+-------------+------`

`1 | 30 23 | 53`

`2 | 66 86 | 152`

`-----------+-------------+------`

` Total | 96 109 | 205`

`Single Table Analysis Stratum 1 Odds ratio = 1.70`

`Stratum 2 (ALCHIGH = 2)`

` CASE`

`TOBHIGH | 1 2 | Total`

`-----------+-------------+------`

`1 | 34 127 | 161`

`2 | 70 539 | 609`

`-----------+-------------+------`

` Total | 104 666 | 770`

`Single Table Analysis Stratum 2 Odds ratio = 2.06`

Thus, the strata-specific odds ratios are 1.70 and 2.06, respectively. We might now ask if it makes sense to summarize these two odds ratios with a single summary statistic. The chi-square interaction statistic (*H*_{0}: *OR*_{1} = *OR*_{2}) is helpful in this regard. Epi Info prints this information in the area labeled "Summary Odds Ratio":

`Chi Square for evaluation of interaction 0.24`

`P value 0.62621898`

In this instance *df* = 2 - 1 = 1 (not shown by *Epi Info*) and χ²_{int} = 0.24, *p* = 0.63. This supports an assumption that differences in strata-specific odds ratios may be random (no statistical interaction).
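The same Wald-type reasoning can be applied to odds ratios. The Python sketch below is a reconstruction under the assumption that each log odds ratio is weighted by the inverse of its Woolf variance (1/a + 1/b + 1/c + 1/d); it is not taken from Epi Info documentation, and `wald_heterogeneity_or` is an illustrative name. The *p* value shortcut used here is valid only for 1 degree of freedom, i.e., for two strata.

```python
import math

# Cell counts (a, b, c, d) = (exposed cases, exposed controls,
# unexposed cases, unexposed controls) for the two ALCHIGH strata.
tables = [
    (30, 23, 66, 86),
    (34, 127, 70, 539),
]


def wald_heterogeneity_or(tables):
    """Wald-type chi-square for homogeneity of odds ratios across strata."""
    log_or, weights = [], []
    for a, b, c, d in tables:
        log_or.append(math.log((a * d) / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))
    pooled = sum(w * y for w, y in zip(weights, log_or)) / sum(weights)
    chi2 = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_or))
    return chi2, len(tables) - 1


chi2_int, df = wald_heterogeneity_or(tables)
# Upper-tail probability for a chi-square with 1 df (two strata only).
p_int = math.erfc(math.sqrt(chi2_int / 2))
print(f"chi-square for interaction = {chi2_int:.2f}, df = {df}, p = {p_int:.2f}")
```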

The crude odds ratio and M-H summary odds ratio are also listed in the area labeled "Summary Odds Ratio":

` SUMMARY ODDS RATIO
Crude OR 1.96
Mantel-Haenszel weighted Odds ratio 1.92 `
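Both figures follow from simple sums over the strata. The Python sketch below uses the usual Mantel-Haenszel odds ratio formula (sum of a × d / N divided by sum of b × c / N); the function names are illustrative, and the counts are those shown in the two stratum tables.

```python
# Cell counts (a, b, c, d) = (exposed cases, exposed controls,
# unexposed cases, unexposed controls) for the two ALCHIGH strata.
tables = [
    (30, 23, 66, 86),
    (34, 127, 70, 539),
]


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a single 2-by-2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel weighted odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Collapse the strata cell by cell to obtain the crude table.
crude_table = tuple(sum(col) for col in zip(*tables))

stratum_or = [odds_ratio(*t) for t in tables]
crude_or = odds_ratio(*crude_table)
mh_or = mantel_haenszel_or(tables)

print("stratum ORs:", ", ".join(f"{x:.2f}" for x in stratum_or))
print(f"crude OR = {crude_or:.2f}, M-H OR = {mh_or:.2f}")
```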

We also note that the crude odds ratio and the Mantel-Haenszel weighted odds ratio are similar. Therefore, it is reasonable to report the crude odds ratio. To get the confidence interval and *p* value for the crude odds ratio, issue the `TABLES` command without the stratification variable. For example,

`EPI6> TABLES TOBHIGH CASE`

Output is:

`TOBHIGH | 1 2 | Total`

`-----------+-------------+------`

`1 | 64 150 | 214`

`2 | 136 625 | 761`

`-----------+-------------+------`

` Total | 200 775 | 975`

`Odds ratio 1.96
Cornfield 95% confidence limits for OR 1.36 < OR < 2.82`

Although the detection and control of confounding is crucially important in epidemiologic research, there exists no single way of dealing with this problem. Nevertheless, epidemiologists agree that potential confounders must be identified before data are collected so that data on these factors can be collected to allow further evaluation. So how does one know what variables might confound an analysis? Briefly, this information comes from an understanding of the systems being investigated, and is based on previous research, clinical insight, and understanding of the processes being studied. It is essential that the investigator "does their homework," researching all potential confounders, before collecting data. With this said, a few rules of thumb are presented:

(1) Adjustments for confounding are contraindicated when interaction is present, as such summary measures of association would obscure important modifications of effect.

(2) Since confounding is a matter of systematic error (*not* random error), hypothesis tests should not be used in the detection of
confounding.

(3) A pragmatic strategy for calculating good measures of association suggests:

- Before the study is begun, the investigator attempts to understand the complex causal interrelations among the exposure, disease, and various other factors. This may require lots of homework on the part of the investigator, as well as close collaboration with subject matter specialists.
- Measurements and coding for E, D, and C_{1}, C_{2}, ..., C_{k} must be valid based on understanding of phenomena.
- The research question must be defined in an insightful way. "Finding the question is often more important than finding the answer" (Tukey, 1980).
- Study design is based on choices that maximize the likelihood of delineating causal relations.
- After data are collected, entered and cleaned, the analyst explores inter-relations, starting with simple comparisons and descriptions. Identified relationships between E and C and C and D heighten the awareness of the potential for confounding.
- Data are stratified and explored for interaction. (The above test for interaction may be applied.) When interaction is confirmed, strata-specific estimates are reported.
- Continued consultation with a subject matter specialist may be necessary before a decision is made whether or not to control for potential confounder C.
- In the absence of interaction and confounding, crude (unadjusted) estimates of association may be reported.
- The best estimate of association is both valid and precise. If interaction is present, strata-specific measures of association are reported. If interaction is absent but confounding is present, summary (adjusted) measures of association are reported. If neither interaction nor confounding is present, crude (unadjusted) measures of association are reported.
- In practice, there will always be uncertainty about whether a given set of variables are or are not confounders. "Science does not begin with a tidy question. Nor does it end with a tidy answer" (Tukey, 1980).
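The reporting rule in the list above can be written down as a small decision function. The Python sketch below is only an illustration: the 0.05 cutoff for the interaction test and the 10% change-in-estimate cutoff for confounding are assumed conventions, not values given in this chapter, and the function name `choose_estimate` is hypothetical. Consistent with rule (2), confounding is judged by comparing crude and adjusted estimates rather than by a hypothesis test. Such a rule cannot replace subject-matter judgment.

```python
def choose_estimate(p_interaction, crude, adjusted,
                    alpha=0.05, max_rel_change=0.10):
    """Suggest which measure of association to report.

    alpha and max_rel_change are assumed thresholds chosen for
    illustration; they are not prescribed by the text.
    """
    if p_interaction < alpha:
        # Interaction: a single summary would hide effect modification.
        return "strata-specific"
    if abs(crude - adjusted) / adjusted > max_rel_change:
        # No interaction, but adjustment moves the estimate noticeably.
        return "adjusted"
    return "crude"


# Berkeley admissions: interaction p = 0.0028, crude RR 1.47, summary RR 0.94.
berkeley = choose_estimate(0.0028, crude=1.47, adjusted=0.94)

# Esophageal cancer and tobacco: p = 0.63, crude OR 1.96, M-H OR 1.92.
esophageal = choose_estimate(0.63, crude=1.96, adjusted=1.92)

# Hypothetical case: no interaction (made-up p value), crude 4.0, adjusted 1.0.
hypothetical = choose_estimate(0.90, crude=4.0, adjusted=1.0)

print(berkeley, esophageal, hypothetical)
```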

**(1) GENERIC.ZIP:** *Simpson's Paradox* (Hypothetical Data). This exercise illustrates Simpson's Paradox while applying a strategy for detecting and accounting for confounding and interaction. Three case-control data sets are presented: GENERIC1.REC, GENERIC2.REC, and GENERIC3.REC. Each data set contains the variables E (exposure), D (disease), and C (potential confounder). For each data set determine if interaction is present. If interaction is present, stop there and report strata-specific odds ratios and other relevant case-control statistics. If interaction is absent, assess the potential for confounding. Summarize your assessment. If confounding is present, report an adjusted odds ratio and associated case-control statistics. If interaction and confounding are absent, report the crude (unadjusted) case-control statistics.

**(2) BD2.ZIP:** *Breslow & Day 2: The Oxford Childhood Cancer Survey* (Breslow & Day, 1980, p. 238; Kneale, 1971; Stewart & Kneale, 1970). Data are from a case-control study of childhood leukemia and lymphoid tumors and in utero X-ray exposure (Kneale, 1971). The primary variables of interest are `CASE` (1 = case, 2 = control) and `XRAY` (1 = exposed, 2 = unexposed). The potential confounder is `AGE` (years). Analyze these data and report the "best" odds ratio estimate and a 95% confidence interval for the parameter. Summarize your results in narrative form.

**(3) BI-HELM1.ZIP:** *Bicycle Helmet Use in Two Northern California Counties* (Perales et al., 1994). This data set contains information on bicycle helmet use in Santa Clara County and Contra Costa County, two counties in northern California (U.S.A.). Data definitions are included in a data documentation file in the ZIP archive (bi-helm1-dd.htm). Review this data documentation file and then perform the following analyses.

(A) Determine crude incidences of helmet use in Santa Clara County (*p*_{1}) and Contra Costa County (*p*_{2}). (The easiest way to derive these statistics is to use a two-variable tables command: `TABLES COUNTY HELMETUSE`.) Test whether these proportions differ, and summarize your results.

(B) Stratify the data on the socioeconomic matching variable `MATCHVAR` (`TABLES COUNTY HELMETUSE MATCHVAR`). Report strata-specific helmet use rates by school and test whether *within-strata* rates differ significantly. Summarize your results narratively.

(C) Test the incidence (risk) ratio parameter for interaction. Be explicit in listing the null and alternative hypotheses. Report all relevant test statistics and state your conclusion.

(D) Discuss your findings. In so doing, consider the potential for interaction and confounding. Which schools show higher helmet-use
rates compared with their matched counterpart? etc.

**(4) CERVICAL**: *Cervical Cancer and Smoking* (Nischan et al., 1988; Pagano & Gauvreau, 1993, p. 359). Data from a case-control
study of cervical cancer and smoking are shown below.

| | Case | Control |
|---|---|---|
| Smoke + | 108 | 163 |
| Smoke - | 117 | 268 |

(A) Based on these data calculate the odds ratio of smoking for cervical cancer.

(B) Data stratified by number of sexual partners are shown below. Calculate stratum specific odds ratios.

Stratum 1: Zero or One Partner

| | Case | Control |
|---|---|---|
| Smoke + | 12 | 21 |
| Smoke - | 25 | 118 |

Stratum 2: Two or More Partners

| | Case | Control |
|---|---|---|
| Smoke + | 96 | 142 |
| Smoke - | 92 | 150 |

(C) Based on these exploratory analyses, would you say there is interaction? Justify your response. How would you report your results?

**(5) ASBESTOS.ZIP:** *Asbestos Exposure and Lung Cancer* (Hypothetical data). Data are from a case-control study of lung cancer and asbestos exposure. The data set includes information on smoking (`SMOKE`: + / -), asbestos exposure (`ASBESTOS`: + / -), and lung cancer (`LUNGCA`: + / -).

(A) Calculate the odds ratio of lung cancer associated with smoking. Include a 95% confidence interval, and interpret your findings.

(B) Calculate the odds ratio of lung cancer associated with asbestos exposure. Include a 95% confidence interval and interpret your
findings.

(C) An investigator thinks it would be interesting to sort out the inter-relationship between asbestos, smoking, and lung cancer by looking at the lung cancer risk associated with asbestos in smokers and non-smokers separately. Perform such a stratified analysis. In so doing, report strata-specific odds ratios. Perform a test for interaction. (Include all hypothesis testing steps.) Is interaction present? Calculate and report the summary (adjusted) odds ratio. Is confounding evident? Would it make sense to report the adjusted odds ratio in light of your findings about interaction? How would you report your final results?

Bickel, P. & O'Connell, J. W. (1975). Is there a sex bias in graduate admissions? *Science*, 187, 398 - 404.

Breslow, N. E., & Day, N. E. (1980). *Statistical Methods in Cancer Research. Volume 1--The Analysis of Case-Control Studies*. Lyon: International Agency for Research on Cancer.

Cochran, W. G. (1954). Some methods for strengthening the common chi-square tests. *Biometrics, 10*, 417-451.

Freedman, D., Pisani, R., Purves, R., & Adhikari, A. (1991). *Statistics* (2nd ed.) New York: W. W. Norton.

Gerstman, B. B., Jolson, H., Bauer, M., Cho, P., Livingston, J., & Platt, R. (1996). Depression in new users of ß-blockers and selected anti-hypertensives. *Journal of Clinical Epidemiology*, 49, 809 - 815.

Hirayama, T. (1990). *Life-style and Mortality: a Large Scale Census-based Cohort Study in Japan*. Basel: S. Karger.

Kneale, G. W. (1971). Problems arising in estimating from retrospective survey data the latent period of juvenile cancers initiated by
obstetric radiography. *Biometrics*, 27, 563 - 90.

Kramer, M. S. (1988). *Clinical Epidemiology and Biostatistics*. Berlin: Springer-Verlag.

Lilienfeld, D. E. & Stolley, P. D. (1994). *Foundations of Epidemiology* (3rd ed.). New York: Oxford.

Mandel, E., Bluestone, C. D., Rockette, H. E., Blatter, M. M., Reisinger, K. S., Wucher, F. P., & Harper, J. (1982). Duration of effusion after antibiotic treatment for acute otitis media: comparison of cefaclor and amoxicillin. *Pediatric Infectious Diseases*, 1, 310 - 316.

Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. *Journal of the National Cancer Institute*, 22, 719 - 748.

Nischan, P., Ebeling, K., & Schindler, C. (1988). Smoking and invasive cervical cancer risk: results from a case-control study. *American Journal of Epidemiology*, 128, 74 - 77.

Pagano, M. & Gauvreau, K. (1993). *Principles of Biostatistics*. Belmont, CA: Duxbury Press.

Perales, D. & Gerstman, B. B. (1995, March). A bi-county comparative study of bicycle helmet knowledge and use by California
elementary school children. *The Ninth Annual California Conference on Childhood Injury Control*, San Diego, CA.

Robins, J., Breslow, N., & Greenland, S. (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. *Biometrics, 42*, 311-323.

Rosner, B. (1990). *Fundamentals of Biostatistics* (3rd ed.) Boston: PWS - Kent Publishing.

Rothman, K. J. (1975). A pictorial representation of confounding in epidemiologic studies. *Journal of Chronic Diseases*, 28, 101 - 108.

Stewart, A. & Kneale, G. W. (1970). Age-distribution of cancers caused by obstetric X-rays and their relevance to cancer latent
periods. *Lancet*, *ii*, 4 - 8.

Tuyns, A. J., Péquignot, G., & Jensen, O. M. (1977). Le cancer de l'oesophage en Ille-et-Vilaine en fonction des niveaux de consommation d'alcool et de tabac. Des risques qui se multiplient. *Bull Cancer*, 64, 45 - 60.