Binary Outcome, Stratified Analysis

Background | Test for Interaction | Mantel-Haenszel Methods
When Interaction and Confounding Are Minimal | Strategy for Analysis | Exercises

Background

This chapter considers the analysis of a binary outcome (disease D) and binary exposure (exposure E) with data stratified according to an extraneous cofactor (cofactor C). Two phenomena -- confounding and statistical interaction -- are considered.

Confounding

Confounding (from the Latin confundere: to mix together) is a distortion of an association between E and D brought about by a cofactor C. Confounding occurs when E is associated with C and C is an independent risk factor for D. In addition, C is not intermediate in the causal pathway.

For example, smoking (C) confounds the relation between alcohol consumption (E) and lung cancer (D) because alcohol users are more likely to smoke than non-users. Thus, the effects of smoking get mixed in with the effect of alcohol consumption: smoking confounds the association between alcohol consumption and lung cancer.

One way to address confounding is to subset data into relatively homogeneous subgroups ("strata") according to the confounding cofactor. Not surprisingly, data can show one thing in aggregate form and another once disaggregated.

Measures of association in the aggregate are called crude measures of association since relations are unadjusted. Let us precede symbols for measures of association with a c when referring to crude measures of association. For example, cRR will represent the crude risk ratio (i.e., the risk ratio based on all data combined in a single 2-by-2 table).

Subscripts will denote strata-specific measures of association. For example, RR₁ will represent the risk ratio in stratum 1, RR₂ will represent the risk ratio in stratum 2, and so on.

Suppose, in the aggregate, we see the following crude data:

           D+     D-   Total
E+        200    800    1000
E-         50    950    1000
Total     250   1750    2000

Therefore, p₁ = 200 / 1000 = .20, p₂ = 50 / 1000 = .05, and cRR = .20 / .05 = 4.0.

Now suppose we stratify by confounding factor C. In stratum 1 (positive for C) we find:

           D+     D-   Total
E+        194    606     800
E-         24     76     100
Total     218    682     900

In this stratum, p_1,1 = 194 / 800 = .2425, p_2,1 = 24 / 100 = .24, and RR₁ = .2425 / .24 ≈ 1.0.

In stratum 2 (negative for factor C) we find:

           D+     D-   Total
E+          6    194     200
E-         26    874     900
Total      32   1068    1100

In this stratum, p_1,2 = 6 / 200 = .03, p_2,2 = 26 / 900 = .0289, and RR₂ = .03 / .0289 ≈ 1.0.

Therefore, the strong positive association seen in the aggregate disappears within the subgroups. This indicates that C confounded the association between E and D in the aggregate.
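The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the function name `risk_ratio` and the (a, b, c, d) cell layout are illustrative choices and are not part of Epi Info.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio for a 2-by-2 table.

    a, b = exposed with / without disease (E+ row)
    c, d = unexposed with / without disease (E- row)
    """
    p1 = a / (a + b)  # risk in the exposed
    p2 = c / (c + d)  # risk in the unexposed
    return p1 / p2


# Cell counts (a, b, c, d) copied from the stratified tables above
stratum_1 = (194, 606, 24, 76)   # positive for C
stratum_2 = (6, 194, 26, 874)    # negative for C

# Adding the strata cell by cell recovers the crude table
crude = tuple(x + y for x, y in zip(stratum_1, stratum_2))

crr = risk_ratio(*crude)
rr1 = risk_ratio(*stratum_1)
rr2 = risk_ratio(*stratum_2)

print(f"crude table = {crude}")
print(f"cRR = {crr:.2f}, RR1 = {rr1:.2f}, RR2 = {rr2:.2f}")
```

The crude risk ratio is 4.0 while both strata-specific ratios are close to 1.0, which is the pattern described in the text.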

Statistical Interaction (Effect-Measure Heterogeneity)

The term "interaction" has two distinct meanings in epidemiology. Biological interaction is the interdependent operation of two or more factors in a cause. There is always biological interaction in epidemiologic data. Statistical interaction is when the statistical model being used does not explain the joint effects of two or more independent variables. Biological interaction and statistical interaction are two distinct phenomena that should not be confused. Here, we consider statistical interaction only.

Statistical interaction is synonymous with effect-measure heterogeneity. In epidemiology, this occurs when the value of the effect measure being used (e.g., the risk ratio) differs across subgroups. A numerical example will serve to illuminate.

Once again we may start with the crude (unstratified) data:

           D+     D-   Total
E+        200    800    1000
E-         50    950    1000
Total     250   1750    2000

Again, p₁ = 200 / 1000 = .20, p₂ = 50 / 1000 = .05, and cRR = .20 / .05 = 4.0.

Suppose, on stratification, we find:

Stratum 1 (negative for C)

           D+     D-   Total
E+         12    188     200
E-         48    752     800
Total      60    940    1000

Therefore, p_1,1 = 12 / 200 = .06, p_2,1 = 48 / 800 = .06, and RR₁ = .06 / .06 = 1.0.

Stratum 2 (positive for C)

           D+     D-   Total
E+        188    612     800
E-          2    198     200
Total     190    810    1000

Therefore, p_1,2 = 188 / 800 = .2350, p_2,2 = 2 / 200 = .01, and RR₂ = .235 / .01 = 23.5.

Because the risk ratio is heterogeneous in the two strata, we say there is a statistical interaction between E and C as relates to D.

The above demonstrations suggest a strategy for dealing with extraneous factors. In essence, data are explored through stratification.

Illustrative Data Set (`SEXBIAS.REC`)

To illustrate methods in this chapter, let us consider a data set that demonstrates both interaction and confounding. Data were collected as part of a University of California at Berkeley study to assess whether men were being given preferential treatment over women in admission to graduate programs (Bickel et al., 1975; Freedman et al., 1991, pp. 16 - 19). Assuming that the men and women who applied for admission to the graduate programs were equally well-qualified, one would expect equal acceptance rates by gender. However, it initially appeared as if men were being admitted in greater proportions than women. Hence, the investigation.

The experience of applicants to the six largest majors at the school is stored in SEXBIAS.ZIP. This data set contains 4526 records and the following variables:

Variable   Type     Len   Description
MAJOR      Alpha    9     Department major: A, B, C, D, E, and F
SEX        Alpha    9     1 = Male, 2 = Female
ACCEPT     Yes/no   1     Application accepted: +/-

Crude analysis (TABLES SEX ACCEPT) derives:

                 ACCEPT
SEX            +      -    | Total
-----------+---------------+------
         1 |   1198   1493 | 2691     Acceptance rate, men = 1198 /2691 = 0.445
         2 |    557   1278 | 1835     Acceptance rate, women = 557 / 1835 = 0.304
-----------+---------------+------     RR = 0.445 / 0.304 = 1.46 (1.47 from unrounded rates)
     Total |   1755   2771 | 4526     P < 0.00001

Therefore, men appear to have a higher acceptance rate than women (supporting the claim of preferential treatment). However, what if men had applied to majors with more favorable acceptance rates than the majors women applied to? Then the cofactor MAJOR would confound the observed relation. To investigate this possibility, data are stratified by MAJOR.

Stratification

Table stratification is accomplished with the command:

EPI6> TABLES <E> <D> <C>

For the illustrative example, the following command is issued:

EPI6> TABLES SEX ACCEPT MAJOR

This produces separate tables for each of the 6 majors. Annotated output is shown below:

               MAJOR =A
                 ACCEPT
SEX        |     +     - | Total
-----------+-------------+------
         1 |   512   313 |   825     Acceptance rate, men = 512 / 825 = 0.621
         2 |    89    19 |   108     Acceptance rate, women = 89 / 108 = 0.824
-----------+-------------+------     RR = 0.621 / 0.824 = 0.75
     Total |   601   332 |   933     p = 0.000033

               MAJOR =B
                 ACCEPT
SEX        |     +     - | Total
-----------+-------------+------
         1 |   353   207 |   560     Acceptance rate, men = 353 / 560 = 0.630
         2 |    17     8 |    25     Acceptance rate, women = 17 / 25 = 0.680
-----------+-------------+------     RR = 0.630 / 0.680 = 0.93
     Total |   370   215 |   585     p = 0.61

               MAJOR =C
                  ACCEPT
SEX        |     +     - | Total
-----------+-------------+------
         1 |   120   205 |   325     Acceptance rate, men = 120 / 325 = 0.369
         2 |   202   391 |   593     Acceptance rate, women = 202 / 593 = 0.341
-----------+-------------+------     RR = 0.369 / 0.341 = 1.08
     Total |   322   596 |   918     p = 0.39

              MAJOR =D
                  ACCEPT
SEX        |     +     - | Total
-----------+-------------+------
         1 |   138   279 |   417     Acceptance rate, men = 138 / 417 = 0.331
         2 |   131   244 |   375     Acceptance rate, women = 131 / 375 = 0.349
-----------+-------------+------     RR = 0.331 / 0.349 = 0.95
     Total |   269   523 |   792     p = 0.59

              MAJOR =E
                 ACCEPT
SEX        |     +     - | Total
-----------+-------------+------
         1 |    53   138 |   191     Acceptance rate, men = 53 / 191 = 0.277
         2 |    94   299 |   393     Acceptance rate, women = 94 / 393 = 0.239
-----------+-------------+------     RR = 0.277 / 0.239 = 1.16
     Total |   147   437 |   584     p = 0.32

              MAJOR =F
                      ACCEPT
SEX        |     +     - | Total
-----------+-------------+------
         1 |    22   351 |   373     Acceptance rate, men = 22 / 373 = 0.059
         2 |    24   317 |   341     Acceptance rate, women = 24 / 341 = 0.070
-----------+-------------+------     RR = 0.059 / 0.070 = 0.84
     Total |    46   668 |   714     p = 0.54

Therefore, only Major A demonstrates a significant difference in acceptance rates by sex, and this difference favors women (82.4% vs. 62.1%). Notice that the initial crude analysis hid this pattern, a phenomenon known as Simpson's paradox. It is now evident that application to specific MAJORs confounds the relation between SEX and ACCEPTance rates and, because the risk ratio in Major A departs from those in the other majors, that there is an interaction between SEX and MAJOR.
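To see why MAJOR confounds the crude comparison, it helps to look at where each sex applied. The following Python sketch re-enters the cell counts from the six tables above; the dictionary layout and the helper name `risk_ratio` are illustrative assumptions, not Epi Info features.

```python
# major -> (men accepted, men rejected, women accepted, women rejected)
tables = {
    "A": (512, 313, 89, 19),
    "B": (353, 207, 17, 8),
    "C": (120, 205, 202, 391),
    "D": (138, 279, 131, 244),
    "E": (53, 138, 94, 299),
    "F": (22, 351, 24, 317),
}


def risk_ratio(a, b, c, d):
    """Acceptance rate in men divided by acceptance rate in women."""
    return (a / (a + b)) / (c / (c + d))


men_total = sum(a + b for a, b, c, d in tables.values())
women_total = sum(c + d for a, b, c, d in tables.values())

print("Major  accept%  men->major%  women->major%     RR")
for major, (a, b, c, d) in tables.items():
    accept = (a + c) / (a + b + c + d)   # overall acceptance rate of the major
    men_share = (a + b) / men_total      # share of male applicants choosing it
    women_share = (c + d) / women_total  # share of female applicants choosing it
    rr = risk_ratio(a, b, c, d)
    print(f"{major:<6} {accept:7.1%} {men_share:12.1%} {women_share:14.1%} {rr:6.2f}")

# Collapsing over MAJOR recovers the crude table and the crude RR
crude = tuple(sum(t[i] for t in tables.values()) for i in range(4))
crude_rr = risk_ratio(*crude)
print(f"Crude RR = {crude_rr:.2f}")
```

With these counts, roughly half of the male applicants applied to Majors A and B, the two majors with the highest acceptance rates, against about 7% of the female applicants. MAJOR is therefore associated with both SEX and ACCEPT, which is the condition for confounding described in the Background section.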

Test for Interaction

A chi-square test for interaction may be used to help assess whether effect-measure heterogeneity is present. Because this test applies to both risk ratios and odds ratios (and other measures of association), let MA refer to the measure of association parameter being studied. The null and alternative hypotheses are:

H₀: MA₁ = MA₂ = . . . = MA_S (no interaction)
H₁: at least one of the strata-specific measures of association differs (interaction)

The method of calculating the chi-square interaction statistic in Epi Info is unspecified, but it is assumed to be a general Wald statistic (see Epidemiology Kept Simple Formula 15.1). Under the null hypothesis, this chi-squared interaction statistic has S - 1 degrees of freedom, where S represents the number of strata being tested.

Illustrative example. In SEXBIAS.REC we test H₀: RR₁ = RR₂ = RR₃ = RR₄ = RR₅ = RR₆. Results, printed in the summary section of the stratified output, are:

Chi Square for evaluation of interaction 18.10
P value 0.00282859

Since there are 6 strata, df = 5. The small P value, along with the divergent incidence (risk) ratio in stratum 1 (Major A), suggests that statistical interaction is present.
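Since Epi Info does not document its formula, the following Python sketch should be read as one plausible version of the test rather than as Epi Info's actual computation. It assumes an inverse-variance (Wald-type) homogeneity statistic on the log risk ratios; the helper `chi2_sf` is a hypothetical name for a small function returning the upper-tail chi-square probability.

```python
import math

# (a, n1, c, n2): men accepted, men applied, women accepted, women applied
strata = [
    (512, 825, 89, 108),    # Major A
    (353, 560, 17, 25),     # Major B
    (120, 325, 202, 593),   # Major C
    (138, 417, 131, 375),   # Major D
    (53, 191, 94, 393),     # Major E
    (22, 373, 24, 341),     # Major F
]


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (integer df >= 1)."""
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


log_rr, weights = [], []
for a, n1, c, n2 in strata:
    log_rr.append(math.log((a / n1) / (c / n2)))
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2   # large-sample variance of ln(RR)
    weights.append(1 / var)

# Inverse-variance weighted mean of the log risk ratios
pooled = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)

# Weighted squared deviations from the pooled value
chi2_int = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rr))
df = len(strata) - 1
p_value = chi2_sf(chi2_int, df)

print(f"chi-square = {chi2_int:.2f}, df = {df}, p = {p_value:.4f}")
```

Under these assumptions the statistic comes out at about 18.1 on 5 degrees of freedom, in line with the Epi Info output shown above.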

Mantel-Haenszel Methods

It is often advantageous to summarize the relation being studied with a single, unconfounded measure of association and test. This can be accomplished by pooling unconfounded strata-specific measures of association to form a summary measure of association.

Summary Measure of Association

The Mantel-Haenszel method of pooling is calculated as a weighted average of strata-specific estimates with weights proportional to N₁*N₂/N, where N represents the total number of people in the stratum (Cochran 1954; Mantel & Haenszel 1959). This assumes the measures of association are uniform among strata. This homogeneity assumption allows us to combine strata-specific measures of association to form a single summary measure that has been adjusted for confounding. Any non-uniformity is suppressed through summarization. The pooled measure of association may be viewed as a statistical convenience whose purpose is to draw correct conclusions about the effect of the exposure.

Illustrative Example (SEXBIAS.REC). By suppressing the non-uniformity of the incidence (risk) ratios in SEXBIAS.REC, we find:

                  SUMMARY RISK RATIO (RR)
Crude RR without stratification                                         1.47
Summary RR of (ACCEPT=+) for (SEX=1)                                    0.94
95% confidence limits for RR                                0.87 < RR < 1.03

Comments:
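(1) The crude RR estimate of 1.47 indicates higher acceptance for men, whereas the summary estimate of 0.94 indicates slightly higher acceptance rates for women. Thus, the summary RR is an estimate of the effect of gender on acceptance to graduate school at UC Berkeley that has been adjusted for confounding by MAJOR. Because interaction is present in these data, it suppresses the departure seen in Major A and should be read with that in mind.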
(2) The 95% confidence interval for the summary RR is calculated using the method in Robins et al., 1986.
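The summary RR can be reproduced by hand. The Python sketch below uses the usual ratio-of-sums form of the Mantel-Haenszel risk ratio. For the confidence limits it uses a common large-sample variance for the log of the M-H risk ratio (of the Greenland-Robins type); whether this is exactly the calculation Epi Info performs is an assumption.

```python
import math

# (a, n1, c, n2): men accepted, men applied, women accepted, women applied
strata = [
    (512, 825, 89, 108),    # Major A
    (353, 560, 17, 25),     # Major B
    (120, 325, 202, 593),   # Major C
    (138, 417, 131, 375),   # Major D
    (53, 191, 94, 393),     # Major E
    (22, 373, 24, 341),     # Major F
]

num = den = var_num = 0.0
for a, n1, c, n2 in strata:
    n = n1 + n2                 # stratum size
    m1 = a + c                  # total accepted in the stratum
    num += a * n2 / n           # numerator term of the M-H risk ratio
    den += c * n1 / n           # denominator term of the M-H risk ratio
    var_num += m1 * n1 * n2 / n**2 - a * c / n

rr_mh = num / den

# Assumed large-sample standard error of ln(RR_MH)
se_log = math.sqrt(var_num / (num * den))
lower = math.exp(math.log(rr_mh) - 1.96 * se_log)
upper = math.exp(math.log(rr_mh) + 1.96 * se_log)

print(f"M-H summary RR = {rr_mh:.2f}")
print(f"95% confidence limits: {lower:.2f} < RR < {upper:.2f}")
```

With the SEXBIAS.REC counts this gives a summary RR of 0.94, and the assumed variance formula yields limits that agree with the printed 0.87 and 1.03 after rounding.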

Mantel-Haenszel Summary Test Statistic

A test of H₀: aMA = 1 (where aMA represents the parameter for the Mantel-Haenszel adjusted measure of association) is performed with a Mantel-Haenszel chi-square statistic. Under the null hypothesis, this test statistic has a chi-square sampling distribution with 1 degree of freedom.

Illustrative Example (SEXBIAS.REC). The null hypothesis H₀: aRR = 1 is tested with a Mantel-Haenszel summary chi-square statistic. The Mantel-Haenszel test statistics for SEXBIAS.REC are:

          ** Summary of 6 Tables With Non-Zero margins **
                              N = 4526
Mantel-Haenszel Summary Chi Square                                      1.43
P value                                                           0.23226346

Comment: The p value of .23 fails to provide evidence against H₀. We conclude that, after adjustment for MAJOR, there is no significant difference in acceptance rates by gender.
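A sketch of the Mantel-Haenszel chi-square calculation in Python follows. It assumes the continuity-corrected form of the statistic; that assumption is made here because the corrected form agrees with the 1.43 printed above, whereas the uncorrected form would give about 1.52.

```python
import math

# (a, n1, c, n2): men accepted, men applied, women accepted, women applied
strata = [
    (512, 825, 89, 108),    # Major A
    (353, 560, 17, 25),     # Major B
    (120, 325, 202, 593),   # Major C
    (138, 417, 131, 375),   # Major D
    (53, 191, 94, 393),     # Major E
    (22, 373, 24, 341),     # Major F
]

sum_a = sum_e = sum_v = 0.0
for a, n1, c, n2 in strata:
    n = n1 + n2
    m1 = a + c          # accepted in the stratum
    m0 = n - m1         # rejected in the stratum
    sum_a += a                                       # observed count
    sum_e += n1 * m1 / n                             # expected count under H0
    sum_v += n1 * n2 * m1 * m0 / (n**2 * (n - 1))    # hypergeometric variance

# Continuity-corrected statistic (the 0.5 correction is an assumption)
chi2_mh = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v

# Upper-tail probability of a chi-square variate with 1 degree of freedom
p_value = math.erfc(math.sqrt(chi2_mh / 2))

print(f"M-H chi-square = {chi2_mh:.2f}, p = {p_value:.3f}")
```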

When Interaction and Confounding Are Minimal

In the absence of interaction and confounding, stratification and adjustments are unnecessary. In such instances, crude measures of association offer the benefit of better precision (compared with M-H summary measures of association).

Illustrative Example. Data from a case-control study of esophageal cancer and tobacco consumption (Breslow & Day, 1980; Tuyns et al., 1977) are available in BD1NEW.ZIP. We are interested in the relation between tobacco consumption (TOBHIGH: 1 = 20+ g/day, 2 = less than 20 g/day) and esophageal cancer (CASE: 1 = case, 2 = control) while considering the possible confounding or effect-measure modifying effects of alcohol consumption (ALCHIGH: 1 = 80+ g/day, 2 = less than 80 g/day). The following commands are issued to analyze the data:

EPI6> READ BDNEW
EPI6> TABLES TOBHIGH CASE ALCHIGH

Key output includes:

Stratum 1 (ALCHIGH = 1)
                  CASE
TOBHIGH    |     1     2 | Total
-----------+-------------+------
1          |    30    23 |    53
2          |    66    86 |   152
-----------+-------------+------
     Total |    96   109 |   205

Single Table Analysis Stratum 1 Odds ratio = 1.70

Stratum 2 (ALCHIGH = 2)
                  CASE
TOBHIGH    |     1     2 | Total
-----------+-------------+------
1          |    34   127 |   161
2          |    70   539 |   609
-----------+-------------+------
     Total |   104   666 |   770

Single Table Analysis Stratum 2 Odds ratio = 2.06

Thus, the strata-specific odds ratios are 1.70 and 2.06, respectively. We might now ask if it makes sense to summarize these two odds ratios with a single summary statistic. The chi-square interaction statistic (H₀: OR₁ = OR₂) is helpful in this regard. Epi Info prints this information in the area labeled "Summary Odds Ratio":

Chi Square for evaluation of interaction 0.24
P value 0.62621898

In this instance df = 2 - 1 = 1 (not shown by Epi Info) and χ²_int = 0.24, p = 0.63. This is consistent with the assumption that differences between the strata-specific odds ratios are due to chance (no statistical interaction).

The crude odds ratio and M-H summary odds ratio are also listed in the area labeled "Summary Odds Ratio":

SUMMARY ODDS RATIO
Crude OR                                 1.96
Mantel-Haenszel weighted Odds ratio      1.92

We also note that the crude odds ratio and the Mantel-Haenszel weighted odds ratio are similar. Therefore, it is reasonable to report the crude odds ratio. To get the confidence interval and p value for the crude odds ratio, issue a two-variable TABLES command:

EPI6> TABLES TOBHIGH CASE

Output is:

TOBHIGH    |     1     2 | Total
-----------+-------------+------
1          |    64   150 |   214
2          |   136   625 |   761
-----------+-------------+------
     Total |   200   775 |   975

Odds ratio                                  1.96
Cornfield 95% confidence limits for OR      1.36 < OR < 2.82
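The odds ratios in this example can be retraced with the Python sketch below. As before, the interaction statistic is computed as an inverse-variance (Wald-type) homogeneity statistic on the log odds ratios, which is an assumption about Epi Info's method. The Cornfield confidence limits are not reproduced here because they require an iterative solution.

```python
import math

# (a, b, c, d): exposed cases, exposed controls, unexposed cases, unexposed controls
strata = [
    (30, 23, 66, 86),      # ALCHIGH = 1
    (34, 127, 70, 539),    # ALCHIGH = 2
]


def odds_ratio(a, b, c, d):
    """Cross-product ratio of a 2-by-2 table."""
    return (a * d) / (b * c)


stratum_or = [odds_ratio(*s) for s in strata]

# Wald-type homogeneity statistic on ln(OR), weights = 1 / variance
log_or = [math.log(x) for x in stratum_or]
weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in strata]
pooled = sum(w * y for w, y in zip(weights, log_or)) / sum(weights)
chi2_int = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_or))
p_int = math.erfc(math.sqrt(chi2_int / 2))   # valid here because df = 1

# Crude table (strata collapsed) and Mantel-Haenszel summary odds ratio
crude = tuple(sum(s[i] for s in strata) for i in range(4))
crude_or = odds_ratio(*crude)
or_mh = (sum(a * d / (a + b + c + d) for a, b, c, d in strata)
         / sum(b * c / (a + b + c + d) for a, b, c, d in strata))

print("Stratum ORs:", [round(x, 2) for x in stratum_or])
print(f"Interaction chi-square = {chi2_int:.2f}, p = {p_int:.2f}")
print(f"Crude OR = {crude_or:.2f}, M-H OR = {or_mh:.2f}")
```

The crude and M-H odds ratios differ by about 2%, which is the basis for reporting the crude estimate in this example.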

Strategy for Analysis

Although the detection and control of confounding is crucially important in epidemiologic research, there exists no single way of dealing with this problem. Nevertheless, epidemiologists agree that potential confounders must be identified before data are collected so that data on these factors can be collected to allow further evaluation. So how does one know what variables might confound an analysis? Briefly, this information comes from an understanding of the systems being investigated, and is based on previous research, clinical insight, and understanding of the processes being studied. It is essential that the investigator "does their homework," researching all potential confounders, before collecting data. With this said, a few rules of thumb are presented:

(1) Adjustments for confounding are contraindicated when interaction is present, as such summary measures of association would obscure important modifications of effect.

(2) Since confounding is a matter of systematic error (not random error), hypothesis tests should not be used in the detection of confounding.

(3) A pragmatic strategy for calculating good measures of association suggests:

(A) Before the study is begun, the investigator attempts to understand the complex causal interrelations among the exposure, disease, and various other factors. This may require lots of homework on the part of the investigator, as well as close collaboration with subject matter specialists.
(B) Measurements and coding for E, D, and C₁, C₂, ..., C_k must be valid based on understanding of phenomena.
(C) The research question must be defined in an insightful way. "Finding the question is often more important than finding the answer" (Tukey, 1980).
(D) Study design is based on choices that maximize the likelihood of delineating causal relations.
(E) After data are collected, entered and cleaned, the analyst explores inter-relations, starting with simple comparisons and descriptions. Identified relationships between E and C and between C and D heighten the awareness of the potential for confounding.
(F) Data are stratified and explored for interaction. (The above test for interaction may be applied.) When interaction is confirmed, strata-specific estimates are reported.
(G) Continued consultation with a subject matter specialist may be necessary before a decision is made whether or not to control for potential confounder C.
(H) In the absence of interaction and confounding, crude (unadjusted) estimates of association may be reported.
(I) The best estimate of association is both valid and precise. If interaction is present, strata-specific measures of association are reported. If interaction is absent but confounding is present, summary (adjusted) measures of association are reported. If neither interaction nor confounding is present, crude (unadjusted) measures of association are reported.
(J) In practice, there will always be uncertainty about whether a given set of variables are or are not confounders. "Science DOES NOT BEGIN WITH A TIDY QUESTION. Nor does it end with a tidy answer" (Tukey, 1980).
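The reporting rule in the strategy above can be written down as a small Python function. This is a rough sketch only. The function name `choose_estimate`, the 0.05 cutoff for the interaction test, and the 10% change-in-estimate criterion for confounding are all illustrative assumptions that do not come from this chapter, and no such rule replaces subject-matter judgment.

```python
def choose_estimate(p_interaction, crude, adjusted, alpha=0.05, max_change=0.10):
    """Suggest which measure of association to report.

    p_interaction : p value from the chi-square test for interaction
    crude         : crude measure of association (e.g., cRR or crude OR)
    adjusted      : Mantel-Haenszel summary measure
    alpha         : assumed cutoff for declaring interaction
    max_change    : assumed relative change that flags confounding
    """
    # Interaction takes priority: a summary would hide the heterogeneity
    if p_interaction < alpha:
        return "strata-specific"
    # Confounding is judged by comparing estimates, not by a hypothesis test
    change = abs(crude - adjusted) / adjusted
    if change > max_change:
        return "adjusted"
    return "crude"


# SEXBIAS.REC: interaction p = 0.0028, crude RR = 1.47, summary RR = 0.94
sexbias = choose_estimate(0.0028, crude=1.47, adjusted=0.94)

# Esophageal cancer data: interaction p = 0.63, crude OR = 1.96, M-H OR = 1.92
esophageal = choose_estimate(0.63, crude=1.96, adjusted=1.92)

print(sexbias)
print(esophageal)
```

Applied to the two examples in this chapter, the function suggests strata-specific estimates for SEXBIAS.REC and the crude odds ratio for the esophageal cancer data, matching the conclusions reached in the text.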

Exercises

(1) GENERIC.ZIP: Simpson's Paradox (Hypothetical Data). This exercise illustrates Simpson's Paradox while applying a strategy for detecting and accounting for confounding and interaction. Three case-control data sets are presented: GENERIC1.REC, GENERIC2.REC, and GENERIC3.REC. Each data set contains the variables E (exposure), D (disease), and C (potential confounder). For each data set determine if interaction is present. If interaction is present, stop there and report strata-specific odds ratios and other relevant case-control statistics. If interaction is absent, assess the potential for confounding. Summarize your assessment. If confounding is present, report an adjusted odds ratio and associated case-control statistics. If interaction and confounding are absent, report the crude (unadjusted) case-control statistics.

(2) BD2.ZIP: Breslow & Day 2: The Oxford Childhood Cancer Survey (Breslow & Day, 1980, p. 238; Kneale, 1971; Stewart & Kneale, 1970). Data are from a case-control study of childhood leukemia and lymphoid tumors and in utero X-ray exposure (Kneale et al., 1971). The primary variables of interest are CASE (1 = case, 2 = control) and XRAY (1 = exposed, 2 = unexposed). The potential confounder is AGE (years). Analyze these data and report the "best" odds ratio estimate and a 95% confidence interval for the parameter. Summarize your results in narrative form.

(3) BI-HELM1.ZIP: Bicycle Helmet Use in Two Northern California Counties (Perales et al., 1994). This data set contains information on bicycle helmet use in Santa Clara County and Contra Costa County, two counties in northern California (U.S.A.). Data definitions are included in a data documentation file in the ZIP archive (bi-helm1-dd.htm). Review this data documentation file and then perform the following analyses.
(A) Determine crude incidences of helmet use in Santa Clara County (p₁) and Contra Costa County (p₂). (The easiest way to derive these statistics is to use the two-variable tables command TABLES COUNTY HELMETUSE.) Test whether these proportions differ, and summarize your results.
(B) Stratify the data on the socioeconomic matching variable MATCHVAR (TABLES COUNTY HELMETUSE MATCHVAR). Report strata-specific helmet use rates by school and test whether within-strata rates differ significantly. Summarize your results narratively.
(C) Test the incidence (risk) ratio parameter for interaction. Be explicit in listing the null and alternative hypotheses. Report all relevant test statistics and state your conclusion.
(D) Discuss your findings. In so doing, consider the potential for interaction and confounding. Which schools show higher helmet-use rates compared with their matched counterpart? etc.

(4) CERVICAL: Cervical Cancer and Smoking (Nischan et al., 1988; Pagano & Gauvreau, 1993, p. 359). Data from a case-control study of cervical cancer and smoking are shown below.

	Case	Control
Smoke +	108	163
Smoke -	117	268

(A) Based on these data calculate the odds ratio of smoking for cervical cancer.

(B) Data stratified by number of sexual partners are shown below. Calculate stratum specific odds ratios.

	Stratum 1: Zero or One Partner
	Case	Control
Smoke +	12	21
Smoke -	25	118

	Stratum 2: Two or More Partners
	Case	Control
Smoke +	96	142
Smoke -	92	150

(C) Based on these exploratory analyses, would you say there is interaction? Justify your response. How would you report your results?

(5) ASBESTOS.ZIP: Asbestos Exposure and Lung Cancer (Hypothetical data). Data are from a case-control study of lung cancer and asbestos exposure. The data set includes information on smoking (SMOKE: + / -), asbestos exposure (ASBESTOS: + / -), and lung cancer (LUNGCA: + / -).

(A) Calculate the odds ratio of lung cancer associated with smoking. Include a 95% confidence interval, and interpret your findings.
(B) Calculate the odds ratio of lung cancer associated with asbestos exposure. Include a 95% confidence interval and interpret your findings.
(C) An investigator thinks it would be interesting to sort out the inter-relationship between asbestos, smoking, and lung cancer by looking at the lung cancer risk associated with asbestos in smokers and non-smokers separately. Perform such a stratified analysis. In so doing, report strata-specific odds ratios. Perform a test for interaction. (Include all hypothesis testing steps.) Is interaction present? Calculate and report the summary (adjusted) odds ratio. Is confounding evident? Would it make sense to report the adjusted odds ratio in light of your findings about interaction? How would you report your final results?

References

Bickel, P. J., Hammel, E. A., & O'Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley. Science, 187, 398 - 404.

Breslow, N. E., & Day, N. E. (1980). Statistical Methods in Cancer Research. Volume 1: The Analysis of Case-Control Studies. Lyon: International Agency for Research on Cancer.

Cochran, W. G. (1954). Some methods for strengthening the common chi-square tests. Biometrics, 10, 417-451.

Freedman, D., Pisani, R., Purves, R., & Adhikari, A. (1991). Statistics (2nd ed.). New York: W. W. Norton.

Gerstman, B. B., Jolson, H., Bauer, M., Cho, P., Livingston, J., & Platt, R. (1996). Depression in new users of ß-blockers and selected anti-hypertensives. Journal of Clinical Epidemiology, 49, 809 - 815.

Hirayama, T. (1990). Life-style and Mortality: a Large Scale Census-based Cohort Study in Japan. Basel: S. Karger.

Kneale, G. W. (1971). Problems arising in estimating from retrospective survey data the latent period of juvenile cancers initiated by obstetric radiography. Biometrics, 27, 563 - 90.

Kramer, M. S. (1988). Clinical Epidemiology and Biostatistics. Berlin: Springer-Verlag.

Lilienfeld, D. E. & Stolley, P. D. (1994). Foundations of Epidemiology (3rd ed.). New York: Oxford.

Mandel, E., Bluestone, C. D., Rockette, H. E., Blatter, M. M., Reisinger, K. S., Wucher, F. P., & Harper, J. (1982). Duration of effusion after antibiotic treatment for acute otitis media: comparison of cefaclor and amoxicillin. Pediatric Infectious Diseases, 1, 310 - 316.

Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719 - 748.

Nischan, P., Ebeling, K., & Schindler, C. (1988). Smoking and invasive cervical cancer risk: results from a case-control study. American Journal of Epidemiology, 128, 74 - 77.

Pagano, M. & Gauvreau, K. (1993). Principles of Biostatistics. Belmont, CA: Duxbury Press.

Perales, D. & Gerstman, B. B. (1995, March). A bi-county comparative study of bicycle helmet knowledge and use by California elementary school children. The Ninth Annual California Conference on Childhood Injury Control, San Diego, CA.

Robins, J., Breslow, N., & Greenland, S. (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics, 42, 311-323.

Rosner, B. (1990). Fundamentals of Biostatistics (3rd ed.). Boston: PWS - Kent Publishing.

Rothman, K. J. (1975). A pictorial representation of confounding in epidemiologic studies. Journal of Chronic Diseases, 28, 101 - 108.

Stewart, A. & Kneale, G. W. (1970). Age-distribution of cancers caused by obstetric X-rays and their relevance to cancer latent periods. Lancet, ii, 4 - 8.

Tuyns, A. J., Péquignot, G., & Jensen, O. M. (1977). Le cancer de l'oesophage en Ille-et-Vilaine en fonction des niveaux de consommation d'alcool et de tabac. Des risques qui se multiplient. Bull Cancer, 64, 45 - 60.