Binary Outcome, Case-Control Samples

Background | Confidence Interval | p Value | Power and Sample Size | Exercises


In the previous chapter we compared the incidence of disease in an exposed and non-exposed group. In this chapter, we select from a source population people with disease (cases) and without disease ("controls") and then compare their prior exposure experience (case-control sampling). Data from case-control studies, once cross-tabulated, are displayed as follows:

             | Cases  Controls | Total
  Exposure + |   a       b     |   n1
  Exposure - |   c       d     |   n2
       Total |   m1      m2    |   N

The exposure proportion in cases is p1 = a / m1 and the exposure proportion in controls is p2 = b / m2. The complement of pi is qi = 1 - pi, so q1 = c / m1 and q2 = d / m2. Thus, the odds ratio (OR) is:

       p1 / q1      (a/m1) / (c/m1)     a / c     ad
OR = ----------- = ----------------- = ------- = ----
       p2 / q2      (b/m2) / (d/m2)     b / d     bc

It can be shown that the odds ratio from a case-control study is stochastically equivalent to a rate ratio or risk ratio, depending on how cases and controls are sampled from the source population (incidence density vs. risk sampling).

Illustrative Example (BD1NEW.ZIP). Breslow and Day (1980) report data for a case-control study of 200 esophageal cancer cases and 775 community-based controls (Tuyns, 1977).  A detailed dietary interview with questions about alcohol consumption, tobacco use and other factors was administered to all study participants. We want to learn about the relation between CASE (1 = case, 2 = control) and ALCHIGH (alcohol consumption dichotomized at 80 grams per day: 1 = high, 2 = low). 

EpiInfo Commands. To process the data in Epi Info issue the command:

EPI6> TABLES <exposure> <disease>

where <exposure> represents the name of the exposure variable and <disease> represents the name of the disease variable.

For our illustrative data, issue the command:

EPI6> TABLES ALCHIGH CASE


Data are:

                CASE
ALCHIGH    |     1     2 | Total
         1 |    96   109 |   205
         2 |   104   666 |   770
     Total |   200   775 |   975

From the above table we determine that the exposure proportion in cases (p1) = 96 / 200 = 0.480 and the exposure proportion in controls (p2) = 109 / 775 = 0.141. The odds ratio = (96)(666) / [(109)(104)] = 5.64. Thus, the odds of esophageal cancer in high-level alcohol consumers are about 5.6 times the odds in low-level consumers. Given the equivalence noted above, this suggests that high-level consumers have roughly 5.6 times the incidence of esophageal cancer.
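The arithmetic above can be checked with a short script. The following is a minimal Python sketch, not Epi Info code. The function name odds_ratio is hypothetical, and the cell labels a, b, c, d follow the notation table at the start of this chapter.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2-by-2 case-control table.

    a = exposed cases,     b = exposed controls,
    c = non-exposed cases, d = non-exposed controls.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Illustrative data: ALCHIGH (rows) by CASE (columns)
a, b, c, d = 96, 109, 104, 666
p1 = a / (a + c)  # exposure proportion in cases
p2 = b / (b + d)  # exposure proportion in controls
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, OR = {odds_ratio(a, b, c, d):.2f}")
```

For the illustrative data this prints p1 = 0.480, p2 = 0.141, OR = 5.64.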

Confidence Interval

The point estimate and 95% confidence interval for the OR are printed below the 2-by-2 table:

                             Single Table Analysis
Odds ratio                                                         5.64
Cornfield 95% confidence limits for OR               3.93 < OR <   8.10
Maximum likelihood estimate of OR (MLE)                            5.63
Exact 95% confidence limits for MLE                  3.94 < OR <   8.06
Exact 95% Mid-P limits for MLE                       3.99 < OR <   7.95

The standard "cross-product ratio" odds ratio point estimate is printed in line 1 (OR = 5.64). The standard (Cornfield) 95% confidence interval for the OR is printed in line 2 (95% CI: 3.93 - 8.10). Maximum likelihood estimates (reported on lines 3 - 5) are seldom necessary.
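Cornfield limits require an iterative calculation and are not reproduced here. As a rough cross-check, a confidence interval can be computed by hand with the simpler Woolf (log-based) method, which is a different technique from the one Epi Info reports. The sketch below assumes this swapped-in method; the function name woolf_ci is hypothetical.

```python
import math
from statistics import NormalDist


def woolf_ci(a, b, c, d, conf=0.95):
    """Odds ratio with a Woolf (log-based) confidence interval.

    Note: this is NOT the Cornfield method printed by Epi Info;
    it is a simpler large-sample approximation.
    """
    or_hat = (a * d) / (b * c)
    # Standard error of ln(OR)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


or_hat, lower, upper = woolf_ci(96, 109, 104, 666)
print(f"OR = {or_hat:.2f}, 95% CI: {lower:.2f} - {upper:.2f}")
```

For the illustrative data the Woolf limits fall at roughly 4.0 and 8.0, close to but not identical to the Cornfield limits in the Epi Info output.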

p Value

P values for H0: OR = 1 are computed with three different chi-square methods:

                         Chi-Squares   P-values
                         -----------   --------
        Uncorrected:       110.26     0.00000000 <---
        Mantel-Haenszel:   110.14     0.00000000 <---
        Yates corrected:   108.22     0.00000000 <---

There is no agreement on which p value is superior. In this case it matters little which we choose (p < .000001, regardless). As discussed in the prior chapter, Fisher's exact test should be used when an expected frequency is less than 5.
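The three chi-square statistics can be reproduced from the 2-by-2 table with the usual textbook formulas. The sketch below is an illustration under that assumption, not a description of Epi Info's internal code; the function name chi_square_tests is hypothetical. Each statistic has 1 degree of freedom, so its upper-tail p value equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_tests(a, b, c, d):
    """Uncorrected, Mantel-Haenszel, and Yates chi-squares (1 df)."""
    n1, n2 = a + b, c + d      # row totals (exposed, non-exposed)
    m1, m2 = a + c, b + d      # column totals (cases, controls)
    n = n1 + n2                # grand total
    denom = n1 * n2 * m1 * m2
    diff = abs(a * d - b * c)
    stats = {
        "Uncorrected": n * diff ** 2 / denom,
        "Mantel-Haenszel": (n - 1) * diff ** 2 / denom,
        "Yates corrected": n * max(diff - n / 2, 0) ** 2 / denom,
    }
    # Upper-tail probability of a chi-square variate with 1 df
    return {name: (x, math.erfc(math.sqrt(x / 2))) for name, x in stats.items()}


for name, (chi2, p) in chi_square_tests(96, 109, 104, 666).items():
    print(f"{name:16s} {chi2:8.2f}   p = {p:.3g}")
```

For the illustrative data the three statistics round to 110.26, 110.14, and 108.22, matching the Epi Info output.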

Power and Sample Size

The statistical power of a study is the probability of correctly rejecting a false H0 under certain distributional assumptions. The program EpiTable has an excellent power and sample size calculator. In EpiTable, select Sample > Power calculation > Case-control study. You must then provide assumptions for the number of cases (m1), the ratio of controls to cases in the study (m2 / m1), an odds ratio "worth detecting" (OR), the exposure proportion in controls (p2), and the alpha level (α) or confidence level (1 - α) required by the research. Sample size requirements can be determined by selecting EpiTable > Sample > Sample Size > Case-control study.
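To show how these inputs combine, here is a minimal Python sketch of an approximate power calculation. It assumes the common normal-approximation formula for comparing two proportions without a continuity correction; the formula EpiTable actually uses is not stated here, so its results may differ. The function name case_control_power and its parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def case_control_power(m1, ratio, odds_ratio, p2, alpha=0.05):
    """Approximate power of an unmatched case-control study.

    m1         = number of cases
    ratio      = controls per case (m2 / m1)
    odds_ratio = odds ratio worth detecting
    p2         = exposure proportion in controls
    Assumes a two-sided test and a normal approximation
    (no continuity correction).
    """
    # Exposure proportion in cases implied by the OR and p2
    p1 = odds_ratio * p2 / (1 + p2 * (odds_ratio - 1))
    q1, q2 = 1 - p1, 1 - p2
    # Pooled exposure proportion under H0, weighted by group size
    p_bar = (p1 + ratio * p2) / (1 + ratio)
    q_bar = 1 - p_bar
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = math.sqrt(p_bar * q_bar * (1 + 1 / ratio))
    se_alt = math.sqrt(p1 * q1 + p2 * q2 / ratio)
    z_beta = (abs(p1 - p2) * math.sqrt(m1) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z_beta)


# Example: 152 cases, 1 control per case, OR = 2, p2 = 0.25
print(f"power = {case_control_power(152, 1, 2.0, 0.25):.2f}")
```

Power rises with the number of cases, the control-to-case ratio, and the size of the odds ratio to be detected.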

Illustrative example. Suppose we want to detect an OR of 2 using a 1:1 ratio of cases to controls in a population with an expected exposure proportion in non-cases of 0.25, while requiring α = 0.05 and power = 0.8. EpiTable calculates m1 = m2 = 165 (total sample size = 330).
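The figure of 165 can be reproduced by hand. The sketch below assumes the standard two-proportion sample size formula with a Fleiss-type continuity correction for equal group sizes. That EpiTable uses exactly this formula is an assumption, although the result agrees for this example. The function name case_control_sample_size is hypothetical.

```python
import math
from statistics import NormalDist


def case_control_sample_size(odds_ratio, p2, alpha=0.05, power=0.80):
    """Cases required (with an equal number of controls).

    Assumes a two-sided test, a 1:1 case-to-control ratio, and a
    continuity-corrected normal approximation.
    """
    # Exposure proportion in cases implied by the OR and p2
    p1 = odds_ratio * p2 / (1 + p2 * (odds_ratio - 1))
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = abs(p1 - p2)
    # Uncorrected sample size per group
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    # Continuity correction
    n_corrected = n / 4 * (1 + math.sqrt(1 + 4 / (n * delta))) ** 2
    return math.ceil(n_corrected)


m1 = case_control_sample_size(odds_ratio=2.0, p2=0.25)
print(f"m1 = m2 = {m1}, total = {2 * m1}")
```

Under these assumptions the script prints m1 = m2 = 165, total = 330.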


Exercises

(1) DOLL1950: Smoking and Lung Cancer (Doll & Hill, 1950). A historically important case-control study of smoking and lung cancer found 647 of 649 lung cancer cases were smokers. In contrast, 622 of 649 non-cancer controls were smokers. Show these data in a 2-by-2 table and then, using an epidemiologic calculator, compute the odds ratio and its 95% confidence interval. Interpret your findings.

(2) ESOPH_CA.ZIP: Esophageal Cancer and Tobacco Consumption (Tuyns, 1977; Breslow & Day, 1980). Download the data set ESOPH_CA. Then determine the effect of tobacco consumption with alcohol dichotomized at 80 gms/day (TOB2) on esophageal cancer risk (ESOPH_CA: 1 = case, 2 = control). Compute the odds ratio and its 95% confidence interval. Then perform a significance test and summarize your results in narrative form.

(3) ESOPH_CA.ZIP: Esophageal Cancer and Age (Tuyns, 1977; Breslow & Day, 1980). Use the same data set you used in the previous exercise to determine the effect of age on esophageal cancer risk. The exposure variable is AGE2 (older = 55+ years, younger = 35-54 years). The disease variable is ESOPH_CA (1 = case, 2 = control). Compute the odds ratio and its 95% confidence interval. Interpret your findings.

(4) BD2.ZIP: Breslow & Day 2 (Stewart & Kneale, 1970; Kneale, 1971; Breslow & Day, 1980, p. 238). Data come from a case-control study of childhood leukemia and lymphoma and in utero exposure to X-rays. Cases are children less than 10 years of age in England and Wales whose disease occurred during the period 1954-65 (variable CASE: 1 = yes, 2 = no). For each case, a neighborhood control of the same age and year of birth was selected. Exposure status is based on whether mothers were exposed to X-rays during pregnancy (variable XRAY: 1 = yes, 2 = no). Calculate the odds ratio estimate and its 95% confidence interval. Test the odds ratio for significance. Interpret your findings in narrative form.

(5) IUD: Intrauterine Device Use and Infertility (Cramer et al., 1985; Rosner, 1990, p. 381). A study of contraceptive use and infertility found prior use of intra-uterine devices (IUDs) in 89 out of 283 infertile women. In contrast, 640 out of 3833 (fertile) control women used IUDs. Calculate relevant case-control statistics and then summarize your results in plain language.

(6) PROSTATE.ZIP: Vasectomy and Prostate Cancer (Data source: Zhu et al., 1996). A case-control study was conducted to help assess the potential relationship between vasectomy and prostate cancer. Calculate the odds ratio and its 95% confidence interval. Then, determine the sample size required to detect a significant odds ratio of 1.3 with 80% power.

(7) ASBESTOS.ZIP: Asbestos Exposure and Lung Cancer (Hypothetical data). Data are from a case-control study of lung cancer. The data set contains information on smoking status (SMOKE: Y / N), asbestos exposure (ASBESTOS: Y / N), and lung cancer (LUNGCA: Y / N).
(A) Calculate the odds ratio of lung cancer associated with smoking. Include a 95% confidence interval. Interpret your findings.
(B) Calculate the odds ratio of lung cancer associated with asbestos exposure. Include a 95% confidence interval and interpret your findings.

(8) BRAINTUM.ZIP: Electric blanket use and brain tumors in children (Preston-Martin et al., 1996). This case-control study analyzed the relation between brain tumors in children (BRAINTUM: Y/N) and exposure to electric blankets and water bed heaters (ELECBLANK: Y/N). Analyze the data and interpret your results. Then, calculate the study's power to uncover odds ratios of (i) 1.1; (ii) 1.2; (iii) 1.3; (iv) 1.4; (v) 1.5; (vi) 1.6; (vii) 1.7; (viii) 2.0. (ix) Combine your power estimates to form a power curve in which the x-axis represents the expected odds ratio and the y-axis represents the study's power. Discuss your power analysis in this light. At what point does the study's power become adequate? What can be done to improve this study's power? Would you supplement the study with additional information?