
This chapter considers the analysis of a binary ("yes/no") outcome measured
in *n* individuals from a single group. Let *x* represent the number of
"positives" in the group and *y* represent the number of
"negatives." Thus, *n* = *x* + *y*, and *p* represents the sample
proportion:

$$p = \frac{x}{n}$$

When data represent a simple random sample from a large population, *p* is
the estimator of binomial parameter *P*, and *P* represents the
prevalence proportion or incidence proportion (risk), depending on the
nature of the data (see *Epidemiology
Kept Simple*, Chap. 6).

*Illustrative data.* In a sample of 57 people, we
find 17 smokers. (Data are stored in PREVSMOK.REC as
the variable

Frequencies and confidence intervals are derived with the `FREQ` command:

`FREQ <varname>`

For our illustrative data set, the command is:

`FREQ SMOKER`

Output is:

`SMOKER | Freq Percent Cum. 95% Conf Limit`

`-------+--------------------------------------`

`0 | 40 70.2% 70.2% 56.6%-81.6%`

`1 | 17 29.8% 100.0% 18.4%-43.4%`

Thus, the prevalence is 29.8% (95% confidence interval for *P*: 18.4%,
43.4%). The confidence interval appears to be calculated from the mathematical relation between the
binomial and F distributions (Fisher & Yates, 1963;
Zar, 1996, p. 524). (CDC's documentation of the procedure is somewhat ambiguous.)
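To check the limits printed by `FREQ`, the exact (Clopper-Pearson) interval can be computed directly. The sketch below is our own illustration, not Epi Info's code: rather than evaluating the F distribution, it inverts the binomial tail probabilities by bisection, which is mathematically equivalent to the binomial-F (or binomial-beta) relation. The function names are hypothetical. If `FREQ` does use the exact method, the printed limits should agree with the output above.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iters=100):
    """Bisection for p in [0, 1] with g(p) = target; g must increase with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, conf=0.95):
    """Exact two-sided confidence interval for a binomial proportion P."""
    alpha = 1 - conf
    # Lower limit: the P at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: the P at which P(X <= x) equals alpha/2,
    # i.e. P(X > x) equals 1 - alpha/2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Smoking data: 40 nonsmokers (0) and 17 smokers (1) among 57 people.
    for label, x in (("0", 40), ("1", 17)):
        lo, hi = clopper_pearson(x, 57)
        print(f"{label}: {x}/57 = {x / 57:.1%}  95% CI {lo:.1%}-{hi:.1%}")
```

Because the limits come from binomial tail areas rather than a normal approximation, the interval need not be symmetric around *p* and always stays inside [0, 1].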

An exact binomial test is used to evaluate $H_0\colon P = P_0$,
where $P_0$ represents the binomial parameter
under the null hypothesis. A *p* value for this test can be computed with EpiTable as follows: EpiTable > Probability > Binomial: Proportion vs. Standard.
(*EpiTable* is part of Epi6 but is NOT included in *Epi2002*.)
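Without EpiTable, the exact binomial test can be sketched in a few lines. This is an illustrative implementation, not EpiTable's: the one-sided *p* values are binomial tail sums, and the two-sided option uses the "sum of all outcomes no more probable than the observed one" convention (the one R's `binom.test` uses). EpiTable may report a different two-sided convention. The function names are hypothetical, and the value $P_0 = 0.20$ in the usage line is an arbitrary number chosen only to show the call, not a reference value from the text.

```python
from math import comb


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def exact_binomial_p(x, n, p0, alternative="greater"):
    """Exact p value for H0: P = p0, given x positives out of n."""
    if alternative == "greater":  # H1: P > p0, so sum P(X >= x)
        return sum(binom_pmf(i, n, p0) for i in range(x, n + 1))
    if alternative == "less":  # H1: P < p0, so sum P(X <= x)
        return sum(binom_pmf(i, n, p0) for i in range(0, x + 1))
    if alternative == "two-sided":
        # Add up every outcome that is no more likely than the observed x;
        # the small relative tolerance keeps exact ties from being dropped.
        px = binom_pmf(x, n, p0)
        total = sum(binom_pmf(i, n, p0) for i in range(n + 1)
                    if binom_pmf(i, n, p0) <= px * (1 + 1e-7))
        return min(1.0, total)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


if __name__ == "__main__":
    # 17 smokers out of 57 against an arbitrary, assumed P0 = 0.20.
    print(exact_binomial_p(17, 57, 0.20, "greater"))
```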

**Illustrative example.** Suppose we want to compare the prevalence of smoking in our sample (

To derive a 95% confidence interval for *P* with a *margin of error* no greater than *d*,
use the formula:

$$n = \frac{3.84\,P\,Q}{d^{2}}$$

where *P* is a reasonable guess for the parameter *P* and *Q* = 1 - *P*. (The constant 3.84 is 1.96², the square of the standard normal value for 95% confidence.)

**Illustrative example.** To achieve

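If no good
estimate for *P* is available, let *P* = .50 to ensure a sufficient
sample size: the product *PQ* reaches its maximum (0.25) at *P* = .50, so this choice gives the largest, most conservative *n*.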
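The sample-size formula is easy to script. The sketch below is a minimal illustration under the same assumptions as the formula (a 95% interval, with 3.84 ≈ 1.96²); rounding up to the next whole person is our choice, and the function name is hypothetical. The usage line assumes a guessed *P* of .30, which is not a value from the text.

```python
import math


def sample_size(p_guess, d, z2=3.84):
    """n = z2 * P * Q / d**2, rounded up; z2 = 3.84 corresponds to 95%."""
    q_guess = 1 - p_guess
    n = z2 * p_guess * q_guess / d**2
    # Strip floating-point noise (e.g. 95.99999999999999) before rounding up.
    return math.ceil(round(n, 9))


if __name__ == "__main__":
    # Assumed guess P = .30 with margin of error d = .05 (illustrative only).
    print(sample_size(0.30, 0.05))  # prints 323
    # PQ peaks at P = .50, so that guess never gives a smaller n.
    print(all(sample_size(0.50, 0.05) >= sample_size(k / 100, 0.05)
              for k in range(1, 100)))  # prints True
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below *d*.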

Cluster sampling randomly selects units composed of smaller elements of interest. Examples of clusters include:

| Cluster | Element |
|---|---|
| Family | Members of the family |
| Restaurant | Employees |
| Carton of eggs | Individual eggs |
| Peach tree | Individual peaches |
| Patient | Multiple samples from same patient |

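When cluster sampling is used, variance estimates must be adjusted by a **design effect (*deff*)**: the ratio of the variance under cluster sampling to the variance expected from a simple random sample of the same number of elements. The cluster-adjusted variance is thus the simple-random-sample variance multiplied by *deff*. Use the program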

**Illustrative example.** Suppose that in five patients with severe psoriasis, each receiving a parenteral treatment for his or her
condition, the following results are observed:

| Patient | Number of lesions cleared by treatment | Number of lesions |
|---|---|---|
| 1 | 5 | 12 |
| 2 | 4 | 7 |
| 3 | 12 | 13 |
| 4 | 8 | 15 |
| 5 | 5 | 16 |
| Total | 34 | 63 |

Each patient represents a cluster, and each lesion an element in the cluster. To compute the design effect associated with clustering, select EpiTable > Describe > Proportions > Design Effect. The output is:

`Clust Num Den
Nº 1 5 12
Nº 2 4 7
Nº 3 12 13
Nº 4 8 15
Nº 5 5 16 `

`Global variance : 0.003943
Cluster variance : 0.010739
Design effect : 2.72`
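The variances printed above can be reproduced with the short sketch below. The formulas are our inference from the output, not taken from EpiTable's documentation: the "global" variance is the simple-random-sampling variance *pq*/*n*, and the "cluster" variance is the sum of squared deviations of the cluster proportions from the overall *p*, divided by *m*(*m* - 1) for *m* clusters. The design effect is their ratio. The function name is hypothetical.

```python
def design_effect(numerators, denominators):
    """Return (global variance, cluster variance, deff); needs >= 2 clusters.

    Formulas inferred from EpiTable's printed output, not its documentation.
    """
    m = len(numerators)                      # number of clusters
    n = sum(denominators)                    # total number of elements
    p = sum(numerators) / n                  # overall proportion
    # Variance of p if the n elements were a simple random sample.
    var_srs = p * (1 - p) / n
    # Spread of the cluster proportions x_i / n_i around the overall p.
    var_cluster = sum((x / k - p) ** 2
                      for x, k in zip(numerators, denominators)) / (m * (m - 1))
    return var_srs, var_cluster, var_cluster / var_srs


if __name__ == "__main__":
    cleared = [5, 4, 12, 8, 5]     # lesions cleared, patients 1-5
    lesions = [12, 7, 13, 15, 16]  # lesions per patient
    g, c, deff = design_effect(cleared, lesions)
    print(f"Global variance  : {g:.6f}")     # 0.003943
    print(f"Cluster variance : {c:.6f}")     # 0.010739
    print(f"Design effect    : {deff:.2f}")  # 2.72
```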

Thus, *deff* = 2.72. We then use EpiTable > Describe > Proportions > Cluster Sampling to calculate the
confidence interval for *P*. Here is the output:

` Numerator 34`

` Total observations 63`

` Design effect 2.72`

`Total observations : 63`

`Design effect : 2.72`

`Effective sample size : 23`

`Proportion : 53.9683%`

`Fleiss quadratic 95% CI [ 32.7038-73.9924 ]`

Thus, the 95% confidence interval for *P* is (32.7%, 74.0%). (If we had *mistakenly* assumed that the data were a simple random sample, the
confidence interval would have been (41.0%, 66.4%).)
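The cluster-adjusted interval can be approximated in the same spirit. We assume, based on the printed numbers rather than on EpiTable documentation, that "Fleiss quadratic" is the Wilson score interval with continuity correction (the quadratic method described by Fleiss), evaluated at the effective sample size *n*/*deff* with *p* unchanged. Under that assumption the sketch reproduces both intervals quoted above. The function name is hypothetical.

```python
import math


def fleiss_quadratic_ci(x, n, deff=1.0, z=1.96):
    """Continuity-corrected Wilson (score) interval for P, with n shrunk by deff."""
    p = x / n
    q = 1 - p
    n_eff = n / deff        # effective sample size (not rounded)
    z2 = z * z
    denom = 2 * (n_eff + z2)
    if p == 0:
        lower = 0.0
    else:
        lower = (2 * n_eff * p + z2 - 1
                 - z * math.sqrt(z2 - 2 - 1 / n_eff + 4 * p * (n_eff * q + 1))) / denom
    if p == 1:
        upper = 1.0
    else:
        upper = (2 * n_eff * p + z2 + 1
                 + z * math.sqrt(z2 + 2 - 1 / n_eff + 4 * p * (n_eff * q - 1))) / denom
    return max(0.0, lower), min(1.0, upper)


if __name__ == "__main__":
    # Psoriasis example: 34 of 63 lesions cleared, deff = 2.72.
    print("Effective sample size:", round(63 / 2.72))   # 23
    lo, hi = fleiss_quadratic_ci(34, 63, deff=2.72)
    print(f"Cluster-adjusted 95% CI: {lo:.1%}-{hi:.1%}")  # 32.7%-74.0%
    lo, hi = fleiss_quadratic_ci(34, 63)
    print(f"Simple random sample:    {lo:.1%}-{hi:.1%}")  # 41.0%-66.4%
```

Dividing *n* by *deff* while keeping *p* fixed is what widens the interval: the 63 clustered lesions carry about as much information as 23 independent ones.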

**(1) ELECT**: A pre-election poll of 100 prospective voters shows 55 in favor of Candidate A. Use EpiTable or some other
epidemiologic calculator to compute a 95% confidence interval for *P*, the proportion of the electorate favoring Candidate A. Based on
this estimate, do the results provide reliable evidence of a future victory for Candidate A? Explain your reasoning.

**(2) BREASTCA**: We expect 2% of women at age 50 to develop breast cancer within 5 years. Suppose that among 1000 women in this
age range who have a mother with breast cancer, 32 develop breast cancer. How many cases would be expected in this group? Use
EpiTable > Probability > Binomial to determine whether the number of cases is significantly greater than expected.

**(3) PREGRATS**: A laboratory test of the teratogenicity of an agent shows 12 malformed (rat) pups in a litter of 85. We normally
expect a malformation rate of 5% in this species. Do the data provide evidence of teratogenicity? Test $H_0\colon P = 0.05$ at $\alpha = .01$, one-sided.

**(4) SMOKE.REC**: The data set SMOKE.REC records the number of days each client successfully stays smoke-free after a smoking
cessation program. (Data are recorded as the variable DAYS.) Download and unzip this data set. Read the data set into EpiInfo and
then convert the variable DAYS into a dichotomous outcome indicating whether the person ceased smoking for at least a year.
Compute a 95% confidence interval for the recidivism proportion *P*. In your opinion, was the smoking cessation program successful?

**(5) BINSIZE**: Determine the sample size needed to calculate a 95% confidence interval for a proportion with a margin of error of no
more than 10%, assuming an expected proportion of 50%. Recalculate the sample size requirement assuming *d* = 5%.

**(6) EDENTITION**: A report of adult dental health in 25- to 34-year-old English women showed 20 of 262 women with missing teeth.
Calculate a 95% confidence interval for *P*.

**(7) FEV.ZIP**: Download the data set FEV. Unzip the ZIP file. A data definition (DD) file is included as one of the files in the ZIP
archive. These data are from a survey of respiratory health (Rosner, 1995, p. 40; Tager et al., 1985). For each dichotomous variable in
the data set, report relevant counts (*x* of *n*) and proportions. Also report a 95% confidence interval for each proportion parameter.

**(8) THERAPEUTIC_TOUCH**: A study by an 11-year-old girl made headlines for challenging the validity of a type of therapy
("Therapeutic Touch") in which the therapist's hands are passed over a patient's body without actually being laid on the patient,
supposedly to manipulate human energy fields (Rosa et al., 1998). In the experiment, Touch Therapists rested their hands,
palms up, on a flat surface, approximately 25 to 30 cm apart. To prevent the experimenter's hands from being seen, an opaque
screen with cut-outs at its base was placed over each subject's arms, and a cloth towel was attached to the screen and draped over the
therapist's arms. Each therapist underwent 10 trials in which the 11-year-old investigator hovered her right hand, palm down, 8 to 10
cm above one hand of the therapist and then said "Okay." The Touch Therapist then stated which of his or her hands was nearer the
experimenter's hand. Each subject was permitted to take as much or as little time as necessary to make each determination. Results
showed 123 successes out of 280 trials, with the number of successes per therapist (out of 10 trials) distributed as follows:

| No. correct (out of 10) | Frequency (therapists) | Total correct |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 8 | 24 |
| 4 | 5 | 20 |
| 5 | 7 | 35 |
| 6 | 2 | 12 |
| 7 | 3 | 21 |
| 8 | 1 | 8 |
| 9 | 0 | 0 |
| 10 | 0 | 0 |
| Total | 28 | 123 |

We want to calculate a 95% confidence interval for the proportion of successes, taking the cluster sampling into account. (Each therapist is a
cluster of 10 observations.) To do this, we must first calculate the design effect (*deff*). (You will find *deff* = 1.12.) Now use EpiTable to
calculate the 95% confidence interval adjusted for clustering. Discuss your findings. Is there evidence to contradict
the hypothesis of random selection ("detection") of a human energy field?