9: Inference About Proportions 10/8/06

Review Questions

  1. What is a binary variable? 
  2. Quantitative variables are summarized with sums and averages. Categorical variables are summarized with counts and _____________.
  3. What symbol denotes the sample proportion (i.e., the statistic)? What symbol denotes the  population proportion (i.e., parameter)? 
  4. What is the name of the probability distribution that governs the number of successes in n independent Bernoulli trials?
  5. What probability distribution can be used for binomials when n is large? 
  6. What factors determine the sample size requirements for estimating a proportion?
  7. In testing H0: p = p0, where does the value of p0 come from?

Exercises

9.1 Incidence of improvement. Of 75 patients, 20 respond to treatment.

(A) Calculate the incidence proportion in the sample. 
(B) Calculate a 95% confidence interval for the incidence in the population. (Use the plus-four method.) 
(C) Explain the meaning of the confidence interval to a person who has little statistical training. 
(D) How large a sample is needed to reduce the margin of error of the 95% confidence interval to plus or minus 0.05?
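
For checking hand calculations in Exercise 9.1, the plus-four interval can be sketched in Python. This is a minimal illustration rather than an official key: the function name plus_four_ci and its arguments are hypothetical, and the sketch assumes the usual plus-four adjustment (add two successes and two failures, then apply the normal-approximation formula).

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(x, n, conf=0.95):
    """Plus-four confidence interval for a proportion (hypothetical helper)."""
    # Add 2 successes and 2 failures before computing the estimate.
    p_tilde = (x + 2) / (n + 4)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - margin, p_tilde + margin


# Exercise 9.1: 20 responders among 75 patients.
lower, upper = plus_four_ci(20, 75)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

The same helper accepts other confidence levels through the conf argument, for example conf=0.90.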

9.2 Campus smoking survey. Using saliva samples from student participants, a study intends to estimate the prevalence of smoking on campus. Presence of cotinine in saliva indicates that the student is a smoker. (Cotinine is a by-product of tobacco that is thought to be a reliable indicator of tobacco use.) Initially, 200 students are selected at random from the university's rosters to participate in the survey. Only 95 of the 200 potential participants respond to the survey. Of the 95 participants, 23 test positive for cotinine in their saliva.

(A) Describe the population to which inferences will be made. 
(B) What factors would lead you to doubt the validity of the sample?
(C) Let us assume for the sake of practice that the sample is a reasonable approximation of an SRS of students on campus, i.e., that the sample is not biased. Estimate the prevalence of tobacco use.
(D) Calculate a 95% confidence interval for the prevalence in the campus population.
(E) How large a sample is needed to reduce the margin of error in the survey to 0.025? 

9.3 BRCA1 mutations in familial breast cancer cases (Couch et al., 1997). Of 169 women having breast cancer and a familial risk factor (i.e., having a first-degree family member with breast cancer), 27 had an inherited BRCA1 mutation. Estimate the proportion of breast cancer cases with a familial risk factor who have a BRCA1 mutation. Include a 95% confidence interval for the proportion.

9.4 Therapeutic touch. Proponents of therapeutic touch postulate that each person has a human energy field (HEF) that can be felt by those trained to recognize HEF-related perceptions. An experiment tested whether therapeutic touch practitioners could correctly identify whether the HEF of an unseen hand hovered over their left or right hand. The correct hand was identified in 123 (44%) of 280 tries (Rosa et al., 1998). Calculate a 95% confidence interval for the proportion of correct guesses. Are the results of the experiment consistent with random guessing? [Note to advanced users: We have ignored the fact that the sample is clustered. However, the design effect of the study was small (def = 1.12) and does not materially affect the interpretation of results.]

9.5 Risk factor X. An SRS of 120 individuals identified 16 individuals with risk factor X. Estimate the prevalence of risk factor X in the population with 95% confidence.

9.6 Nurse anesthetists. Seven (7) malignancies were noted in a study of 525 nurse anesthetists over a 10-year period. Calculate a 95% confidence interval for the ten-year incidence proportion of malignancies.

9.7 Insulation workers. Twenty-six (26) cancer deaths were observed in a cohort of 556 insulation workers. Based on national statistics, a cohort of this size and age distribution was expected to have 14.4 cases during the period of study. In other terms, the expected proportion p0 = 14.4 / 556 = 0.02590. Test whether the observed number of cases is significantly greater than expected. Show all hypothesis testing steps. [Note: We are testing whether p̂ = 26 / 556 = 0.04676 is significantly greater than 0.0259.]
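
The test statistic for Exercise 9.7 can be checked with a short Python sketch. It assumes the standard one-sample z test for a proportion, in which the standard error is computed from p0 (the value under H0) rather than from the sample proportion. The function name one_sample_z_test is hypothetical and returns only the upper-tail P-value.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x, n, p0):
    """z statistic and one-sided (upper-tail) P-value for H0: p = p0."""
    p_hat = x / n
    # The standard error uses p0, the proportion assumed under H0.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Upper tail: probability of a z value at least this large under H0.
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper


# Exercise 9.7: 26 observed deaths, 14.4 expected, in a cohort of 556.
z, p_value = one_sample_z_test(26, 556, 14.4 / 556)
print(f"z = {z:.2f}, one-sided P = {p_value:.4f}")
```

For a two-sided alternative, the usual convention is to double the smaller tail area.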

9.8 Kidney cancer survival. An oncologist treats 40 kidney cancer cases. Sixteen (16) of the 40 cases survive at least five years. Historically, 1 in 5 (0.20) cases were expected to survive 5 years or more. Test whether there has been a significant improvement in survival. [We are testing whether p̂ = 16 / 40 = 0.40 is significantly greater than 0.20.] Show all hypothesis testing steps.

9.9 Leukemia gender preference. A sample of 262 leukemia cases was made up of 150 male cases and 112 female cases. Does this provide evidence for a gender preference? [The observed proportion of cases who are male = 150 / (150 + 112) = 0.5725. Test H0: p = 0.5].

9.10 AIDS-related risk factor (blood transfusion or high-risk partner). A national random sample of 2673 heterosexual men found 5 respondents had received a blood transfusion and had a sexual partner from a group at high risk of AIDS. Provide a 95% confidence interval for this combination of risk factors in the population. (Catania et al., 1992; Moore, 2004, p. 480.)  

9.11 Sample size. You are planning a study that intends to estimate a population proportion with 95% confidence. How many individuals do you need to study to achieve the margins of error listed below?  Reasonable estimates for the population proportions are not available before study, so you assume p* = 0.50 to ensure adequate precision.

(A) Margin of error = 10%
(B) Margin of error = 5%
(C) Margin of error = 2.5%
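
The sample size calculation in Exercise 9.11 can be sketched as follows, assuming the usual formula n = (z / m)^2 × p*(1 − p*), rounded up to the next whole individual. The function name sample_size_for_proportion is hypothetical, and the sketch uses the exact normal critical value rather than the rounded 1.96.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, p_star=0.5, conf=0.95):
    """Smallest n giving the requested margin of error (hypothetical helper)."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Round up: a fractional individual cannot be sampled.
    return ceil((z / margin) ** 2 * p_star * (1 - p_star))


# Margins of error from Exercise 9.11, with p* = 0.50.
for m in (0.10, 0.05, 0.025):
    print(f"margin {m}: n = {sample_size_for_proportion(m)}")
```

Because n grows with the inverse square of the margin of error, halving the margin roughly quadruples the required sample.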

9.12 Perinatal growth failure. Failure to grow normally during the first year of life is referred to as "perinatal growth failure." Perinatal growth failure occurred in 33 of 249 very-low birth weight babies (Hack et al., 1991). 

(A) Calculate a 90% confidence interval for the proportion of very-low birth weight babies that will exhibit perinatal growth failure. (Assume the sample is a random representation of the population of very-low-birth weight infants.)
(B) Calculate a 95% confidence interval for this parameter.
(C) Advanced question, exact binomial test required: Among the 33 perinatal growth failure cases, 8 had very low intelligence test scores (under 70) when they reached 8 years of age. In normal birth weight babies, we'd expect 2.5% of the population to exhibit intelligence test scores this low. This means that 8 cases were observed, while only 0.025 × 33 = 0.825 were expected. Is the observed proportion (8 of 33) significantly greater than expected? Perform a one-sided exact binomial test to address this problem. [The sample is too small to use a z test since np0q0 = (33)(0.025)(0.975) = 0.804.]
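
A one-sided exact binomial P-value is the probability of observing at least as many cases as were seen, computed directly from the binomial distribution under H0. The sketch below sums those upper-tail terms for part (C); the function name binom_upper_tail is hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb


def binom_upper_tail(x, n, p0):
    """P(X >= x) for X ~ Binomial(n, p0), summed term by term."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(x, n + 1)
    )


# Exercise 9.12(C): 8 low scores among 33 cases, with p0 = 0.025 under H0.
p_value = binom_upper_tail(8, 33, 0.025)
print(f"one-sided exact P = {p_value:.2e}")
```

Summing the upper tail directly, instead of subtracting the lower tail from 1, avoids losing precision when the P-value is very small.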

9.13 Alternative medicine. According to a New York Times article, a nationwide telephone survey conducted for Landmark Healthcare Inc., a managed alternative care company in Sacramento, Calif., found that of 1,500 adults interviewed, 660 said they would use alternative medicine if traditional medical care failed to produce the desired results. Calculate a 95% confidence interval for the proportion of adults who would use alternative medicine if traditional medical care failed. 

9.14 Prevalence of binge drinking in U.S. colleges. Alcohol abuse is a serious health problem on college campuses. A nationwide survey of students at 140 four-year colleges in the U.S. defined "frequent binge drinking" as "having five or more drinks in a row three or more times in the past two week period." Of the 17,096 students surveyed, 3,314 met this criterion. Assume the data represent an SRS of 4-year colleges (Wechsler, 1994; Moore & McCabe, 2006, p. 537).

(A) Calculate the prevalence of frequent binge drinking. Include a 95% confidence interval for the estimate.
(B) Data were self-reported and the response rate for the survey was 69%. Consider how these factors might influence the estimates you just calculated.

9.15 Cerebral tumors -- same side as cell phone use? In a case-control study on cerebral tumors and cell phone use, tumors occurred more frequently on the same side of the head where cellular telephones had been used in 26 of 41 cases (Muskat et al., 2000). Test the hypothesis of an equal distribution of contralateral and ipsilateral tumors in the population. (Test H0: p = ½).

9.16 Temporal lobe tumors -- opposite side of cell phone? In the same study addressed in the prior problem, cases with temporal lobe cancer had a greater proportion of tumors on the contralateral than on the ipsilateral side: 9 vs. 5 cases. Provide a P-value for this problem.

9.17 Drove when drinking alcohol. The Youth Risk Behavior Surveillance (YRBS) system monitors health behaviors in youth and young adults in the United States. Results from the 2005 survey indicated that almost 10% of respondents had driven a car or other vehicle when they had been drinking alcohol (Eaton et al., 2006; http://www.cdc.gov/mmwr/PDF/SS/SS5505.pdf). This was a large survey, so it had a small margin of error (about ±1%). Recall that the margin of error quantifies only the random error associated with sampling; it does not address systematic sources of error that may result from non-response (a type of selection bias) and misinformation (information bias). In practice, non-random errors are of more practical concern than random sampling error. 

(A) The authors of the YRBS were concerned about selection biases when they reported an overall response rate of about 67%. In what situations will non-response cause bias?
(B) The investigators noted that a separate validation study was completed in which fair to good repeatability was documented when questionnaires were given on separate occasions (Brener et al., 2002). Are repeatable responses necessarily valid? What types of factors could influence the accuracy of responses?

 
