9 Key Odd

Review Questions

  1. A binary variable is a categorical response with two possible outcomes (success or failure). Synonyms include dichotomous variable and 0/1 variable.
  2. proportions
  3. Sample proportion = p-hat. Population proportion (parameter) = p.
  4. The binomial distribution. 
  5. The Normal distribution.
  6. (a) desired margin of error d  (b) population proportion p  
  7. From the imagination of the researcher or from the research question.

Exercises

9.1 Incidence of improvement

(A) Calculate the proportion in the sample. p-hat = 20 / 75 = 0.2667 (about 27%)

(B) Calculate a 95% confidence interval for the incidence in the population. By the plus-four method: n~ = 75 + 4 = 79, X~ = 20 + 2 = 22, p~ = 22 / 79 = 0.2785, SEp~ = sqrt[(0.2785)(0.7215)/(79)] = 0.0504; 95% CI for p = 0.2785 ± (1.96)(0.0504) = 0.2785 ± 0.0988 = (0.1797 to 0.3773) or (18% to 38%).

(C) Explain the meaning of the confidence interval to a person who has little statistical training. In explaining this confidence interval to a lay person, you should be clear that the sample proportion is only a rough estimate of the true proportion in the population. The confidence interval gives us a better idea of this true value. In this case, we can say with 95% confidence that the true proportion is between 18% and 38%.
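
The plus-four arithmetic in part (B) can be checked with a short Python sketch. The function name plus_four_ci is hypothetical (it is not from the text), and the code carries full precision rather than the rounded intermediate values shown above, so the last digit may differ slightly.

```python
import math

def plus_four_ci(x, n, z=1.96):
    """Plus-four interval: add 2 successes and 2 failures, then use the usual formula.

    Hypothetical helper for illustration; z = 1.96 gives 95% confidence.
    """
    n_tilde = n + 4
    p_tilde = (x + 2) / n_tilde
    se = math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    margin = z * se
    return p_tilde - margin, p_tilde + margin

# Exercise 9.1: 20 improvements among 75 subjects
lower, upper = plus_four_ci(20, 75)
print(f"95% CI for p: {lower:.4f} to {upper:.4f}")
```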

(D) How large a sample is needed to reduce the margin of error of the 95% confidence interval to plus or minus 0.05? Using the proportion from the initial study as p*, we calculate n = (1.96^2)(0.267)(0.733) / (0.05^2) = 300.7. Round this up to the next integer and use n = 301.
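
As a quick check of part (D), the sample size formula can be sketched in Python. The function name sample_size_for_proportion is hypothetical, and z = 1.96 is assumed for 95% confidence.

```python
import math

def sample_size_for_proportion(p_star, margin, z=1.96):
    """Smallest n giving the requested margin of error, rounded up.

    Hypothetical helper: n = z^2 * p*(1 - p*) / d^2.
    """
    n = z**2 * p_star * (1 - p_star) / margin**2
    return math.ceil(n)

# Exercise 9.1(D): p* = 0.267 from the initial study, margin of error d = 0.05
n_needed = sample_size_for_proportion(0.267, 0.05)
print(n_needed)  # 301
```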

9.3 BRCA1 mutation: p-hat = 0.160. 95% CI for p = 0.112 to 0.222 (by Wilson's method, which should be similar to the plus-four results)

9.5 Risk factor X: p-hat = 16 / 120 = 0.1333 (about 13%); 95% confidence interval for p = 0.084 to 0.206 (via Wilson's method, which should be similar to the plus-four method)
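
The Wilson score limits quoted in 9.3 and 9.5 come from software rather than hand calculation. A minimal sketch of the standard Wilson score formula follows, applied to the 9.5 data; the function name wilson_ci is hypothetical, and z = 1.96 is assumed for 95% confidence.

```python
import math

def wilson_ci(x, n, z=1.96):
    """Wilson score interval for a proportion (hypothetical helper)."""
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Exercise 9.5: 16 of 120 subjects have risk factor X
lower, upper = wilson_ci(16, 120)
print(f"95% CI for p: {lower:.3f} to {upper:.3f}")  # 0.084 to 0.206
```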

9.7 Insulation workers. Test whether the observed number of cases is significantly greater than expected. Show all hypothesis testing steps.

Check the npq rule: np0q0 = (556)(0.0259)(1 - 0.0259) = 14.0. Then...

(Hypotheses) H0: p = 0.0259 vs. H1: p not = 0.0259 
(Test statistic) SE = sqrt[(0.02590)(1 - 0.02590) / (556)] = 0.00674; p-hat = 26 / 556 = 0.04676; zstat = (p-hat - p0) / SE = (0.04676 - 0.0259) / (0.00674) = 3.10
(P-value)  P = 0.0020
(Significance statement). The evidence against H0 is highly significant.
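
The testing steps for 9.7 can be reproduced with the sketch below. The function name one_sample_z_test is hypothetical; the two-sided P-value is taken from the standard Normal distribution via math.erfc, so it may differ from a table lookup in the last digit.

```python
import math

def one_sample_z_test(x, n, p0):
    """One-sample z test of H0: p = p0 (hypothetical helper).

    Returns n*p0*q0 (for the npq rule), the z statistic, and the two-sided P-value.
    """
    npq = n * p0 * (1 - p0)
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # SE computed under H0
    z = (p_hat - p0) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * Pr(Z >= |z|)
    return npq, z, p_two_sided

# Exercise 9.7: 26 cases among 556 workers, expected proportion 0.0259
npq, z, p_value = one_sample_z_test(26, 556, 0.0259)
print(f"npq = {npq:.1f}, z = {z:.2f}, two-sided P = {p_value:.4f}")
```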

9.9 Leukemia gender preference

Check the npq rule: np0q0 = (262)(0.5)(1 - 0.5) = 65.5, so the z test can be used.

(Hypotheses) H0: p = 0.5 vs. H1: p not = 0.5
(Test statistic): SE = sqrt[(0.5)(0.5)/(262)] = 0.03089; zstat = (0.5725 - 0.5) / 0.03089 = 2.34
(P-value) P = 0.0192 indicating good evidence against H0
(Significance statement) The data provide significant evidence against H0.

9.11 Sample size requirements   

(A) n = (1.96^2)(0.5)(0.5) / 0.1^2 = 96.04. Round this up to n = 97 to ensure adequate precision.
(B) n = (1.96^2)(0.5)(0.5) / 0.05^2 = 384.16. Round up to n = 385.
(C) n = (1.96^2)(0.5)(0.5) / 0.025^2 = 1536.64, so use n = 1537.
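
The three answers above show that halving the margin of error roughly quadruples the required sample size. A short sketch that tabulates them, assuming p* = 0.5 and z = 1.96 as in the calculations above (the name required_n is hypothetical):

```python
import math

def required_n(margin, p_star=0.5, z=1.96):
    """Required sample size for a given margin of error, rounded up (hypothetical helper)."""
    return math.ceil(z**2 * p_star * (1 - p_star) / margin**2)

# Margins of error for parts (A), (B), and (C)
sizes = {d: required_n(d) for d in (0.1, 0.05, 0.025)}
for d, n in sizes.items():
    print(f"d = {d}: n = {n}")
```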

9.13 Alternative medicine. 95% confidence interval for parameter p = 0.415 to 0.465 (42% to 47%) by the Wilson score method.

9.15 Cerebral tumors -- same side as cell phone use?  p-hat = 26 / 41 = 0.6341, q-hat = 1 - 0.6341 = 0.3659, SE (assuming p = 0.5) = sqrt(0.5 * 0.5 / 41) = 0.078087. zstat = (0.6341 - 0.5) / 0.078087 = 1.718; one-sided P = Pr(Z >= 1.718) = 0.043; two-sided P = 2 * 0.043 = 0.086.
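
The one-sided and two-sided P-values in 9.15 differ by a factor of two, which the sketch below makes explicit. The helper name upper_tail is hypothetical; it returns the standard Normal probability Pr(Z >= z) using math.erfc.

```python
import math

def upper_tail(z):
    """Pr(Z >= z) for a standard Normal Z (hypothetical helper)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Exercise 9.15: 26 of 41 tumors on the same side as phone use, H0: p = 0.5
x, n, p0 = 26, 41, 0.5
se = math.sqrt(p0 * (1 - p0) / n)
z = (x / n - p0) / se
p_one_sided = upper_tail(z)
p_two_sided = 2 * upper_tail(abs(z))
print(f"z = {z:.3f}, one-sided P = {p_one_sided:.3f}, two-sided P = {p_two_sided:.3f}")
```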

Notes for advanced users: 

9.17 Drove when drinking alcohol

(A) The extent to which non-response biased the survey depends on whether those who refused to participate differed from those who participated in the study.
(B) No. A repeatable response can be repeatedly wrong. Some things to consider regarding accuracy: Did the respondents understand the question? Were they fearful of telling the truth about their behaviors? Were they trying to please the people administering the survey? (And so on.)