4A: Probability concepts and binomial distributions (Key Odd)

Review Questions

  1. All are based on the expected proportions
  2. A random variable is a numerical quantity that takes on different values depending on chance.
  3. The two types of random variables are discrete random variables and continuous random variables. (A binomial random variable is an example of a discrete random variable. A Normal random variable is an example of a continuous random variable.)
  4. (a) discrete (b) discrete (c) continuous
  5. A Bernoulli trial is a random event in which only two possible outcomes can occur. (One event is denoted "success" and the other is denoted "failure.")
  6. n = number of independent Bernoulli trials; p = probability of success for each Bernoulli trial
  7. 10C2 = 10! / (2! · 8!) = (10 · 9 · 8!) / (2! · 8!) = (10 · 9) / (2 · 1) = 45
  8. 1
  9. 0! = 1 (by definition)
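
The arithmetic in answers 7 and 9 can be checked with a short Python sketch. It is only an illustration: the variable names are made up here, and the standard-library functions math.factorial and math.comb stand in for the hand calculation.

```python
import math

# 10C2 computed three ways; each mirrors one step of the hand calculation.
by_factorials = math.factorial(10) // (math.factorial(2) * math.factorial(8))
by_cancellation = (10 * 9) // (2 * 1)   # after cancelling 8! top and bottom
by_comb = math.comb(10, 2)              # built-in "n choose k"

print(by_factorials, by_cancellation, by_comb)

# 0! is defined to be 1, and the library follows the same convention.
print(math.factorial(0))
```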

Exercises

4A.1 Probability of survival. Explain the meaning of this probability estimate to the patient in terms he or she will understand. The estimate means that, in the long run, we expect 80% of similar patients to survive at least five years and 20% not to survive.

4A.3 February birthdays

(A) Pr(Feb 28) = 4 / (365 + 365 + 365 + 366) = 4 / 1461 = 0.0027379 [You must take into account that every fourth year is a leap year.]
(B) Pr(Feb 29) = 1 / (365 + 365 + 365 + 366) = 1 / 1461 = 0.0006845 [Feb 29 occurs once every 4 years.]
(C) Pr(Feb 28 or Feb 29) = 5 / (365 + 365 + 365 + 366) = 5 / 1461 = 0.0034223 [Or you can simply add Pr(Feb 28) + Pr(Feb 29), since these events are disjoint.]
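
As a rough check of the leap-year arithmetic, the sketch below uses exact fractions. The variable names are illustrative. Like the answer above, it assumes a simple four-year cycle with one leap year and births spread evenly over the days.

```python
from fractions import Fraction

# Days in one four-year cycle: three ordinary years plus one leap year.
days_in_cycle = 365 + 365 + 365 + 366

p_feb28 = Fraction(4, days_in_cycle)  # Feb 28 occurs in all four years
p_feb29 = Fraction(1, days_in_cycle)  # Feb 29 occurs only in the leap year
p_either = p_feb28 + p_feb29          # disjoint events, so probabilities add

print(days_in_cycle)
print(float(p_feb28), float(p_feb29), float(p_either))
```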

4A.5 Sampling a small finite population, N = 26.

(A) 1 in 26 = 0.0385
(B) 1 in 26 = 0.0385
(C) 0 in 25 = 0.0000

4A.7 Coin flipping. 

(A) ... estimate the probability of tails via the observed proportion. Observed proportions will vary.
(B) Why do most experiments fail to derive the expected number of tails based on a probability of 0.50? This experiment has only 30 trials, which is too few for the observed proportion to converge on the true probability. Probability describes what will happen in the long run, not in the short run.
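
The long-run idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the exercise: the function name proportion_tails, the seed, and the larger run sizes (300 and 30,000) are chosen here for illustration, and a pseudo-random generator stands in for a real coin.

```python
import random

def proportion_tails(n_flips, rng):
    """Flip a simulated fair coin n_flips times; return the proportion of tails."""
    tails = sum(rng.random() < 0.5 for _ in range(n_flips))
    return tails / n_flips

rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable

# Short runs wander around 0.50; longer runs tend to settle closer to it.
for n in (30, 300, 30_000):
    print(n, proportion_tails(n, rng))
```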

4A.9 Breast cancer 

(A) Build the probability mass function for the number of women in the sample who will ultimately develop breast cancer in the sample of n = 3. 
Pr(X = 0) = (3C0)(0.1^0)(0.9^(3-0)) = (1)(1)(0.7290) = 0.7290
Pr(X = 1) = (3C1)(0.1^1)(0.9^(3-1)) = (3)(0.1)(0.8100) = 0.2430
Pr(X = 2) = (3C2)(0.1^2)(0.9^(3-2)) = (3)(0.01)(0.90) = 0.0270
Pr(X = 3) = (3C3)(0.1^3)(0.9^(3-3)) = (1)(0.001)(1) = 0.0010
This is the pmf for X~b(3, 0.1)
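
The same pmf can be tabulated with a few lines of Python. This is a minimal sketch: binom_pmf is a helper name introduced here, written directly from the binomial formula Pr(X = x) = (nCx)(p^x)((1-p)^(n-x)).

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# pmf for X ~ b(3, 0.1): one entry for each possible count 0, 1, 2, 3.
pmf = [binom_pmf(x, 3, 0.1) for x in range(4)]

for x, pr in enumerate(pmf):
    print(f"Pr(X = {x}) = {pr:.4f}")

# A valid pmf sums to 1 (up to floating-point error).
print(sum(pmf))
```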

(B) How likely would it be to find a sample in which all three develop breast cancer? Pr(X = 3) = 0.0010 (see calculation above). Provide three different explanations for such an observation.

  1. (Chance): This is a one-in-a-thousand occurrence.
  2. (Assumption about p is incorrect): The model assumes p = 0.1. If risk were greater in the population, then p would actually be greater than 0.1 and the model would be a poor predictor.
  3. (The sample is not random): The binomial model assumes independence (random sampling). This means the data must be an SRS of the population. If the sample is not an SRS, all bets are off.

4A.11 Childhood asthma

(A) Use statistical notation ... X~b(20, .05)
(B) What is the probability that no children in the sample will have asthma? Pr(X = 0) = (20C0)(.05^0)(.95^(20-0)) = (1)(1)(0.3585) = 0.3585
(C) What is the probability one child in the sample will have asthma? Pr(X = 1) = (20C1)(.05^1)(.95^(20-1)) = (20)(.05)(.3774) = 0.3774
(D) What is the probability one or fewer children will have asthma? Pr(X ≤ 1) = Pr(X = 0) + Pr(X = 1) = 0.3585 + 0.3774 = 0.7359
(E) What is the probability at least 2 will have asthma? Pr(X ≥ 2) = 1 - Pr(X ≤ 1) = 1 - 0.7359 = 0.2641
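
Parts (B) through (E) can be reproduced with the sketch below. The helper binom_pmf and the variable names are illustrative, not from the text. Because the code carries full precision instead of adding four-decimal values, its results for (D) and (E) can differ from the key in the last decimal place.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.05                 # X ~ b(20, .05)

p0 = binom_pmf(0, n, p)         # (B) no children with asthma
p1 = binom_pmf(1, n, p)         # (C) exactly one child
at_most_1 = p0 + p1             # (D) Pr(X <= 1), a cumulative probability
at_least_2 = 1 - at_most_1      # (E) Pr(X >= 2), by the complement rule

print(f"{p0:.4f} {p1:.4f} {at_most_1:.4f} {at_least_2:.4f}")
```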

4A.13 All ten. Recognize the outcome as a binomial random variable with n = 10 and p = 0.95. The probability of seeing 10 of 10 for this binomial = Pr(X = 10) = (10C10)(0.95^10)(0.05^0) = (1)(0.5987)(1) = 0.5987. [Note: When events are independent, their joint probability is determined by their product. Therefore, the probability of 10 events in a row = 0.95^10 = 0.5987.]
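
Both routes to this answer, the binomial formula and the product rule for independent events, can be compared numerically. The variable names in this sketch are illustrative only.

```python
from math import comb, prod

p = 0.95  # probability of success on each independent trial

# Route 1: binomial formula with n = 10 and x = 10.
via_binomial = comb(10, 10) * p**10 * (1 - p)**0

# Route 2: multiply the ten independent success probabilities together.
via_product = prod([p] * 10)

print(f"{via_binomial:.4f} {via_product:.4f}")
```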

4A.15 Prevalence 77%. Pr(X ≥ 9) = Pr(X = 9) + Pr(X = 10) = 0.2156 + 0.0714 = 0.2871

4A.17 X~b(3, 0.05). Pr(X ≥ 2) = Pr(X = 2) + Pr(X = 3) = 0.0071 + 0.0001 = 0.0072