5: Introduction to Estimation 3/15/07

Review Questions

  1. What is statistical inference?
  2. Name the two forms of statistical inference.
  3. Name the two forms of estimation. 
  4. Compare and contrast the goals of estimation and hypothesis testing.
  5. Although parameters and estimators are related, they are not the same. List ways they differ.
  6. What does it mean when we say that x-bar is an unbiased estimator of μ?
  7. Fill in the blanks: A particular random sample of n observations can be used to calculate a sample mean. We can determine the characteristics of the distribution of means derived from other samples of the same size taken from the same population without taking additional samples. The theorem that postulates that this distribution will tend to be Normal is called the _____________ __________ theorem. The sampling distribution of the mean will be centered on the __________ __________. The standard deviation of the sampling distribution of means is called its ____________ ___________ and is equal to the standard deviation of the measurement divided by the __________ __________ of the sample size.
  8. When the sampling distribution of x-bar is Normal, 95% of the sample means will fall within 1.96 ___________ ___________ of µ.
  9. T/F The "square root law" says that the SEM is inversely related to the square root of the sample size.
  10. Match the definitions below with the following terms: alpha, statistical inference, parameter, confidence interval, sampling distribution of the mean, standard error of the mean

    (a) a numeric characteristic of a population 

    (b) the act of generalizing from a sample to a population with calculated degree of certainty 

    (c) the hypothetical frequency distribution of all possible sample means based on the same sample sizes from the same population

    (d) the chance a researcher is willing to take of not capturing the parameter 

    (e) the standard deviation of the sampling distribution of the mean 

    (f) an interval that is created with a known likelihood of capturing a parameter 

  11. What percentage of 95% confidence intervals for µ will fail to capture the population mean?
  12. What percentage of (1-alpha)100% confidence intervals for µ will fail to capture the population mean?
  13. The confidence interval for the mean seeks to capture the _____________  mean [M/C: (a) sample (b) population].
  14. A confidence interval is 3 ± 1.2. The 3 in this equation is the _________ ____ __________ [3 words].

Exercises, Part A (Basics, sampling distributions, z procedures) 

5A.1 Parameter or estimate? Say whether each of the boldface numbers is a parameter or estimate. 

 

(A) There are about 18,800 new cases of female breast cancer in the state of California in a given year. An oncologist studies survival in 1225 newly diagnosed cases and finds that survival varies greatly by stage of diagnosis. The seven-year survival rate for Stage I breast cancer is 92%. The Stage II survival rate is 71%, the Stage III survival rate is 39%, and the Stage IV survival rate is 11%. (Numbers are fictitious but realistic.)

(B) A review of divorce records for a county in Connecticut for the year 2000 indicates that marriages that end in divorce last an average of 72 months. If inferences are to be restricted to that year and population, is the number 72 a parameter or estimate? 

 

5A.2 Parameter or estimate? Say whether each of the boldface numbers below is a parameter or an estimate.

 

(A) A review of 168 hospital discharge summaries in 2006 reveals that 20% of patients were uninsured that fiscal year. (Assume this hospital treats thousands of patients per year.)
(B) Data for a census tract Standard Metropolitan Area (SMA) indicate that 12% of the population is African-American. In a survey using a random-digit-dialing machine, we find that 8% of respondents were African-American.
(C) We want to determine the average cost patients pay for a particular medication. Ten online pharmacies reveal an average cost of $31.20 with a standard deviation of $7.75 for a one-month supply. Ten community pharmacies show an average cost of $33.18 with standard deviation $7.88.

 

5A.3 Very tiny population. A tiny finite population consists of the following values: {1, 3, 5, 7, 9}. This population has mean µ = 5 and standard deviation σ = 2.8. 


(A) List all possible unique samples of n = 2 from this population. (There are 5C2 = 10 such samples.) Calculate the mean of each sample. 
(B) Construct a stemplot of the 10 sample means. This is a rudimentary sampling distribution of a mean (SDM) based on n = 2 from this population. Is the SDM more or less Normal than the population? Is the mean of the SDM equal to, less than, or greater than the mean of the population? Is the spread of the SDM equal to, less than, or greater than the standard deviation of the population? 
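
The enumeration idea behind a rudimentary SDM can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_means` and the three-value population in the usage lines are hypothetical and are not the population from the exercise.

```python
from itertools import combinations

def sample_means(population, n):
    """Return the mean of every unique sample of size n (drawn without replacement)."""
    return [sum(sample) / n for sample in combinations(population, n)]

# Hypothetical population, chosen only to illustrate the idea.
population = [2, 4, 9]
means = sample_means(population, 2)          # means of (2,4), (2,9), (4,9)

mean_of_means = sum(means) / len(means)      # center of the sampling distribution
population_mean = sum(population) / len(population)
print(means, mean_of_means, population_mean)
```

Comparing `mean_of_means` with `population_mean` is one way to check the claim that the sample mean is an unbiased estimator of µ.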

 

5A.4 A survey of health problems. A survey takes a simple random sample of 500 people from a town of 55,000. On the average, there were 2.30 health problems per person (standard deviation = 1.65). Say whether each of the following statements is true or false. Explain your reasoning in each instance.

 

(A) The standard deviation of the sampling distribution of the mean (SDM) is 0.0740.
(B) The 95% confidence interval for the average number of health problems in the sample is (2.16, 2.44).
(C) The 95% confidence interval for the average number of health problems in the town is (2.16, 2.44).
(D) It is reasonable to say that the number of health problems in the population is Normal.
(E) It is reasonable to say that the sampling distribution of the mean (SDM) is Normal.

 

5A.5 Serum cholesterol levels in undergraduate men. Suppose the distribution of cholesterol levels in undergraduate men is approximately Normal with mean µ = 190 mg/dl and standard deviation σ = 40 mg/dl. 

 

(A) What is the probability of selecting someone at random from this population who has a cholesterol value less than 180?

(B) Suppose you take a SRS of n = 49 undergraduate men from this population and calculate x-bar. What is the standard deviation (error) of x-bar?

(C) What is the probability that your x-bar (based on a SRS of n = 49) is less than 180?

 

5A.6 Lab measurements. Measurements of water quality samples have standard deviation σ = 10. A lab assistant takes 4 measurements and calculates x-bar.

(A) Explain the advantage of reporting the average of several measurements rather than using the result of a single measurement. 

(B) Calculate the SE of x-bar when n = 4.

(C) How many times must the assistant repeat the measurement to reduce the SE of x-bar to 2? [Rearrange the formula SE = σ / √n to get n = (σ / SE)². Now determine the sample size required to get the desired standard error.]
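
The square root law and its rearrangement can be sketched as follows. The function names `sem` and `n_for_se` and the numbers in the usage lines are hypothetical, chosen for illustration rather than taken from the exercise.

```python
import math

def sem(sigma, n):
    """Standard error of the mean: SE = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def n_for_se(sigma, target_se):
    """Smallest whole n with SE <= target_se, from n = (sigma / SE)^2, rounded up."""
    return math.ceil((sigma / target_se) ** 2)

# Hypothetical values: sigma = 12 with n = 9 measurements.
print(sem(12, 9))        # 12 / 3
print(n_for_se(12, 3))   # (12 / 3)^2
```

Rounding up matters because a fractional sample size is not possible and rounding down would leave the SE slightly above the target.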


5A.7 Sampling behavior of a mean. Suppose you could take all possible samples of n = 30 from a Normal population with mean µ = 55.5 and standard deviation σ = 4. Draw the sampling distribution of the mean (SDM) based on this sample size and identify landmarks on the horizontal axis that are ±2 standard errors around the expected mean. [Note that SEM = 4 / √30 = 0.7303.] Would you be surprised to find a sample mean less than 54? Would you be surprised to find a sample mean that exceeds 56?

 

5A.8 SDM of test scores. Suppose you give a test to 100 people. The scores vary according to a Normal distribution with mean µ = 250 and standard deviation σ = 50. Calculate the standard deviation (error) of the mean. Sketch the SDM and shade the region under the curve corresponding to ±2 standard errors around µ. What value demarcates the lower tail? What value demarcates the upper tail?

 

5A.9 Calcium-channel blocker cost. A survey of 30 pharmacies found that the average cost of a month's supply of a calcium channel-blocker drug was $33. The margin of error for 95% confidence was $2.50. Calculate the 95% confidence interval for the mean price of the drug. What does it mean to say that we have 95% confidence in this interval?

 

5A.10 Misinterpreting the confidence interval. A pharmacist reads that a 95% confidence interval for the average price for a month's supply for a particular drug is $30.50 to $35.50. Asked to explain the meaning of this, the pharmacist states "95% of all pharmacies sell the drug for between $30.50 and $35.50." Is the pharmacist correct? Explain your response. 

 

5A.11 Graduate student age. The age distribution of students in a graduate program is approximately Normal with unknown mean µ and standard deviation σ = 5. You sample 24 individuals from this population and calculate x-bar = 25.0. Calculate the 95% confidence interval for µ based on these data. Interpret your interval.
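
When σ is known, the confidence interval for µ takes the form x-bar ± z · σ / √n. A minimal sketch of that calculation, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `z_interval` and the input values are hypothetical and differ from those in the exercise.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Confidence interval for mu when the population sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    margin = z * sigma / math.sqrt(n)         # margin of error
    return xbar - margin, xbar + margin

# Hypothetical values: x-bar = 50, sigma = 10, n = 25.
low, high = z_interval(50, 10, 25)
print(round(low, 2), round(high, 2))
```

Raising `conf` increases z and therefore widens the interval, while increasing n shrinks the standard error and narrows it.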

 

5A.12 Muscle strength scores. A physical therapist studying muscular strength is willing to assume muscle strength scores are Normal with a standard deviation of 12. A sample of 15 individuals demonstrates a mean muscular strength score of 84.3. Calculate a 95% confidence interval for μ and then explain what it means to say that you have 95% confidence in this interval. [Similar to Daniel, 1999, p. 157, but for 95% confidence.]

 

5A.13 Graduate student age. Calculate the 99% confidence interval for the data in exercise 5A.11. Why is the 99% confidence interval longer than the 95% confidence interval? 

 

5A.14 Muscle strength scores. Calculate the 90% confidence interval for the data in exercise 5A.12. Why is this interval shorter than the 95% confidence interval?

 

5A.15 Antigen titer. A vaccine manufacturer analyzes samples from a production batch of vaccine to check the concentration of antigen in a product. Immunologic analyses are not perfect, so she repeats measurements on the same batch, getting slightly different results each time. The public health scientist assumes that repeated measurements will vary according to a Normal distribution with mean µ and σ = 0.070. (The standard deviation is assumed to be a characteristic of the titering technique, and is reported by the manufacturer in the documentation with the kit used for the procedure.) Three (n = 3) measurements on one sample reveal the following titers: {17.40, 17.36, 17.45}. Calculate a 95% confidence interval for the true concentration µ. 

 

5A.16 Laboratory scale. The manufacturer of a laboratory scale with a digital readout claims the scale is accurate to 0.0015 of a gram. You read the fine print in the documentation that accompanies the scale and find that, by this, the manufacturer means that measurements have standard deviation σ = 0.0015 grams. You are willing to assume measurements vary according to a Normal distribution with a mean µ that is equal to the true weight of the object. Two weighings of the same specimen produce weights of 24.31 and 24.34 grams, respectively. Calculate a 99% confidence interval for the true weight of the object. 

 

5A.17 SIDS. A study of 49 sudden infant death syndrome (SIDS) cases calculates a mean birth weight of 2998 grams. From a listing of all birth weights, it is known that the standard deviation σ of this variable is 800 grams (data are fictitious but realistic). Assume this standard deviation applies to the population of SIDS cases. Calculate a 95% confidence interval for the mean birth weight µ of SIDS cases. Interpret your results. 


5A.18 Birth weights, smoking mothers. Random samples of size n are selected from a population of birth weights of full-term infants. The population standard deviation σ = 2 pounds for this variable. Calculate 95% confidence intervals for µ based on σ and n for each of the following samples:

(A) n = 81, x-bar = 6.2 pounds

(B) n = 36, x-bar = 7.0 pounds

(C) n = 9, x-bar = 5.8 pounds

(D) Determine the margin of error for each of the above estimates.

5A.19 Hemoglobin study, sample size requirement. Hemoglobin levels in 11-year-old boys have a Normal distribution with unknown mean µ and σ = 1.209 g/dl. How large a sample is needed to estimate µ with 95% confidence and a margin of error of 0.5?

 

5A.20 Sugar consumption survey, sample size requirement. A public health researcher is willing to assume (based on prior research) that the standard deviation of the weekly sugar consumption in children is 100 grams. How large a sample is needed to calculate a 95% confidence interval for µ so that its  margin of error is no greater than 10 grams?
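
Sample size requirements like these follow from setting the margin of error m = z · σ / √n and solving for n, which gives n = (z · σ / m)², rounded up. A sketch under that assumption; the function name `n_for_margin` and the input values are hypothetical, not those of the exercises.

```python
import math
from statistics import NormalDist

def n_for_margin(sigma, margin, conf=0.95):
    """Smallest n whose z-based margin of error is no greater than `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical values: sigma = 15, desired margin of error = 3.
print(n_for_margin(15, 3))
```

Because n grows with the square of σ / m, halving the margin of error roughly quadruples the required sample size.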

 

Exercises, Part B (t procedures) 

5B.1 Blood pressure. A study found a mean systolic blood pressure = 124.6 mm Hg in 35 individuals. The standard deviation s = 10.3 mm Hg.

(A) Calculate the estimated standard error of the mean.
(B) How many people would you need to study to decrease the standard error of the mean to 1 mm Hg? [Rearrange se = s / √n to solve for n. Then plug in values for se and s.]

 

5B.2 Published report. A study published in the American Journal of Public Health (Langenberg, 2005) addressed the statistical relation between tall stature, cardiovascular mortality, and employment grade. Results were reported in a table with the column heading “Mean Height, cm. (SE).” The table entry for “Stroke in the Low Employment Grade” was 173.2 (0.2) based on n = 1243. From this table, you are supposed to understand that x-bar = 173.2 and the standard error of the mean = 0.2. What is the standard deviation of the data in this sample? [Rearrange the formula for the se to solve for s. Then plug in the values of n and se.]
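
Rearranging se = s / √n gives s = se · √n, which is how a standard deviation can be recovered from a published standard error. A brief sketch; the function name `sd_from_se` and the values used are hypothetical, not those from the published table.

```python
import math

def sd_from_se(se, n):
    """Recover the sample standard deviation from a reported standard error."""
    return se * math.sqrt(n)

# Hypothetical report: se = 0.5 based on n = 64 observations.
print(sd_from_se(0.5, 64))
```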

5B.3 t curve. Sketching and shading a t curve helps bring home the distinction between t quantiles, cumulative probabilities, and tail probabilities.

(A) Sketch a t curve. (To the eye, this curve will look like a z curve.) Label the horizontal axis with tick marks at 1-unit standard error intervals. 
(B) Use the t table to determine the value of t with 9 degrees of freedom and cumulative probability 0.90 (i.e., t9,.90). Shade the region under the curve to the right of this point. Notice that this right tail = 1 - 0.90 = 0.10 of the curve.
(C) T tables do not include negative t values because readers can use the symmetry of the curve to determine quantiles to the left of center. Use the symmetry of the t curve to determine the t quantile that cuts off the bottom 10% of the curve (i.e., t9,.10). Shade the region to the left of this point.
(D) What is the combined area of the shaded regions of the curve you just sketched? 

 

5B.4 t percentiles. Use your t table to determine the following t percentiles.

(A) t19,.95 [This is the t quantile with 19 degrees of freedom and cumulative probability 0.95.]
(B) t24,.975  
(C) t35,.975
(D) t674,.99 [A t distribution with this many degrees of freedom is nearly the same as a  z distribution; use the row in the t table for z.]
(E) t19,.05  [Use your knowledge of the symmetry of the t distribution to determine the mirror image of  t19,.95.]
(F) t19,.025  [This is the mirror image of t19,.975.]

 

5B.5 Approximating a probability with the "wedgie-technique." Sometimes you will need to determine the area under the curve to the right or left of a t quantile that does not appear in the body of the t table. For example, you may need to determine the area in the tail beyond a t statistic of 2.65 with 8 degrees of freedom. Even though this t quantile does not appear in the table, you can still derive its approximate probability by bracketing it between landmarks that are listed in the t table. In this case, a t statistic of 2.65 with 8 df is bracketed between t8,.975 (2.31) and t8,.99 (2.90). This shows it to have a cumulative probability that is a little bigger than 0.975 and a little smaller than 0.99. Sketch the t8 distribution curve showing the location of t8,.975 and t8,.99 on its horizontal axis. Wedge 2.65 between these landmarks. What is the approximate size of the area under the curve to the right of 2.65 under this curve? [You may also use StaTable or another package to determine the exact area under the curve beyond 2.65 on a t distribution with 8 degrees of freedom.] 
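
A bracketed estimate can be checked numerically. The sketch below integrates the t density with Simpson's rule using only the Python standard library; it is an assumption-laden stand-in for a statistical package, and the function names `t_pdf` and `t_upper_tail` are hypothetical. The usage lines evaluate the tail at the two table landmarks rather than at the exercise value.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t (for t >= 0), via Simpson's rule on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t           # half the curve lies to the right of 0

# Table landmarks for 8 df: tails of roughly 0.025 and 0.01.
print(round(t_upper_tail(2.306, 8), 3), round(t_upper_tail(2.896, 8), 3))
```

Any t statistic wedged between two landmarks must have a tail area between the two tail areas printed for those landmarks.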

 

5B.6 Pr(T9 < -2.98). Use your t table or StaTable to determine the probability of seeing a t value with 9 df that is less than -2.98.

 

5B.7 t critical values for a confidence interval. You have a SRS of n = 28 individuals. What value of the t quantile (critical value) would you use to calculate a 95% confidence interval for µ?

 

5B.8 t for confidence. You have a SRS of n = 28 and want to calculate a 90% confidence interval for µ. What t quantile do you use for your calculation? 

 

5B.9 Serum polyphenols and red wine consumption. Drinking moderate amounts of wine may reduce the risk of coronary artery disease in some individuals. One possible reason for this is that red wine contains polyphenols, and polyphenols help serum cholesterol profiles. In an experiment involving 9 men, the subjects drank half a bottle of red wine each day for two weeks (Nigdikar, 1998). Levels of polyphenols in blood samples were measured at the beginning and end of the experiment. Percent changes in polyphenol levels are {3.5, 8.1, 7.4, 4.0, 0.7, 4.9, 8.4, 7.0, 5.5}. [Note: x-bar = 5.5, s = 2.517, n = 9.] Calculate a 95% confidence interval for the mean percent change in polyphenols associated with this amount of red wine consumption. 

 

5B.10 Calcium in sound teeth. A dental researcher measures the calcium content of sound teeth (% of tooth content that is calcium). A sample of 5 teeth shows the following values: {33.4, 36.2, 34.8, 35.2, 35.5}. Provide a 99% confidence interval for the mean percent calcium content of sound teeth. [You may use your calculator to find the mean and standard deviation.] 

 

5B.11 Boy height. A SRS of n = 26 boys between the ages of 13 and 14 reveals a mean height of 63.8 inches with a standard deviation of 3.1 inches. Assume height in the population varies according to a Normal distribution. Calculate a 95% CI for the mean height of all boys in this age range.
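
When σ is unknown and estimated by s, the interval becomes x-bar ± t · s / √n with n - 1 degrees of freedom. The sketch below avoids third-party packages by integrating the t density and finding the quantile by bisection; this is one possible approach, not a replacement for a t table or statistical package. All function names and the input values are hypothetical and differ from those in the exercises.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=1000):
    """Cumulative probability below t, via Simpson's rule and symmetry."""
    if t < 0:
        return 1 - t_cdf(-t, df, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """t value with cumulative probability p (p >= 0.5), found by bisection."""
    low, high = 0.0, 100.0
    for _ in range(50):
        mid = (low + high) / 2
        if t_cdf(mid, df) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def t_interval(xbar, s, n, conf=0.95):
    """Confidence interval for mu using the sample standard deviation s."""
    t = t_quantile(1 - (1 - conf) / 2, n - 1)
    margin = t * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical values: x-bar = 10, s = 2, n = 16 (15 degrees of freedom).
low, high = t_interval(10, 2, 16)
print(round(low, 2), round(high, 2))
```

The search range of 0 to 100 in `t_quantile` is an assumption that covers common confidence levels, including those with very few degrees of freedom.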

 

5B.12 Vector control in an African village. A study of insect vector control in an African village found that the mean sprayable surface area of 100 houses was 249 square feet with standard deviation =  39.82 square feet. (Data are fictitious but realistic; see Osborn, 1979, p. 6 for full data set.) 
(A) Determine the 95% confidence interval for the mean sprayable surface of houses in the village.
(B) Would it be correct to say that 95% of all the houses in the village have sprayable surfaces between the lower confidence limit and upper confidence limit? Explain your response.

5B.13 Respiratory function in furniture workers. Forced expiratory volume (FEV) is a measure of respiratory health in which you forcibly blow through a tube. The rate of air expelled (liters per second) is measured as an index of lung function. FEV values in seven workers at a furniture manufacturing plant are {3.94, 1.47, 2.06, 2.36, 3.74, 3.43, 3.78}. Calculate a 90% CI for the mean FEV for the population of furniture workers.

5B.14 COPD and skin-fold thickness. Skin-fold thicknesses (a general measure of overall body condition) taken at the triceps region of 40 healthy male controls average 1.35 cm (standard deviation = 0.50 cm). In 32 men with chronic obstructive pulmonary disease, skin-fold thickness at the triceps region averaged 0.92 cm (standard deviation = 0.40 cm). [Secondary source: Rosner, 1990, p. 177 and p. 185, originally from Arora & Rochester, 1984.] 

(A) Calculate a 95% confidence interval for the mean skin-fold thickness in the healthy population. 
(B) Calculate a 95% confidence interval for the mean skin-fold thickness in the population of men with chronic obstructive pulmonary disease. 
(C) Plot the above confidence intervals in side-by-side fashion on graph paper. Compare the intervals. Interpret your results. 

5B.15 Body weight, high school girls. A SRS of body weight expressed as a percentage of ideal in 9 high school girls reveals: {114, 100, 104, 94, 114, 105, 103, 105, 96}. 

 

(A) Plot the data as a stemplot using split-stem values. Are there any major departures from Normality? 
(B) Calculate a 95% confidence interval for the population mean µ of this variable in the school. Show all work.
(C) What is the margin of error of your estimate? (Numerical value.)
(D) How large a sample would be needed to reduce the margin of error of the 95% confidence interval down to 3?

5B.16 Treatment of scrapie (Tagliavini, 1997). Scrapie is a prion disease similar in pathology to bovine spongiform encephalopathy (mad cow disease) and new variant Creutzfeldt-Jakob disease. In a trial of a substance used to treat scrapie in hamsters, 10 scrapie-infected hamsters chosen at random were treated with the substance and 10 scrapie-infected hamsters were left untreated. The mean time before the appearance of symptoms (induction time) in the treated group was 81.9 days (se = 2.2 days). What was the standard deviation in the treated group? [Recall that se = s / √n. Rearrange this formula to determine the standard deviation of the induction time.]

 

5B.17 This is a continuation of exercise 5B.16. The mean induction time in the control group was 102.8 days (se = 3.8 days). What was the standard deviation of induction times in this group?

 

5B.18 Therapeutic touch. Proponents of a complementary and alternative medical technique known as therapeutic touch claim that each person has a human energy field (HEF) that can be perceived by touch. Therapists especially trained to recognize HEF-related perceptions are said to be particularly adept. In an experiment that started out as a fourth-grade science fair project, therapeutic touch practitioners of varying experience were tested under blind conditions to see whether they could correctly identify whether the HEF of an unseen hand hovered over their left or right hand (Figure, right). Fifteen (15) therapeutic touch therapists underwent an initial set of 10 trials each (Rosa et al., 1998). If HEF perception through therapeutic touch were possible, the therapists should each have been able to detect the experimenter's hand in 10 (100%) of 10 trials. Chance alone would produce a mean score of 5 (50%). The 15 touch therapists had a mean score of 4.67 (standard deviation 1.74). Calculate a 95% confidence interval for the mean number of correct guesses. Is the confidence interval compatible with random guessing? Is it compatible with, say, the ability to detect 3 of 4 HEFs?

 
