(a) a numeric characteristic of a population
(b) the act of generalizing from a sample to a population with a calculated degree of certainty
(c) the hypothetical frequency distribution of all possible sample means based on the same sample sizes from the same population
(d) the chance a researcher is willing to take of not capturing the parameter
(e) the standard deviation of the sampling distribution of the mean
(f) an interval that is created with a known likelihood of capturing a parameter
5A.1 Parameter or estimate? Say whether each of the boldface numbers is a parameter or estimate.
(A) There are about 18,800 new cases of female breast cancer in the state of California in a given year. An oncologist studies survival in 1225 newly diagnosed cases and finds that survival varies greatly by stage of diagnosis. The average seven-year survival rate for Stage I breast cancer is 92%. The Stage II survival rate is 71%, the Stage III survival rate is 39%, and the Stage IV survival rate is 11%. (Numbers are fictitious but realistic.)
(B) A review of divorce records for a county in Connecticut for the year 2000 indicates that marriages that end in divorce last an average of 72 months. If inferences are to be restricted to that year and population, is the number 72 a parameter or an estimate?
5A.2 Parameter or estimate? Say whether each of the boldface numbers below is a parameter or an estimate.
(A) A review of 168 hospital discharge summaries in 2006 reveals that 20% of patients were uninsured that fiscal year. (Assume this hospital treats thousands of patients per year.)
(B) Data for a census tract Standard Metropolitan Area (SMA) indicate that 12% of the population is African-American. In a survey using a random-digit-dialing machine, we find that 8% of respondents were African-American.
(C) We want to determine the average cost patients pay for a particular medication. Ten online pharmacies reveal an average cost of $31.20 with a standard deviation of $7.75 for a one-month supply. Ten community pharmacies show an average cost of $33.18 with a standard deviation of $7.88.
5A.3 Very tiny population. A tiny finite population consists of the following values: {1, 3, 5, 7, 9}. This population has mean µ = 5 and standard deviation σ = 2.8.
(A) List all possible unique samples of n = 2 from this population. (There are 5C2 = 10 such samples.) Calculate the mean of each sample.
(B) Construct a stemplot of the 10 sample means. This is a rudimentary sampling distribution of a mean (SDM) based on n = 2 from this population. Is the SDM more or less Normal than the population? Is the mean of the SDM equal to, less than, or greater than the mean of the population? Is the spread of the SDM equal to, less than, or greater than the standard deviation of the population?
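A minimal sketch of the enumeration described in 5A.3, assuming Python 3 and its standard library (the variable names are illustrative, not part of the exercise). It lists every unique sample of n = 2 drawn without replacement, computes each sample mean, and prints summary values for comparing the SDM with the population.

```python
from itertools import combinations
from statistics import mean, pstdev

population = [1, 3, 5, 7, 9]

# Every unique sample of size 2, drawn without replacement (5C2 = 10 of them).
samples = list(combinations(population, 2))

# The mean of each sample; together these form a rudimentary SDM.
sample_means = [mean(pair) for pair in samples]

for pair, m in zip(samples, sample_means):
    print(pair, m)

# Summary values for comparing the SDM with the population.
print("population mean:", mean(population))
print("mean of sample means:", mean(sample_means))
print("population SD:", round(pstdev(population), 2))
print("SD of sample means:", round(pstdev(sample_means), 2))
```

The stemplot itself is still best drawn by hand from the printed means.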
5A.4 A survey of health problems. A survey takes a simple random sample of 500 people from a town of 55,000. On the average, there were 2.30 health problems per person (standard deviation = 1.65). Say whether each of the following statements is true or false. Explain your reasoning in each instance.
(A) The standard deviation of the sampling distribution of the mean (SDM) is 0.0740.
(B) The 95% confidence interval for the average number of health problems in the sample is (2.16, 2.44).
(C) The 95% confidence interval for the average number of health problems in the town is (2.16, 2.44).
(D) It is reasonable to say that the number of health problems in the population is Normal.
(E) It is reasonable to say that the sampling distribution of the mean (SDM) is Normal.
5A.5 Serum cholesterol levels in undergraduate men. Suppose the distribution of cholesterol levels in undergraduate men is approximately Normal with mean µ = 190 mg/dl and standard deviation σ = 40 mg/dl.
(A) What is the probability of selecting someone at random from this population who has a cholesterol value less than 180?
(B) Suppose you take a SRS of n = 49 undergraduate men from this population and calculate x̄. What is the standard deviation (error) of x̄?
(C) What is the probability that your x̄ (based on a SRS of n = 49) is less than 180?
5A.6 Lab measurements. Measurements of water quality samples have standard deviation σ = 10. A lab assistant takes 4 measurements and calculates x̄.
(A) Explain the advantage of reporting the average of several measurements rather than using the result of a single measurement.
(B) Calculate the SE of x̄ when n = 4.
(C) How many times must the assistant repeat the measurement to reduce the SE of x̄ to 2? [Rearrange the formula SE = σ / √n to get n = (σ / SE)². Now determine the sample size required to get the desired standard error.]
5A.7 Sampling behavior of a mean. Suppose you could take all possible samples of n = 30 from a Normal population with mean µ = 55.5 and standard deviation σ = 4. Draw the sampling distribution of the mean (SDM) based on these samples and identify landmarks on the horizontal axis that are ±2 standard errors around the expected mean. [Note that SEM = 4 / √30 = 0.7303.] Would you be surprised to find a sample mean less than 54? Would you be surprised to find a sample mean that exceeds 56?
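The SEM quoted in the note to 5A.7 and the ±2 standard error landmarks can be checked numerically. The following is a sketch only, assuming Python 3.8+ (for `statistics.NormalDist`); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 55.5, 4, 30

sem = sigma / sqrt(n)            # standard error of the mean
lower = mu - 2 * sem             # landmark 2 SE below the expected mean
upper = mu + 2 * sem             # landmark 2 SE above the expected mean

# The population is Normal, so the SDM is Normal with mean mu and SD sem.
sdm = NormalDist(mu, sem)
p_below_54 = sdm.cdf(54)         # chance of a sample mean under 54
p_above_56 = 1 - sdm.cdf(56)     # chance of a sample mean over 56

print(f"SEM = {sem:.4f}")
print(f"landmarks: {lower:.2f} and {upper:.2f}")
print(f"Pr(xbar < 54) = {p_below_54:.3f}")
print(f"Pr(xbar > 56) = {p_above_56:.3f}")
```

A sample mean that falls outside the two landmarks would be unusual; one that falls between them would not.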
5A.8 SDM of test scores. Suppose you give a test to 100 people. The scores vary according to a Normal distribution with mean µ = 250 and standard deviation σ = 50. Calculate the standard deviation (error) of the mean. Sketch the SDM and shade the region under the curve corresponding to ±2 standard errors around µ. What value demarcates the lower tail? What value demarcates the upper tail?
5A.9 Calcium-channel blocker cost. A survey of 30 pharmacies found that the average cost of a month's supply of a calcium channel-blocker drug was $33. The margin of error for 95% confidence was $2.50. Calculate the 95% confidence interval for the mean price of the drug. What does it mean to say that we have 95% confidence in this interval?
5A.10 Misinterpreting the confidence interval. A pharmacist reads that a 95% confidence interval for the average price for a month's supply for a particular drug is $30.50 to $35.50. Asked to explain the meaning of this, the pharmacist states "95% of all pharmacies sell the drug for between $30.50 and $35.50." Is the pharmacist correct? Explain your response.
5A.11 Graduate student age. The age distribution of students in a graduate program is approximately Normal with unknown mean µ and standard deviation σ = 5. You sample 24 individuals from this population and calculate x̄ = 25.0. Calculate the 95% confidence interval for µ based on these data. Interpret your interval.
5A.12 Muscle strength scores. A physical therapist studying muscular strength is willing to assume muscle strength scores are Normal with a standard deviation 12. A sample of 15 individuals demonstrates a mean muscular strength score of 84.3. Calculate a 95% confidence interval for μ and then explain what it means to say that you have 95% confidence in this interval. [Similar to Daniel, 1999, p. 157, but for 95% confidence.]
5A.13 Graduate student age. Calculate the 99% confidence interval for the data in exercise 5A.11. Why is the 99% confidence interval longer than the 95% confidence interval?
5A.14 Muscle strength scores. Calculate the 90% confidence interval for the data in exercise 5A.12. Why is this interval shorter than the 95% confidence interval?
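The confidence intervals in exercises 5A.11 through 5A.14 all take the form x̄ ± z · σ / √n when σ is known. A minimal sketch of that calculation, assuming Python 3.8+ and using a hypothetical helper named `z_interval`, shows how the interval width changes with the confidence level for the 5A.11 values.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Interval for mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    # z cuts off (1 - confidence) / 2 in each tail of the standard Normal.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Exercise 5A.11 values: sigma = 5, n = 24, xbar = 25.0
for level in (0.90, 0.95, 0.99):
    lo, hi = z_interval(25.0, 5, 24, level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f}), width = {hi - lo:.2f}")
```

Only the z quantile changes from one confidence level to the next, which is why higher confidence produces a longer interval.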
5A.15 Antigen titer. A vaccine manufacturer analyzes samples from a production batch of vaccine to check the concentration of antigen in the product. Immunologic analyses are not perfect, so she repeats measurements on the same batch, getting slightly different results each time. The public health scientist assumes that repeated measurements will vary according to a Normal distribution with mean µ and σ = 0.070. (The standard deviation is assumed to be a characteristic of the titering technique, and is reported by the manufacturer in the documentation with the kit used for the procedure.) Three (n = 3) measurements on one sample reveal the following titers: {17.40, 17.36, 17.45}. Calculate a 95% confidence interval for the true concentration µ.
5A.16 Laboratory scale. The manufacturer of a laboratory scale with a digital readout claims the scale is accurate to 0.0015 of a gram. You read the fine print in the documentation that accompanies the scale and find that, by this, the manufacturer means that measurements have standard deviation σ = 0.0015 grams. You are willing to assume measurements vary according to a Normal distribution with a mean µ that is equal to the true weight of the object. Two weighings of the same specimen produce weights of 24.31 and 24.34 grams, respectively. Calculate a 99% confidence interval for the true weight of the object.
5A.17 SIDS. A study of 49 sudden infant death syndrome (SIDS) cases calculates a mean birth weight of 2998 grams. From a listing of all birth weights, it is known that the standard deviation σ of this variable is 800 grams (data are fictitious but realistic). Assume this standard deviation applies to the population of SIDS cases. Calculate a 95% confidence interval for the mean µ birth weight of SIDS cases. Interpret your results.
5A.18 Birth weights, smoking mothers. Random samples of size n are selected from a population of birth weights of full-term infants. The population standard deviation σ = 2 pounds for this variable. Calculate 95% confidence intervals for µ based on σ and the x̄ and n for each of the following samples:
(A) n = 81, x̄ = 6.2 pounds
(B) n = 36, x̄ = 7.0 pounds
(C) n = 9, x̄ = 5.8 pounds
(D) Determine the margin of error for each of the above estimates.
5A.19 Hemoglobin study, sample size requirement. Hemoglobin levels in 11-year-old boys have a Normal distribution with unknown mean µ and σ = 1.209 g/dl. How large a sample is needed to estimate µ with 95% confidence and a margin of error of 0.5?
5A.20 Sugar consumption survey, sample size requirement. A public health researcher is willing to assume (based on prior research) that the standard deviation of the weekly sugar consumption in children is 100 grams. How large a sample is needed to calculate a 95% confidence interval for µ so that its margin of error is no greater than 10 grams?
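Sample size requirements such as those in 5A.19 and 5A.20 follow from setting the margin of error m = z · σ / √n and solving for n, which gives n = (z · σ / m)², rounded up to the next whole number. The following is a sketch under that assumption, written for Python 3.8+; the function name `required_n` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_n(sigma, margin, confidence=0.95):
    """Smallest n whose margin of error z * sigma / sqrt(n) is <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, because a fractional observation cannot be sampled.
    return ceil((z * sigma / margin) ** 2)

print(required_n(sigma=1.209, margin=0.5))   # exercise 5A.19 values
print(required_n(sigma=100, margin=10))      # exercise 5A.20 values
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.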
5B.1 Blood pressure. A study found a mean systolic blood pressure x̄ = 124.6 mm Hg in 35 individuals. The standard deviation s = 10.3 mm Hg.
(A) Calculate the estimated standard error of the mean.
(B) How many people would you need to study to decrease the standard error of the mean to 1 mm Hg? [Rearrange se = s / √n to solve for n. Then plug in values for se and s.]
5B.2 Published report. A study published in the American Journal of Public Health (Langenberg, 2005) addressed the statistical relation between tall stature, cardiovascular mortality, and employment grade. Results were reported in a table with the column heading “Mean Height, cm. (SE).” The table entry for “Stroke in the Low Employment Grade” was 173.2 (0.2) based on n = 1243. From this table, you are supposed to understand that x̄ = 173.2 and the standard error of the mean = 0.2. What is the standard deviation of the data in this sample? [Rearrange the formula for the se to solve for s. Then plug in the values of n and se.]
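The rearrangements suggested in the hints to 5B.1 and 5B.2 both start from se = s / √n. Solving for s gives s = se · √n, and solving for n gives n = (s / se)². A minimal sketch in Python 3 follows; the helper names are hypothetical.

```python
from math import ceil, sqrt

def sd_from_se(se, n):
    """Recover the sample standard deviation from se = s / sqrt(n)."""
    return se * sqrt(n)

def n_for_target_se(s, target_se):
    """Smallest n for which s / sqrt(n) is at or below target_se."""
    return ceil((s / target_se) ** 2)

# Exercise 5B.2 values: se = 0.2, n = 1243
print(round(sd_from_se(0.2, 1243), 2))

# Exercise 5B.1 values: s = 10.3, target se = 1
print(n_for_target_se(10.3, 1))
```

Because the published se of 0.2 is itself rounded, the recovered standard deviation is only approximate.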
5B.3 t curve. Sketching and shading a t curve helps bring home the distinction between t quantiles, cumulative probabilities, and tail probabilities.
(A) Sketch a t curve. (To the eye, this curve will look like a z curve.) Label the horizontal axis with tick marks at 1-unit standard error intervals.
(B) Use the t table to determine the value of t with 9 degrees of freedom and cumulative probability 0.90 (i.e., t9,.90). Shade the region under the curve to the right of this point. Notice that this right tail = 1 - 0.90 = 0.10 of the curve.
(C) t tables do not include negative t values because readers can use the symmetry of the curve to determine quantiles to the left of center. Use the symmetry of the t curve to determine the t quantile that cuts off the bottom 10% of the curve (i.e., t9,.10). Shade the region to the left of this point.
(D) What is the combined area of the shaded regions of the curve you just sketched?
5B.4 t percentiles. Use your t table to determine the following t percentiles.
(A) t19,.95 [This is the t quantile with 19 degrees of freedom and cumulative probability 0.95.]
(B) t24,.975
(C) t35,.975
(D) t674,.99 [A t distribution with this many degrees of freedom is nearly the same as a z distribution; use the row in the t table for z.]
(E) t19,.05 [Use your knowledge of the symmetry of the t distribution to determine the mirror image of t19,.95.]
(F) t19,.025 [This is the mirror image of t19,.975.]
5B.5 Approximating a probability with the "wedgie technique." Sometimes you will need to determine the area under the curve to the right or left of a t quantile that does not appear in the body of the t table. For example, you may need to determine the area in the tail beyond a t statistic of 2.65 with 8 degrees of freedom. Even though this t quantile does not appear in the table, you can still derive its approximate probability by bracketing it between landmarks that are listed in the t table. In this case, a t statistic of 2.65 with 8 df is bracketed between t8,.975 (2.31) and t8,.99 (2.90). This shows it to have a cumulative probability that is a little bigger than 0.975 and a little smaller than 0.99. Sketch the t8 distribution curve showing the location of t8,.975 and t8,.99 on its horizontal axis. Wedge 2.65 between these landmarks. What is the approximate size of the area under the curve to the right of 2.65? [You may also use StaTable or another package to determine the exact area under the curve beyond 2.65 on a t distribution with 8 degrees of freedom.]
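The bracketing argument in 5B.5 can be checked numerically without a t table. The sketch below is one possible approach, not the method the exercise prescribes: it integrates the t density with Simpson's rule using only the Python 3 standard library. The function names and the step count are assumptions made for illustration.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t (t > 0); steps must be even for Simpson's rule."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # Half of the curve lies to the right of 0, by symmetry.
    return 0.5 - area_0_to_t

# The two table landmarks and the wedged value, all with 8 df.
for t in (2.31, 2.65, 2.90):
    print(t, round(t_upper_tail(t, 8), 4))
```

The tail area printed for 2.65 should fall between the tail areas of the two landmarks, which is the point of the technique.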
5B.6 Pr(T9 < -2.98). Use your t table or StaTable to determine the probability of seeing a t quantile with 9 df that is less than -2.98.
5B.7 t critical values for a confidence interval. You have a SRS of n = 28 individuals. What value of the t quantile (critical value) would you use to calculate a 95% confidence interval for µ?
5B.8 t for confidence. You have a SRS of n = 28 and want to calculate a 90% confidence interval for µ. What t quantile do you use for your calculation?
5B.9 Serum polyphenols and red wine consumption. Drinking moderate amounts of wine may reduce the risk of coronary artery disease in some individuals. One possible reason for this is that red wine contains polyphenols, and polyphenols help serum cholesterol profiles. In an experiment involving 9 men, the subjects drank half a bottle of red wine each day for two weeks (Nigdikar, 1998). Levels of polyphenols in blood samples were measured at the beginning and end of the experiment. Percent changes in polyphenol levels are {3.5, 8.1, 7.4, 4.0, 0.7, 4.9, 8.4, 7.0, 5.5}. [Note: x̄ = 5.5, s = 2.517, n = 9.] Calculate a 95% confidence interval for the mean percent change in polyphenols associated with this amount of red wine consumption.
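When σ is unknown, the interval takes the form x̄ ± t · s / √n with n - 1 degrees of freedom. The sketch below works through that form for the 5B.9 data using only the Python 3 standard library. Because the standard library has no t quantile function, it finds the critical value by bisection on a numerically integrated t density; that choice, and all function names, are assumptions made for illustration. A t table gives the same critical value.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Cumulative probability for t >= 0, using Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """t quantile for 0.5 < p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [3.5, 8.1, 7.4, 4.0, 0.7, 4.9, 8.4, 7.0, 5.5]
n = len(data)
xbar, s = mean(data), stdev(data)     # stdev uses the n - 1 denominator

t_crit = t_quantile(0.975, n - 1)     # 95% confidence, 8 df
margin = t_crit * s / sqrt(n)
ci_low, ci_high = xbar - margin, xbar + margin

print(f"xbar = {xbar:.2f}, s = {s:.3f}, t = {t_crit:.3f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The same pattern applies to the other t intervals in this section after changing the data and the confidence level.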
5B.10 Calcium in sound teeth. A dental researcher measures the calcium content of sound teeth (% of tooth content that is calcium). A sample of 5 teeth shows the following values: {33.4, 36.2, 34.8, 35.2, 35.5}. Provide a 99% confidence interval for the mean percent calcium content of sound teeth. [You may use your calculator to find the mean and standard deviation.]
5B.11 Boy height. A SRS of n = 26 boys between the ages of 13 and 14 reveals a mean height of 63.8 inches with a standard deviation of 3.1 inches. Assume height in the population varies according to a Normal distribution. Calculate a 95% CI for the mean height of all boys in this age range.
5B.12 Vector control in an African village. A study of insect vector control in an African village found that the mean sprayable surface area of 100 houses was 249 square feet with standard deviation = 39.82 square feet. (Data are fictitious but realistic; see Osborn, 1979, p. 6 for full data set.)
(A) Determine the 95% confidence interval for the mean sprayable surface of houses in the village.
(B) Would it be correct to say that 95% of all the houses in the village have sprayable surfaces between the lower confidence limit and upper confidence limit? Explain your response.
5B.13 Respiratory function in furniture workers. Forced expiratory volume (FEV) is a measure of respiratory health in which you forcibly blow through a tube. The rate of air expelled (liters per second) is measured as an index of lung function. FEV values in seven workers at a furniture manufacturing plant are {3.94, 1.47, 2.06, 2.36, 3.74, 3.43, 3.78}. Calculate a 90% CI for the mean FEV for the population of furniture workers.
5B.14 COPD and skin fold thickness. Skin-fold thickness (a general measure of overall body condition) taken at the triceps region of 40 healthy male controls averaged 1.35 cm (standard deviation = 0.50 cm). In 32 men with chronic obstructive pulmonary disease, skin-fold thickness at the triceps region averaged 0.92 cm (standard deviation = 0.40 cm). [Secondary source: Rosner, 1990, p. 177 and p. 185, originally from Arora & Rochester, 1984.]
(A) Calculate a 95% confidence interval for the mean skin fold thickness in the healthy population.
(B) Calculate a 95% confidence interval for the mean skin fold thickness in the population of men with chronic obstructive pulmonary disease.
(C) Plot the above confidence intervals in side-by-side fashion on graph paper. Compare the intervals. Interpret your results.
5B.15 Body weight, high school girls. A SRS of body weight expressed as a percentage of ideal in 9 high school girls reveals: {114, 100, 104, 94, 114, 105, 103, 105, 96}.
(A) Plot the data as a stemplot using split-stem values. Are there any major departures from Normality?
(B) Calculate a 95% confidence interval for the population mean µ of this variable in the school. Show all work.
(C) What is the margin of error of your estimate? (Numerical value.)
(D) How large a sample would be needed to reduce the margin of error of the 95% confidence interval down to 3?
5B.16 Treatment of scrapie (Tagliavini, 1997). Scrapie is a prion disease similar in pathology to bovine spongiform encephalopathy (mad cow disease) and new variant Creutzfeldt-Jakob disease. In a trial of a substance used to treat scrapie in hamsters, 10 scrapie-infected hamsters chosen at random were treated with the substance and 10 scrapie-infected hamsters were left untreated. The mean time before the appearance of symptoms in the treated group (induction time) was 81.9 days (se = 2.2 days). What was the standard deviation of induction times in this group? [Recall that se = s / √n. Rearrange this formula to determine the standard deviation of the induction time.]
5B.17 This is a continuation of exercise 5B.16. The mean induction time in the control group was 102.8 days (se = 3.8 days). What was the standard deviation of induction times in this group?
5B.18 Therapeutic touch. Proponents of a complementary and alternative medical technique known as therapeutic touch claim that each person has a human energy field (HEF) that can be perceived by touch. Therapists especially trained to recognize HEF-related perceptions are said to be particularly adept. In an experiment that started out as a fourth-grade science fair project, therapeutic touch practitioners of varying experience were tested under blind conditions to see whether they could correctly identify whether the HEF of an unseen hand hovered over their left or right hand (Figure). Fifteen (15) therapeutic touch therapists underwent an initial set of 10 trials each (Rosa et al., 1998). If HEF perception through therapeutic touch were possible, the therapists should each have been able to detect the experimenter's hand in 10 (100%) of 10 trials. Chance alone would produce a mean score of 5 (50%). The 15 touch therapists had a mean score of 4.67 (standard deviation 1.74). Calculate a 95% confidence interval for the mean number of correct guesses. Is the confidence interval compatible with random guessing? Is it compatible with, say, the ability to detect 3 of 4 HEFs?