6: Introduction to Hypothesis Testing Version: 8/31/06

Review Questions 

  1. How do we accrue evidence to bolster the alternative hypothesis?
  2. List the steps of hypothesis testing used in this book.
  3. Which hypothesis does the researcher hope to reject? 
  4. If the P value is not the probability the null hypothesis is false, what then is it? 
  5. In a one-sample z test, where does the value of µ0 come from? 
  6. Why are P values from two-sided alternative hypotheses twice as big as P-values from one-sided tests?
  7. What is the denominator of the one-sample z statistic?
  8. Create a 2-by-2 table showing the potential consequences of a hypothesis test.
  9. What is beta?
  10. What is the complement of beta?
  11. The criminal justice analogy. A jury, having heard the evidence in a criminal case, is debating whether to render a verdict of guilty or not guilty. Because it is worse to convict an innocent person than to let a guilty person go free, each case begins with an assumption of innocence. This is analogous to assuming or retaining H0 in a statistical hypothesis test. Fill in the table below. Which of these decisions is analogous to a type I error, and which is analogous to a type II error?

                            TRUTH
    DECISION OF JURY        Innocent        Did crime
    Not Guilty              __________      __________
    Guilty                  __________      __________

Part A: Basics and z procedures

6A.1. Misconceived hypotheses. What is wrong with each of these statements? 

(A) H0: µ = 100 vs. H1: µ ≠ 110
(B) H0: x̄ = 100 vs. H1: x̄ < 100

6A.2. Hypothesis statements. Set up null and alternative hypotheses for each of the following claims: 

(A) A marriage counselor claims that a new method of conflict resolution is less prone to client interruptions. Prior experience suggests clients had interrupted each other 10 times per session on average. 
(B) An international health researcher claims that a pediatric population in a developing nation is anemic. The population should have a hemoglobin level of 12 g/dL. 
(C) It is known that American adults tend to gain (on average) 1 pound per year as they age from 20 to 30. A public health campaign aims to decrease the amount of weight gained in this population. A long-term follow-up study finds that the average weight gain in a sample of 100 individuals in this age range is 0.5 pounds per year. 

6A.3 Satisfaction survey. A public health clinic administers a survey that addresses patient satisfaction with services. The survey uses a summary score on a 70-point scale, with 70 indicating the highest possible satisfaction. The survey instrument has been tested in other environments and has historically had mean µ = 50 and standard deviation σ = 7.5. 

(A) Although the distribution of scores shows a negative skew, we can assume that the sampling distribution of x̄ derived from this population will be approximately Normal if the sample size is large. What is the name of the theorem that supports this assumption? 
(B) What is the standard deviation of the sampling distribution of x̄ when the SRS has n = 36? [Another term for "the standard deviation of the sampling distribution of x̄" is "the standard error of the mean."]
(C) We seek evidence against the claim that this population has a mean score µ of 50 and take a SRS of n = 36. Sketch the curve that describes the sampling distribution of x̄ under this claim. Mark the horizontal axis with tick marks at the mean and at ±1 and ±2 standard deviations from μ.
(D) Suppose that the sample of 36 patients gives a mean score of 48.4. Mark this point on the horizontal axis of your sketch and explain why this outcome does not provide good evidence against the null hypothesis.
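
The landmarks for the sketch in Exercise 6A.3 come from the standard error of the mean, SEM = σ / √n. A minimal Python sketch of that arithmetic follows, using the values quoted in the exercise (µ = 50, σ = 7.5, n = 36). The function name sampling_landmarks is a hypothetical helper introduced here for illustration; it is not part of the text.

```python
import math

def sampling_landmarks(mu0, sigma, n, k_values=(-2, -1, 0, 1, 2)):
    """Return the SEM and the axis landmarks mu0 + k * SEM for each k.

    Hypothetical helper: mu0 is the mean under H0, sigma the population
    standard deviation, and n the sample size.
    """
    sem = sigma / math.sqrt(n)          # standard error of the mean
    marks = [mu0 + k * sem for k in k_values]
    return sem, marks

# Values quoted in Exercise 6A.3
sem, marks = sampling_landmarks(mu0=50, sigma=7.5, n=36)
print(f"SEM = {sem:.2f}")
print("axis landmarks:", marks)
```

An observed mean can then be judged by where it falls relative to these tick marks: a value inside the ±2 SEM range is not unusual under H0.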

6A.4 Satisfaction survey (cont.). This is a continuation of the prior exercise.

(A) Suppose the sample mean x̄ in the prior exercise had been 46.5. Mark this point on the horizontal axis of your sketch of the sampling distribution of the mean under the null hypothesis and describe how a mean of 46.5 provides good evidence of a lower than expected mean score. 
(B) State the null and alternative hypotheses for the prior two exercises. 
(C) Would you reject H0 if x̄ was 48.4? 
(D) Would you reject H0 if x̄ was 46.5? 

6A.5 Satisfaction survey (cont.). In the prior two exercises you considered potential results of a clinic patient satisfaction survey. You were testing H0: μ = 50. 

(A) Now let us do a one-sided test, i.e., H1: μ < 50. Calculate the z statistic for the problem and determine the P-value for an observed sample mean x̄ of 48.4. 

(B) Now consider a different sample that reveals a sample mean of x̄ = 46.5. Calculate the zstat and P-value. Are results significant at the α = 0.05 level? Are they significant at α = 0.01?
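
The calculation asked for in Exercise 6A.5 can be checked with software. The sketch below is one way to do it in Python, assuming the one-sided alternative points to the left (μ < 50) and using the standard library's NormalDist for the Normal curve. The function name z_test_left is a hypothetical helper, not something defined in the text.

```python
from math import sqrt
from statistics import NormalDist

def z_test_left(xbar, mu0, sigma, n):
    """One-sample z statistic and left-tail P-value (alternative: mu < mu0).

    Hypothetical helper: sigma is the known population standard deviation.
    """
    sem = sigma / sqrt(n)               # denominator of the z statistic
    z = (xbar - mu0) / sem
    p = NormalDist().cdf(z)             # area to the left of z under N(0, 1)
    return z, p

# Values quoted in Exercises 6A.3 through 6A.5
for xbar in (48.4, 46.5):
    z, p = z_test_left(xbar, mu0=50, sigma=7.5, n=36)
    print(f"xbar = {xbar}: zstat = {z:.2f}, one-sided P = {p:.4f}")
```

Comparing each printed P-value with α = 0.05 and α = 0.01 answers the significance questions. For a two-sided alternative the P-value would be doubled.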

6A.6. P from Z. Determine the two-sided P values for each of these z statistics (A) zstat = 1.72; (B) zstat = 0.83; (C) zstat = -2.45
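
A two-sided P-value is twice the tail area beyond |zstat|. The sketch below applies that rule to the three z statistics in Exercise 6A.6. It is a check on table look-ups, not a replacement for them, and two_sided_p is a hypothetical helper name.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided P-value for a z statistic: twice the area beyond |z|."""
    tail = 1 - NormalDist().cdf(abs(z))  # one tail beyond |z|
    return 2 * tail

for z in (1.72, 0.83, -2.45):
    print(f"zstat = {z:5.2f}: two-sided P = {two_sided_p(z):.4f}")
```

Because the standard Normal curve is symmetric, the sign of zstat does not affect the two-sided result.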

6A.7 Lithium. Lithium carbonate is a drug used to treat bipolar mental disorders. The average dose in well-maintained patients is µ = 1.3 mEq/L (σ = 0.3 mEq/L). A random sample of 25 patients on lithium demonstrates a mean lithium level of 1.4 mEq/L. 

(A) Conduct a one sample z test to see if the observed difference is significant. [Use a two-sided alternative, as improper dosing would include both under- and over-dosing.] 
(B) Are the results significant at alpha = 0.05? Are they significant at alpha = 0.10?

6A.8 Hemoglobin. Hemoglobin is the component of red blood cells that carries oxygen from the lungs to body tissues. Values of less than 12 grams of hemoglobin per deciliter of blood (g/dl) are indicative of anemia. An international health researcher suspects that the mean hemoglobin level (µ) of children in a developing nation is less than 12. The researcher samples 35 children and finds a sample mean of 11.2 g/dl. We are willing to assume hemoglobin levels in this population are approximately Normal with µ = 12 and σ = 1.6 g/dl. Test whether the population is anemic. Show all hypothesis testing steps, including statements of the null and alternative hypotheses. Use a two-sided alternative hypothesis. [Data: Moore, 2004, p. 342.]

6A.9 NHES. The National Health and Examination Survey (NHES) of 1976-80 found a mean serum cholesterol level in U.S. men of µ = 210 mg/dl with standard deviation σ = 90 mg/dl. 

(A) Calculate the standard error of the mean for a sample of n = 36. 
(B) Find the probability of observing a sample mean of 240 or greater based on n = 36. As part of your response, sketch the sampling distribution curve for x̄. Because the sample is moderately large, we are willing to assume that the sampling distribution of x̄ is Normal.
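
One way to check the probability in Exercise 6A.9 is to model the sampling distribution of the mean directly as a Normal curve centered on µ with standard deviation equal to the SEM. The sketch below does this with the values quoted in the exercise; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 210, 90, 36                 # values quoted in the exercise
sem = sigma / sqrt(n)                      # standard error of the mean

# Sampling distribution of the mean, assumed Normal as the exercise allows
xbar_dist = NormalDist(mu=mu, sigma=sem)

p_240_or_more = 1 - xbar_dist.cdf(240)     # right-tail area at 240
print(f"SEM = {sem:.1f} mg/dl")
print(f"Pr(sample mean >= 240) = {p_240_or_more:.4f}")
```

The same result follows from standardizing 240 to a z score and looking up the right-tail area in a z table.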

6A.10 Female executives. Published results suggest that hospital administrators with a master's degree have an average salary of µ = $85,100 (σ = $10,000). We are willing to assume the sampling distribution of x̄ based on n = 20 will be approximately Normal. 

(A) Salaries for a random sample of 20 female hospital administrators with master's degrees have a sample mean x̄ = $80,900. Set up the null and alternative hypotheses to test whether this mean differs significantly from the published average. Use a two-sided alternative. 
(B) Show the sampling distribution of x̄ assuming H0 is true. Draw the Normal curve for this sampling distribution, marking the horizontal axis with landmarks based on the expected mean and standard deviation (error). [You must calculate the SEM before placing landmarks on the curve.]
(C) Calculate the zstat for the problem and place it on the horizontal axis in its proper location. From this zstat determine the [two-sided] P-value and interpret this result. Is the evidence significant? 
(D) Optional: List 4 explanations for the findings. 

6A.11 Fathers had heart attacks. The mean fasting serum cholesterol of teenage boys in the United States is µ = 175 mg/dl with σ = 50. A SRS of 39 boys whose fathers have a history of heart attack reveals a sample mean of x̄ = 195 mg/dl. Using a two-sided alternative hypothesis, determine whether the sample mean is significantly different from expected. [Show all hypothesis testing steps: hypotheses, test statistic, P value, conclusion.]

6A.12 Diet and bowel cancer. A study cast doubt on the belief that eating a low-fat, high-fiber diet would reduce the risk of colon cancer (Schatzkin et al. NEJM, 2000;342:1149-1155). Individuals in the study had had one or more histologically confirmed colorectal adenomas removed within six months of enrollment into the study. Participants were then randomly assigned to a low-fat, high-fiber diet or a control group that ate their regular diets. Study participants were screened for polyps over the next four years. Surprisingly, of the 958 subjects in the intervention group, 39.7 percent had recurrence. Of the 947 subjects in the control group who completed the study, 39.5 percent had recurrence. The difference was not statistically significant. How would you explain the phrase "the difference was not statistically significant" to someone who has never taken a statistics class? [Similar to Exercise 14.40 in Moore, 2004.]

6A.13 LDL and fiber. A cross-over trial of 13 diabetics compared cholesterol levels while on a moderate-fiber diet and while on a high-fiber diet (Chandalia et al., NEJM, 2000, 342: 1392 - 1398). The study concluded that the high-fiber diet reduced very-low-density lipoprotein cholesterol by 12.5% (P = 0.01). A critic notes that only 13 patients were studied so the observed drop could have occurred merely by chance. Explain how the statistic P = 0.01 addresses this objection. 

6A.14 University men. A SRS of 18 male students at a university has a sample mean height of 70 inches. This is a little more than the average height of adult men in the town in which the university is located. The average height in the town is 69 inches. Male height is approximately Normally distributed with σ = 2.8. Conduct a significance test to determine whether the sample mean is significantly different from the mean height of men in the town. Show all significance testing steps. Conduct a two-sided test.

Part B: Testing a mean when the population standard deviation is not known (t tests)

6B.1 P-value from tstat. A test of H0: µ = 0 based on n = 16 yields tstat = 2.44.

(A) How many degrees of freedom are associated with the test statistic?
(B) Provide the t quantiles from the t table that bracket the tstat.
(C) What are the right-tail probabilities for the bracketing t quantiles?
(D) What is the one-sided P-value for this problem? 
(E) What is the two-sided P-value? 
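
A t table only brackets the P-value in Exercise 6B.1; software can give a closer figure. The sketch below is one standard-library-only way to do it in Python: it integrates the Student's t density numerically with Simpson's rule. The helper names t_pdf and t_right_tail are hypothetical and chosen for illustration. A statistics package would normally be used instead, and the numerical integration here is an assumption of this sketch, not the method of the text.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_right_tail(t, df, steps=2000):
    """Pr(T >= t) for t >= 0, using Simpson's rule on [0, t].

    steps must be even. The area from 0 to t is subtracted from 0.5,
    the total area to the right of 0.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2      # Simpson weights: 4 for odd, 2 for even
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3

df = 16 - 1                             # degrees of freedom = n - 1
one_sided = t_right_tail(2.44, df)
two_sided = 2 * one_sided
print(f"df = {df}, one-sided P = {one_sided:.4f}, two-sided P = {two_sided:.4f}")
```

The printed one-sided P-value should fall between the right-tail probabilities of the two bracketing t quantiles found in the table.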

 

6B.2 Critical value. You take a SRS of n = 21 from a Normal population to test H0: µ = 0 versus H1: µ > 0. What values of the tstat will give a P-value that is less than or equal to 0.01? 

6B.3 Beware α = 0.05. Two trials were done under identical conditions. In each trial, men consumed eight ounces of red wine per day. 

(A) In trial A, 25 subjects lowered their cholesterol by an average of 5 percentage points (standard deviation 11.9). In testing H0: μ = 0, tstat = 2.10, df = 24, and P = 0.0464. Is this study significant at alpha = 0.05?
(B) In trial B, 25 different subjects also lowered their cholesterol by an average of 5 percentage points (standard deviation 12.2). The tstat for this problem is 2.05, df = 24, P = 0.0514. Is this result significant at alpha = 0.05?
(C) Is it reasonable to come to a different conclusion for trial A and trial B? Explain your reasoning.

6B.4 A benefit of red wine. Nine men who drank a liter of red wine per day increased their serum polyphenol levels by x̄ = 5.5 percent with s = 2.517 percent (Moore, 2004, p. 416; Nigdikar et al., 1998). Test whether this increase was significant. Use a two-sided alternative. Show all testing steps. [We are looking for evidence against "no mean change," so H0: µ = 0.]
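
As a reminder of the arithmetic behind a one-sample t statistic computed from summary data, the worked equation below plugs in the figures quoted in Exercise 6B.4 (n = 9, mean change 5.5, s = 2.517, µ0 = 0). It covers the test-statistic step only; the P-value still has to be read from a t table or obtained from software.

```latex
t_{\text{stat}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
               = \frac{5.5 - 0}{2.517 / \sqrt{9}}
               = \frac{5.5}{0.839}
               \approx 6.56,
\qquad df = n - 1 = 8
```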

6B.5 Menstrual cycle length. Menstrual cycle lengths (days) in a random sample of 9 women are {31, 28, 26, 24, 29, 33, 25, 26, 28}. Assume the population is Normal. Test whether the mean length of the menstrual cycle in this population is a lunar month. (A lunar month is 29.5 days.) Show all hypothesis testing steps, including statements of the null and alternative hypotheses.
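
When raw data are given, as in Exercise 6B.5, the sample mean and standard deviation must be computed before the t statistic. The sketch below shows those steps in Python with the standard library. The function name one_sample_t is a hypothetical helper, and the sketch stops at the test statistic and degrees of freedom, leaving the P-value to a t table or software.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (tstat, df) for a one-sample t test of H0: mu = mu0."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                     # sample SD, n - 1 in the denominator
    se = s / sqrt(n)                    # estimated standard error of the mean
    return (xbar - mu0) / se, n - 1

# Cycle lengths (days) quoted in Exercise 6B.5; mu0 is the lunar month
cycles = [31, 28, 26, 24, 29, 33, 25, 26, 28]
tstat, df = one_sample_t(cycles, mu0=29.5)
print(f"tstat = {tstat:.3f} with df = {df}")
```

The same helper applies to any exercise in Part B that lists individual observations rather than summary statistics.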

 

6B.6 Behavioral problems in stressed adolescents. Because there is evidence that stress in a child's life may lead to behavioral problems later in life, it might be expected that a sample of children who have been subjected to an unusual amount of stress would show an unusually high level of behavioral problems. On the other hand, children with high stress levels could feel that they have had enough going on in their lives without complicating matters further, so they might actually show an unusually low number of behavioral problems. To investigate this question, we select a SRS of 5 children, each of whom is under a high level of social stress. We examine these children and assign each a behavioral problem score, where a score of 50 represents an average amount of behavioral problems. Data are {48, 62, 53, 51, 58}. Test whether these scores are significantly different from the established norm of 50. Show all hypothesis testing steps.

 

6B.7 Cholesterol in Asian immigrants. We want to compare fasting serum cholesterol levels of recent Asian immigrants to those of the overall U.S. population. Assume cholesterol levels in 20- to 39-year-old women in the United States are Normal with µ = 190 mg/dl. Blood tests are performed on 100 female Asian immigrants in this age range. The mean cholesterol level in this sample is 181.52 mg/dl (standard deviation = 40 mg/dl) (Rosner, 2000, p. 223). Conduct a two-sided test to determine whether the recent immigrants have lower average cholesterol levels than their native counterparts. Show all hypothesis testing steps. 

 

6B.8 Lowering elevated heart rate. A new calcium channel blocking agent is tested in 9 patients with unstable angina. Resting heart rate after 48 hours of treatment decreased an average of 5 beats per minute (standard deviation = 2.5). Is this drop significant? (Use a two-sided alternative.)

6B.9 SIDS. A sample of birth weights (grams) of 10 infants who had died of Sudden Infant Death Syndrome (SIDS) in a large metropolitan area was {2998, 3740, 2031, 2804, 2454, 2780, 2203, 3803, 3948, 2144}. The mean weight of all births in this metropolitan area was 3300 grams. Is the mean birth weight of SIDS cases significantly different from that of the rest of the population? (Perform a two-sided test.)

6B.10 Everley's syndrome. High calcium concentration may provide useful diagnostic and therapeutic clues in the treatment of a rare congenital disorder known as Everley's syndrome. In 18 Everley's syndrome patients between the ages of 20 and 44 years, the mean plasma calcium concentration is 3.2 mmol/l with standard deviation 1.1. The normal population has a mean plasma calcium concentration of 2.5 mmol/l (Swinscow, 2006). Is the mean calcium level in the Everley's syndrome patients abnormally high? 

 
