3: Summary Statistics  Version: 9/12/06

Review Questions

  1. List the three measures of central location presented in this chapter. 
  2. The two main measures of spread are the standard deviation and the IQR. Define these measures.
  3. A study shows a mean of 0.98 and median of 0.56. What does this suggest about the shape of the distribution?
  4. Why are we unable to use the sum of the un-squared deviations for measuring spread?
  5. The line inside a boxplot represents the distribution's ________________.
  6. The top of the whisker on a box-and-whiskers plot goes up to the _______ _______ _______.
  7. What visual clues of spread are apparent on a boxplot? 
  8. Besides being the gravitational center of the distribution, the mean tells you several things you might want to know. Name three.
  9. Why do we seldom use the variance for descriptive purposes?
  10. The hinge-spread is the same as the _____________.
  11. What five points comprise a 5-point summary?
  12. Q1 is the median of the ___________ half of the data set.
  13. Do the upper and lower fences appear on a boxplot? What are the upper and lower fences on a boxplot used for? 
  14. The denominator of the formula for the population variance is _________, while the denominator for the sample variance is ________. 
  15. Distinguish between µ and x̄.
  16. Distinguish between σ and s.
  17. Click here and then compare the two groups shown in the side-by-side boxplot. 
  18. What kinds of distributions are best summarized by the statistics x̄ and s?
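
Questions 4 and 14 concern how deviations from the mean are combined into a variance. The sketch below is one way to see the arithmetic. It borrows the "After" heart rates from Exercise 3.15 purely as convenient numbers, and all variable names are illustrative. Treating the same six values first as a whole population and then as a sample is an assumption made only to contrast the two denominators.

```python
import math

# Heart rates (beats per minute) borrowed from the "After" row of Exercise 3.15.
values = [68, 70, 69, 70, 71, 72]

n = len(values)
mean = sum(values) / n

# Deviation of each observation from the mean.
deviations = [x - mean for x in values]

# Positive and negative deviations cancel, so their plain sum
# carries no information about spread.
sum_of_deviations = sum(deviations)

# Squaring removes the sign before summing.
sum_of_squares = sum(d ** 2 for d in deviations)

population_variance = sum_of_squares / n      # denominator n
sample_variance = sum_of_squares / (n - 1)    # denominator n - 1
sample_sd = math.sqrt(sample_variance)

print("mean:", mean)
print("sum of deviations:", sum_of_deviations)
print("sum of squared deviations:", sum_of_squares)
print("population variance:", population_variance)
print("sample variance:", sample_variance)
print("sample standard deviation:", sample_sd)
```

The standard deviation is back in the original units (beats per minute), whereas the variance is in squared units.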

Exercises

3.1 Serum polyphenols and red wine consumption. Drinking moderate amounts of red wine may reduce the risk of heart disease. The proposed mechanism is that the polyphenols in red wine improve serum cholesterol levels. An experiment is conducted in which 9 healthy men drink half a bottle of red wine each day for two weeks. They have their serum polyphenol levels measured at the beginning and end of the experiment. Percent changes in polyphenol levels are shown below (Nigdikar, 1998; Moore, 2004, pp. 416, 643; data stored online in POLYPHEN.SAV).

3.5 8.1 7.4 4.0 0.7 4.9 8.4 7.0 5.5

 

(A) Plot data as a stemplot. Discuss your findings. [Comment: The best / most informative plot uses an axis-multiplier of ×10 and quintuple-split stem-values. This will create 5 "bins" for the leaves.] 
(B) Calculate the distribution's mean and standard deviation. Relate the  numerical summaries to the stemplot.

 

3.2  Leaves on common stems (LEAVES.SAV). Calculate the means and standard deviations for each of the groups shown in Comparison A, B, and C below. Discuss how summary statistics relate to what you see on the plots.  

 

Comparison A

Group 1| |Group 2
------------------      
      0|1|
       |2|
      0|3|0
       |4|0
      0|5|0
       |6|0
      0|7|0
       |8|
      0|9|
       ×10

Comparison B

Group 1| |Group 2
------------------      
       |1|
       |2|
       |3|0
       |4|0
      0|5|0
      0|6|0
      0|7|0
      0|8|
      0|9|
       ×10

Comparison C

Group 1| |Group 2
------------------      
      0|1|
       |2|
      0|3|
       |4|
      0|5|0
       |6|0
      0|7|0
       |8|0
      0|9|0
       ×10  

 

 

3.3 Gravitational center. This exercise is intended to show the mean as the balancing point of a group of  numbers. In each instance, calculate the mean of the values and then show the location of the mean on the number line. 

(A) The values 1 and 5 are marked as "Xs" on the number line below. Calculate the mean and mark its location on the number line. 

X                   X
1----2----3----4----5

(B) The values 1, 5, and 5 are shown on the number line. After calculating the mean of these points, mark its location on the number line. Notice how the extra 5 pulls the mean to the right.

                    X
X                   X
1----2----3----4----5

(C) Calculate the mean of these three points. Their values are 2.75, 3.00, and 3.25. 

         XXX
1----2----3----4----5

(D) Calculate the mean of these three points. 

          X 
          X         X
1----2----3----4----5

3.4 Eye-balling the mean. Consider these eight data points: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42.  A stemplot of these points is:

1|4
2|03
3|4779
4|4
× 1

(A) Visually estimate ("eye-ball") the balancing point of this distribution. This is the approximate location of the mean. 
(B) Calculate the mean. 
(C) How well did you do in eye-balling the mean?

3.5 Spread #1. Each of the batches of numbers below has a mean of 100. Which has the most variability? (Arithmetic not required.)

Batch A:    0   50    100   150     200
Batch B:   50   75    100   125     150
Batch C:   75   87.5  100   112.5   125

3.6 Spread #2. Which of the following age distributions has the greatest variability? . . .  or are they equal? 

Group A: 0 years, 1 year, 2 years
Group B: 0 months, 12 months, 24 months
Group C: 0 days, 365 days, 730 days

3.7 Heights of 11-year-old boys (68-95-99.7 rule). Eleven-year-old boys have a mean height of 146 centimeters with a standard deviation of 8 centimeters. The distribution is approximately Normal. Identify the range of heights that includes the middle 68% of boys. What range covers the middle 95% of values? ...99.7%?

3.8 Scores~N(100, 10). A score is scaled so that it is Normal with a mean of 100 and standard deviation of 10. Use the 68-95-99.7 rule to determine expected ranges of values. 
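
The 68-95-99.7 rule amounts to taking the mean plus or minus 1, 2, and 3 standard deviations. A minimal sketch of that arithmetic follows, using the heights from Exercise 3.7. The function name `normal_rule_ranges` is hypothetical, and the rule applies only when the distribution is approximately Normal.

```python
def normal_rule_ranges(mean, sd):
    """Return {coverage label: (low, high)} for mean +/- 1, 2, and 3 SDs."""
    coverage_by_multiple = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (mean - k * sd, mean + k * sd)
        for k, label in coverage_by_multiple.items()
    }


# Heights of 11-year-old boys from Exercise 3.7: mean 146 cm, SD 8 cm.
ranges = normal_rule_ranges(146, 8)
for label, (low, high) in ranges.items():
    print(f"middle {label}: {low} to {high} cm")
```

The same function can be called with the mean and standard deviation from Exercise 3.8.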

3.9 Forced expiratory volume comparisons. Forced expiratory volume is a measure of respiratory function that is determined by having a person blow forcibly through a tube. The rate of air flow is then measured in a standard way. FEV levels for two independent groups are shown below. Calculate the means and standard deviations of the groups. How do the groups compare? (Data are fictitious but realistic; data stored online in FEV-FICT.SAV). [Note: Students often have questions about rounding. A useful guide is to report one decimal place beyond the precision of the data. Data for this problem have two decimal places, so the mean and standard deviation should be reported to three decimal places. To achieve three-decimal-place accuracy, carry at least five decimals during calculations.]

Group 1 (n = 7):    3.94    1.47    2.06   2.36    3.74    3.43    3.78
Group 2 (n = 6):    1.22    3.63   1.95    2.01    2.43    3.02
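
The rounding guide in the note above can be sketched in code: compute at full precision and round only when reporting. The example uses Group 1 only; the helper name `summarize` and its `data_decimals` parameter are illustrative assumptions, not part of the exercise.

```python
import statistics

# Group 1 FEV values from the exercise (two decimal places).
group_1 = [3.94, 1.47, 2.06, 2.36, 3.74, 3.43, 3.78]


def summarize(values, data_decimals=2):
    """Return (mean, sample SD), rounded to one decimal beyond the data."""
    # Intermediate results keep full floating-point precision,
    # which is well beyond the five decimals the note asks for.
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, denominator n - 1

    # Round only at the reporting step.
    report_decimals = data_decimals + 1
    return round(mean, report_decimals), round(sd, report_decimals)


mean_1, sd_1 = summarize(group_1)
print("Group 1 mean:", mean_1)
print("Group 1 standard deviation:", sd_1)
```

Rounding intermediate values (for example, the mean before computing deviations) can shift the last reported digit, which is why the rounding is deferred to the final line of the function.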

3.10 Particulate matter in air samples (airsamples.sav). Suspended particulate matter in air is an indicator of air pollution. Particulate air pollution levels (micrograms per cubic meter) from two sites are shown below as a stemplot. Calculate means and standard deviations for each site and explain how these statistics relate to the side-by-side stemplot.

Site 1| |Site 2
----------------
    42|2|
     8|2|
     2|3|234
    86|3|6689   
     2|4|0
      |4|
      |5|
      |5|   
      |6|
     8|6|   
     (×10)

3.11 The median is more robust than the mean (ROBUST.SAV). Body weights (n = 10) expressed as "percentage of ideal" for 10 individuals are {99, 101, 107, 114, 116, 119, 121, 125, 152, 155}. 

(A) Calculate the mean and median of these observations. 
(B) Draw a stemplot of the data and identify the two outliers in the dataset.
(C) Exclude the two outliers and recalculate the mean and median. What impact did removing the outliers have on each of these statistics?  

3.12 Seizures following bacterial meningitis (SEIZURE.SAV). Data on the number of months between the onset of bacterial meningitis and seizures in 13 children were introduced in the prior chapter. Values were {0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96}.

(A) Calculate the mean and standard deviation for these data. 
(B) Determine the median and IQR. 
(C) Compare the distribution's mean and median. What does this tell you about the shape of the distribution? Which measures of central location and spread do you prefer?
(D) Determine the 5 point summary for these data and draw the boxplot.
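
A 5-point summary and the boxplot fences can be sketched as below, taking the quartiles as medians of the lower and upper halves of the ordered data. Two assumptions are made here: when n is odd the median is placed in both halves (Tukey's hinges), and the fences sit 1.5 hinge-spreads beyond the quartiles. Statistical software may use other quartile conventions and return slightly different values. The function names are hypothetical.

```python
import statistics

# Months between onset of bacterial meningitis and seizures (n = 13).
months = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]


def five_point_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of halves."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2        # when n is odd, the median joins both halves
    lower = data[:half]
    upper = data[n - half:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


def fences(q1, q3):
    """Return (lower fence, upper fence) at 1.5 hinge-spreads from the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


minimum, q1, median, q3, maximum = five_point_summary(months)
lower_fence, upper_fence = fences(q1, q3)

# Values beyond the fences are plotted as separate points on the boxplot.
outside = [x for x in months if x < lower_fence or x > upper_fence]

print("5-point summary:", minimum, q1, median, q3, maximum)
print("fences:", lower_fence, upper_fence)
print("outside values:", outside)
```

The whiskers of the boxplot then extend to the most extreme observations that still lie inside the fences.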

3.13 Body weight expressed as a percentage of ideal. A study by Saudek and co-workers (1999) measured body weight as a percentage of ideal. The variable %IDEAL = (actual body weight) ÷ (ideal body weight) × 100. Data are shown as a stemplot below and are stored online in %IDEAL.SAV.

(A) Provide the 5-point summary for these data. 
(B) Draw a boxplot of these data. Show all work. Use graph paper when drawing your plot. 

08|8
09|59
10|0147
11|444679
12|0145
13|
14|
15|2
×10

3.14 Skin fold thickness and chronic lung disease (SKINFOLD). Skin-fold thickness is an anthropometric method for assessing fat deposition and its variation with health and disease. It is measured by an observer with specialized skin-fold calipers over the triceps muscle in the arm. Measurements (millimeters) are made at the midpoint of the triceps in 5 men with chronic lung disease and 6 comparably-aged controls. Calculate the mean and standard deviation of the data in these two groups and discuss your findings. (Data are fictitious but realistic).

Chronic lung disease:    9.1    10.9   11.4    15.3    18.4    
Normal controls:        10.4    19.6   20.6    23.8    24.7    32.8    

3.15  Treatment for tachycardia (HEARTRATE). An individual with an irregular heart rate is given a medication that is intended to stabilize his heart rate. Heart rates (beats per minute) before and after treatment are shown below. Determine the mean and standard deviation before and after starting the medication. Did the drug work?

Before:             65        85        90        65        55        60
After:               68        70        69        70        71        72

3.16  Side-by-side boxplot. Plot data below as side-by-side boxplots. After completing your plot, discuss your findings.

Data Set A:     12    10    20    15    15
Data Set B:     10    35    45    55    60    70    
