2: Frequency distributions 2/8/07

Review questions

  1. How many leaves are on a stemplot? 
  2. Why would you use split stem-values on a stemplot?
  3. What is the depth of an observation?
  4. What is the purpose of using a stem-value multiplier (e.g., × 100)?
  5. Distributions with long tails toward the higher numbers are said to have a ______________ skew.
  6. The three elements of a distribution are its shape, location, and ___________.
  7. This is the middle point of a distribution.
  8. This is the gravitational center of a distribution.
  9. The median has a depth of ___________.
  10. This is a value that does not fit in with the general pattern of a distribution.
  11. Define the following terms: frequency, relative frequency, cumulative [relative] frequency.
  12. Why are “end-point conventions” needed when constructing frequency tables for data that are in class intervals?
  13. True or false? Histograms should not be used when plotting frequencies for categorical variables.

Exercises

2.1 Irish healthcare websites. The Irish Department of Health recommends a reading level at or below 12 to 14 years of age for health information aimed at the public. The reading levels for 46 Irish healthcare websites are shown below (O'Mahoney, 1999) and are stored online in irishweb.sav.

08 10 11 11 12 13 13 13 13 14
14 15 15 15 15 15 15 15 16 16
16 16 17 17 17 17 17 17 17 17
17 17 17 17 17 17 17 17 17 17
17 17 17 17 17 17        
(A) Create a stemplot for these data. Use a stem-multiplier of ×1. Discuss the distribution's shape, location, and spread. 
(B) What percentage of the websites met the recommended level by having a reading level of 14 or below? 

2.2 Poverty in eastern states (POV-EAST-2000). Poverty is associated with many health determinants. The table below shows the percentage of people living below the poverty line in the 26 states east of the Mississippi for the year 2000 (Delaker, 2001). [Data are also stored online in the file pov-east-2000.sav.]

Alabama      14.6    Indiana        08.2    Mississippi    15.5    Pennsylvania  09.9    West Virginia  15.8
Connecticut  07.6    Kentucky       12.5    New Hampshire  07.4    Rhode Is.     10.0    Wisconsin      08.8
Delaware     09.8    Maine          09.8    New Jersey     08.1    S. Carolina   11.9
Florida      12.1    Maryland       07.3    New York       14.7    Tennessee     13.3
Georgia      12.6    Massachusetts  10.2    N. Carolina    13.2    Vermont       10.1
Illinois     10.5    Michigan       10.2    Ohio           11.1    Virginia      08.1

(A) Make a stemplot of these data using a stem-multiplier of ×1. Describe the distribution’s shape. Are there any outliers? If so, identify the outlying state(s).
(B) What is the median value? Which states straddle the median?
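
The median sits at a depth of (n + 1) / 2 in the ordered data. When n is even, that depth ends in .5 and the median is the average of the two observations that straddle it. A minimal Python sketch of this counting rule follows. The function name median_by_depth and the six sample values are illustrative assumptions and are not taken from the poverty data.

```python
def median_by_depth(values):
    """Return (depth, lower, upper, median) for a list of numbers.

    depth = (n + 1) / 2 counts in from either end of the ordered data.
    When n is even the depth ends in .5 and the median is the average
    of the two observations that straddle it.
    """
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2
    lower = ordered[(n - 1) // 2]   # observation just at or below the depth
    upper = ordered[n // 2]         # observation just at or above the depth
    return depth, lower, upper, (lower + upper) / 2


# Hypothetical values, chosen only to show an even-sized sample.
sample = [12, 7, 3, 9, 15, 10]
print(median_by_depth(sample))
```

For an odd-sized sample the lower and upper observations coincide, so the same function covers both cases.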

2.3 Hospital duration data. A study by Townsend and co-workers (1979) looked at duration of hospital stays. A sample of 25 patients from this study showed the following results:

5 10 6 11 5 14 30 11 17 3
9 3 8 8 5 5 7 4 3 7
9 11 11 9 4

 

(A) Create a stemplot for these data using a stem multiplier of ×10. 
(B) Create a stemplot with split stem-values. 
(C) Which of the above stemplots do you prefer? 
(D) Describe the distribution's shape, spread, and location. 
(E) Construct a frequency table of the data with 5-day class intervals. What percentage of hospital stays were less than 5 days? What percentage were less than 15? What percentage of hospital stays were at least 15 days in length? 
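
A frequency table with class intervals needs an end-point convention so that a boundary value such as 5 falls in exactly one interval. The Python sketch below assumes one common convention: each interval includes its lower boundary and excludes its upper boundary. The function name class_interval_table and the choice of 0 as the first boundary are assumptions made for illustration.

```python
from collections import Counter


def class_interval_table(values, width, start=0):
    """Return rows of (lower, upper, freq, rel_pct, cum_pct).

    Each interval is [lower, upper): the lower boundary is included and
    the upper boundary is excluded. Assumes every value is >= start.
    """
    n = len(values)
    counts = Counter((v - start) // width for v in values)
    rows = []
    cumulative = 0
    for k in range(max(counts) + 1):        # keep empty intervals
        freq = counts.get(k, 0)
        cumulative += freq
        lower = start + k * width
        rows.append((lower, lower + width, freq,
                     100 * freq / n, 100 * cumulative / n))
    return rows


# Hospital stays (days) from the exercise above.
stays = [5, 10, 6, 11, 5, 14, 30, 11, 17, 3,
         9, 3, 8, 8, 5, 5, 7, 4, 3, 7,
         9, 11, 11, 9, 4]

for lower, upper, freq, rel, cum in class_interval_table(stays, width=5):
    print(f"{lower:>2} to <{upper:<2}  {freq:>2}  {rel:5.1f}%  {cum:5.1f}%")
```

The cumulative column answers "less than" questions directly. The percentage "at least" a boundary is 100 minus the cumulative percent of the interval that ends at that boundary.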

2.4 Outpatient wait time. The length of time patients wait for attention in doctors' offices is an important consideration in health care. Waiting times (in minutes) for 25 patients at a public health clinic are:

35 22 63 6 49 19 16 31 24 29
23 32 72 13 51 45 77 16 33 55
10 42 28 72 13

(A) Draw a stemplot of the data. Describe the distribution's shape, location, and spread. 
(B) From your stemplot, create a frequency table showing frequency counts, relative frequencies, and cumulative percents. 
(C) What percentage of wait times were less than 20 minutes? 
(D) What percentage were at least 20 minutes?

2.5 Body weight expressed as a percentage of ideal (%IDEAL). This data set contains a variable equal to (actual body weight) ÷ (ideal body weight) × 100. A value of 100 represents 100% of ideal body weight, 120 represents 20% above ideal body weight, and so on. Data for n = 18 subjects are shown below. (Source: Saudek et al., 1989; Pagano & Gauvreau, 1993, p. 208; data are stored online in the file ../datasets/%ideal.sav). 

107 119 99 114 120 104 88 114 124 116
101 121 152 100 125 114 95 117
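
The %IDEAL variable can be reproduced with a one-line calculation. The Python sketch below is illustrative only: the function name percent_of_ideal and the two body weights are hypothetical and do not come from the study.

```python
def percent_of_ideal(actual_weight, ideal_weight):
    """Body weight as a percentage of ideal: actual / ideal x 100."""
    return 100 * actual_weight / ideal_weight


# Hypothetical subject: actual weight 66 kg, ideal weight 55 kg.
print(percent_of_ideal(66, 55))   # 20% above ideal body weight
```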

(A) Construct a stem-and-leaf plot of these data. Describe what you see. 
(B) Construct a frequency table for these data using 20-unit class intervals. Report frequencies, relative frequencies, and cumulative relative frequencies. 

2.6 Seizures following bacterial meningitis (Source unknown).  The time between exposure to a causative agent and first symptoms is called the induction or incubation period. A study examined induction periods in 13 seizure cases following bacterial meningitis. Data (months) are listed below. Using a stem-value multiplier of × 10, construct a stemplot for these data. Discuss your findings. [Hint: The value of 0.10 has a tens-place of 0 and ones-place of 0, so it shows up as 0|0 on the plot with the ×10 stem-multiplier.]

    0.10    0.25    0.50    4    12    12    24    24    31    36    42    55    96 
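
The rule in the hint can be sketched in Python. With a stem multiplier of ×10, the stem is the tens digit, the leaf is the ones digit, and any further digits are truncated. The function name stemplot and its parameters are assumptions made for this illustration, and the sketch assumes non-negative values.

```python
from collections import defaultdict
from decimal import Decimal


def stemplot(values, multiplier=10):
    """Return the rows of a stemplot as a list of strings.

    A stem times `multiplier` gives the leading part of the value; each
    leaf is the next digit (worth multiplier / 10). Further digits are
    truncated. Assumes non-negative values.
    """
    m = Decimal(str(multiplier))
    leaf_unit = m / 10
    rows = defaultdict(list)
    for v in sorted(values):
        d = Decimal(str(v))          # Decimal avoids binary rounding surprises
        stem = int(d // m)
        leaf = int((d - stem * m) // leaf_unit)
        rows[stem].append(leaf)
    width = len(str(max(rows)))
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # keep empty stems
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>{width}}|{leaves}")
    lines.append(f"(x{multiplier})")
    return lines


# Induction periods (months) from the exercise above.
months = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]
print("\n".join(stemplot(months, multiplier=10)))
```

Calling the same function with multiplier=1 makes the ones digit the stem and the tenths digit the leaf.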

2.7 Children of physicians (DOCKIDS). The numbers of offspring of 24 physicians at a particular hospital are:

    3    2    0    1    4    7    3    2    4    1    0    2    5    6    2    1    2    1    0    0   3    6    2    1

(A) Plot these data as a stemplot. Describe what you see (shape, location, and spread).
(B) Construct a frequency table for these data. What percentage of physicians at this hospital have 3 or fewer children? What percentage have at least 2 children?
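
Questions phrased as "k or fewer" and "at least k" are complements of each other: the percentage with at least k equals 100 minus the percentage with k − 1 or fewer. A short Python sketch of both counts follows. The function names percent_at_most and percent_at_least are illustrative assumptions.

```python
def percent_at_most(values, k):
    """Percentage of observations less than or equal to k."""
    return 100 * sum(1 for v in values if v <= k) / len(values)


def percent_at_least(values, k):
    """Percentage of observations greater than or equal to k."""
    return 100 * sum(1 for v in values if v >= k) / len(values)


# Numbers of offspring from the exercise above.
kids = [3, 2, 0, 1, 4, 7, 3, 2, 4, 1, 0, 2,
        5, 6, 2, 1, 2, 1, 0, 0, 3, 6, 2, 1]

print(percent_at_most(kids, 3))
print(percent_at_least(kids, 2))
# For whole-number data, "at least 2" is the complement of "1 or fewer".
print(100 - percent_at_most(kids, 1))
```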

2.8 Surgical times for artificial hearts (ART-HEART). Durations of surgeries (in hours) for 15 patients receiving artificial hearts are shown below (Kitchens, 1998, p. 139; data are stored in ../datasets/art-heart.sav). Create a stem-and-leaf plot of these data. Are there any outliers? 

Data (n = 15): 

7.0    6.5    3.5    3.8    3.1    2.8     2.5    2.6    2.4    2.1     1.8    2.3    3.1    3.0    2.5

2.9 Grad student ages. Ages of 36 graduate students are listed below and are stored in ../datasets/grad-students.sav. Explore these data with a stemplot and then describe the distribution. Provide two different explanations for the low outlier.

Data for GRAD-STUDENTS exercise (n = 36): 

29    25    29    27    27    29    32    30    28    30    28    26    25    25    27    29    25    24    28    28
24    28    25    28    29    26    27    30    28    16    29    30    33    29    32    31

2.10 Income, Poverty, and Health Insurance. The U.S. Census Bureau reported on income, poverty, and health insurance coverage in the United States for the period 2002 to 2004 (DeNavas-Walt et al., 2005). Data for the average percentage of people without health insurance coverage by state for the 3-year period 2002 through 2004 are shown below and are stored in ../datasets/inc-pov-hlthins.sav as the variable NOINS ("no insurance"). 

(A) Create a stemplot of these data using an axis multiplier of ×10.
(B) Redo the stemplot with double-split stem-values. 
(C) Redo the stemplot with quintuple-split stem-values.
(D) Which plot is the most revealing? Interpret the most informative results.

STATE % w/out insurance STATE % w/out insurance STATE % w/out insurance STATE % w/out insurance STATE % w/out insurance
Alabama 13.5 Hawaii  09.9 Michigan  11.4 North Carolina  16.6 Utah  13.4
Alaska 18.2 Idaho  17.3 Minnesota  08.5 North Dakota  11.0 Vermont  10.5
Arizona  17.0 Illinois  14.2 Mississippi  17.2 Ohio  11.8 Virginia  13.6
Arkansas  16.7 Indiana  13.7 Missouri  11.7 Oklahoma  19.2 Washington  14.2
California  18.4 Iowa  10.1 Montana 17.9 Oregon  16.1 West Virginia  15.9
Colorado  16.8 Kansas  10.8 Nebraska  11.0 Pennsylvania  11.5 Wisconsin  10.4
Connecticut  10.9 Kentucky  13.9 Nevada  19.1 Rhode Island  10.5 Wyoming  15.9
Delaware  11.8 Louisiana  18.8 New Hampshire  10.6 South Carolina  13.8
Dist. of Columbia  13.5 Maine  10.6 New Jersey  14.4 South Dakota  11.9
Florida  18.5 Maryland  14.0 New Mexico  21.4 Tennessee  12.7
Georgia  16.6 Massachusetts  10.8 New York  15.0 Texas  25.1
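
Split stem-values can be sketched in Python as follows. With double-split stems, each stem appears twice (leaves 0 to 4, then 5 to 9). With quintuple-split stems, each stem appears five times (leaves 0 to 1, 2 to 3, and so on). The function name split_stemplot is an assumption, digits after the ones place are truncated, and for brevity only the first column of the table (Alabama through Georgia) is plotted.

```python
from collections import defaultdict


def split_stemplot(values, splits=2):
    """Stemplot rows with a x10 stem multiplier and split stem-values.

    splits=1 gives single stems, 2 gives double-split stems, and
    5 gives quintuple-split stems. Assumes non-negative values.
    """
    if splits not in (1, 2, 5):
        raise ValueError("splits must be 1, 2, or 5")
    leaves_per_row = 10 // splits
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)     # int() truncates 13.5 to 13
        rows[(stem, leaf // leaves_per_row)].append(leaf)
    first, last = min(rows), max(rows)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for part in range(splits):
            if first <= (stem, part) <= last:   # keep empty rows in between
                leaves = "".join(str(x) for x in rows.get((stem, part), []))
                lines.append(f"{stem}|{leaves}")
    return lines


# First column of the table above only (Alabama through Georgia).
noins_subset = [13.5, 18.2, 17.0, 16.7, 18.4, 16.8,
                10.9, 11.8, 13.5, 18.5, 16.6]

print("\n".join(split_stemplot(noins_subset, splits=2)))
print()
print("\n".join(split_stemplot(noins_subset, splits=5)))
```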

2.11 UNICEF low birth weight data (UNICEF). A weight at birth of less than 2,500 grams (about 5.5 pounds) qualifies as "low birth weight" according to standard conventions. Low birth-weight rates (per 100 births) for the year 1991 for 109 countries are stored in unicef.sav (UNICEF & Grant, J. P., 1991; Pagano and Gauvreau, 1993, p. 55). The data file contains 129 records, but 20 values are missing, leaving n = 109 valid data points. 

(A) Download  unicef.sav and create a stemplot of the low birth weight rates.
(B) Determine the low birth weight rate for the US. (Sort the data by country and find the value for the USA. The menu commands are Data > Sort cases > Sort by.) 
(C) Create a frequency table for low birth weights.  Where does the USA stand in this table?

2.12 Growth in the U.S. Hispanic population (PER-HISP). The 2000 census documented rapid growth of the Hispanic population in the United States. The percentage of residents in each of the 50 states who identified themselves in the 2000 census as Spanish, Hispanic, or Latino is shown below and is stored in per-hisp.sav (2000 US Census).

(A) Create a stemplot of these data using single stem-values. [Use an axis multiplier of ×10.]
(B) Create a stemplot using double-split stem-values. 
(C) Which of the plots do you prefer?

STATE PERCENT STATE PERCENT STATE PERCENT
Alabama 1.5 Louisiana 2.4 Ohio 1.9
Alaska 4.1 Maine 0.7 Oklahoma 5.2
Arizona 25.3 Maryland 4.3 Oregon 8.0
Arkansas 2.8 Massachusetts 6.8 Pennsylvania 3.2
California 32.4 Michigan 3.3 Rhode Island 8.7
Colorado 17.1 Minnesota 2.9 South Carolina 2.4
Connecticut 9.4 Mississippi 1.3 South Dakota 1.4
Delaware 4.8 Missouri 2.1 Tennessee 2.0
Florida 16.8 Montana 2.0 Texas 32.0
Georgia 5.3 Nebraska 5.5 Utah 9.0
Hawaii 7.2 Nevada 19.7 Vermont 0.9
Idaho 7.9 New Hampshire 1.7 Virginia 4.7
Illinois 10.7 New Jersey 13.3 Washington 7.2
Indiana 3.5 New Mexico 42.1 West Virginia 0.7
Iowa 2.8 New York 15.1 Wisconsin 3.6
Kansas 7.0 North Carolina 4.7 Wyoming 6.4
Kentucky 1.5 North Dakota 1.2

2.13 East Boston Respiratory Disease Survey (FEV). Download fev.sav and open the file in SPSS. Create a stemplot of the AGE variable (Analyze > Descriptive Statistics > Explore). Go to the output window and navigate to the stem-and-leaf plot. Notice that each leaf in the plot represents 2 cases. Also notice frequencies are reported to the left of the plot. Describe the shape, location, and spread of the age distribution.

2.14 Student weights (BODY-WEIGHT). The data set body-weight.sav contains weights of 53 students (in pounds). Download the dataset and plot the results as a stemplot. Describe the distribution. (Data from presentation slides prepared by  J. Mays for Moore's text, 2004, Chapter 2).
