# 2: Frequency distributions 2/9/16

## Review questions

1. How many leaves do you plot on a stemplot?
2. How many stem values should you use when drawing a stemplot?
3. Why would you use split stem-values on a stemplot?
4. What is the depth of an observation?
5. What is the purpose of the stem-multiplier (e.g., × 100)?
6. Distributions with long tails toward the higher numbers are said to have a ______________ skew.
7. The three elements of a distribution are shape, location, and ___________.
8. This is the middle point of a distribution.
9. The median has a depth of ___________.
10. This is a value that does not fit in with the general pattern of a distribution.
11. Define the following terms: frequency, relative frequency, cumulative relative frequency.
12. What's a distribution?
13. What's the full name for "stemplot"?
14. What do you look for on a stemplot?
15. Why do the numbers of a stemplot have to align vertically and horizontally?
16. What do I do if each value has more significant digits than can fit on a Tukey stemplot?

## Exercises

2.1 Irish healthcare websites. The Irish Department of Health recommends a reading level at or below 12- to 14-years of age for health information aimed at the public. Reading levels for 46 Irish healthcare websites are shown below (O'Mahoney 1999).

(A) Create a stemplot for these data. (Hints: Use a stem-multiplier of ×1. The tenths place for all the data points is .0.)

(B) Narratively describe the distribution's shape, location, and spread. (Hint: Skews are described by the direction of their tail.)

(C) Create a frequency table for the data. Include columns for frequency, relative frequency, cumulative frequency, and cumulative relative frequency. (Hints: Include table cells with 0 counts. Frequency tables should always include a total row at the bottom. The frequencies should sum to n. The relative frequencies should sum to 100%. There is no total for cumulative frequency or cumulative relative frequency.)

(D) What percentage of the websites met the recommended level by having a reading level of 14 or below? (Hint: Use the cumulative relative frequency information in your frequency table to answer this question.)

Data for exercise 2.1. Reading levels of 46 Irish healthcare websites. Data are available in SPSS format via this link: http://www.sjsu.edu/faculty/gerstman/datasets/irishweb.sav

 08 10 11 11 12 13 13 13 13 14 14 15 15 15 15 15 15 15 16 16 16 16
 17 17 17 17 17 17 17 17 17 17 17 17
 17 17 17 17 17 17 17 17 17 17 17 17
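The hints in part (C) can be checked with a short script. The following is a minimal sketch in Python (an assumption; the course materials use SPSS), and the function name `frequency_table` is hypothetical. It gives every integer value between the minimum and maximum its own row, so values with a count of 0 still appear, and it totals only the frequency and relative frequency columns.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, freq, rel_freq_pct, cum_freq, cum_rel_freq_pct)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cum = 0
    # Walk every integer from min to max so values with a count of 0 still get a row.
    for v in range(min(values), max(values) + 1):
        f = counts.get(v, 0)
        cum += f
        rows.append((v, f, 100 * f / n, cum, 100 * cum / n))
    return rows

# Reading levels from exercise 2.1 (n = 46).
reading_levels = [8, 10, 11, 11, 12, 13, 13, 13, 13, 14, 14] + [15] * 7 + [16] * 4 + [17] * 24

table = frequency_table(reading_levels)
print("Value Freq   Rel%  Cum  CumRel%")
for v, f, rel, cum, cum_rel in table:
    print(f"{v:>5} {f:>4} {rel:>6.1f} {cum:>4} {cum_rel:>8.1f}")
# Only frequency and relative frequency get totals; cumulative columns do not.
print(f"Total {sum(r[1] for r in table):>4} {sum(r[2] for r in table):>6.1f}")
```

The cumulative relative frequency in the row for a given value is the percentage of observations at or below that value, which is the quantity part (D) asks for.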

2.2 Poverty in eastern states. Poverty is associated with many health determinants and outcomes. The table below shows the percentage of people living below the poverty line in the 26 states east of the Mississippi for the year 2000 (Delaker, 2001).  Make a stemplot of these data and then describe the distribution narratively.

Data for exercise 2.2. Percentage of people living below poverty level in 26 states east of the Mississippi, 2000. Data in SPSS format may be downloaded from this link: pov-east-2000.sav

| State | % | State | % | State | % | State | % | State | % |
|---|---|---|---|---|---|---|---|---|---|
| Alabama | 14.6 | Indiana | 08.2 | Mississippi | 15.5 | Pennsylvania | 09.9 | West Virginia | 15.8 |
| Connecticut | 07.6 | Kentucky | 12.5 | New Hampshire | 07.4 | Rhode Is. | 10.0 | Wisconsin | 08.8 |
| Delaware | 09.8 | Maine | 09.8 | New Jersey | 08.1 | S. Carolina | 11.9 | | |
| Florida | 12.1 | Maryland | 07.3 | New York | 14.7 | Tennessee | 13.3 | | |
| Georgia | 12.6 | Massachusetts | 10.2 | N. Carolina | 13.2 | Vermont | 10.1 | | |
| Illinois | 10.5 | Michigan | 10.2 | Ohio | 11.1 | Virginia | 08.1 | | |

2.3 Hospital duration data. A study by Townsend and co-workers (1979) looked at the duration of hospital stays. A sample of 25 patients from this study showed the following hospital stay durations (days):

 5 10 6 11 5 14 30 11 17 3 9 3 8 8 5 5 7 4 3 7 9 11 11 9 4

(A) Create a stemplot for these data using a stem multiplier of 10.
(B) Create a stemplot with split stem-values.
(C) Which of the above stemplots does a better job demonstrating the shape of the distribution?
(D) Describe the distribution's shape, location, and spread.
(E) Construct a frequency table of the data using 5-day class intervals (0 – 4 days, 5 – 9 days, etc.).

(F) What percentage of hospital stays were less than or equal to 5 days?

(G) What percentage were less than or equal to 15 days?

(H) What percentage were at least 15 days in length? ("At least" means "that much or more.")
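Parts (A) and (B) differ only in how many rows each stem gets. The following is a minimal sketch in Python (an assumption; the course uses SPSS) of a stemplot with a stem multiplier of 10 and optional split stem-values. The function name `stemplot` and its `splits` parameter are hypothetical, and the sketch assumes non-negative values whose leaves are the ones digit.

```python
from collections import defaultdict

def stemplot(values, splits=1):
    """Return the lines of a stemplot with a x10 stem multiplier.

    splits=1 gives single stems, splits=2 gives double-split stems
    (leaves 0-4 and 5-9), splits=5 gives quintuple-split stems.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)   # tens digit(s) and ones digit
        part = leaf * splits // 10        # which sub-row of the stem the leaf belongs to
        rows[(stem, part)].append(leaf)
    hi = max(rows)
    stem, part = min(rows)
    lines = []
    # Emit every stem row between the lowest and highest, including empty ones,
    # so gaps in the distribution stay visible.
    while (stem, part) <= hi:
        leaves = "".join(str(d) for d in rows.get((stem, part), []))
        lines.append(f"{stem:>2}|{leaves}")
        part += 1
        if part == splits:
            stem, part = stem + 1, 0
    lines.append("  (x10)")
    return lines

# Hospital stay durations (days) from exercise 2.3.
stays = [5, 10, 6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8, 8, 5, 5, 7, 4, 3, 7, 9, 11, 11, 9, 4]

print("\n".join(stemplot(stays)))
print()
print("\n".join(stemplot(stays, splits=2)))
```

Because leaves are fixed-width single digits and are sorted within each row, the printed plot keeps the vertical and horizontal alignment that lets row length stand in for frequency.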

2.4 Outpatient wait time. The length of time patients wait for attention in doctors' offices is an important consideration in health care. Waiting times (in minutes) for 25 patients at a public health clinic are:

 35 22 63 6 49 19 16 31 24 29 23 32 72 13 51 45 77 16 33 55 10 42 28 72 13

(A) Draw a stemplot of the data. Describe the distribution's shape, location, and spread.
(B) From your stemplot, create a frequency table showing frequency counts, relative frequencies, and cumulative percents.
(C) What percentage of wait times were less than 20 minutes?
(D) What percentage were at least 20 minutes?

2.5 Body weight expressed as a percentage of ideal (%IDEAL). This data set contains a variable equal to (actual body weight) ÷ (ideal body weight) × 100. A value of 100 represents 100% of ideal body weight, 120 represents 20% above ideal body weight, and so on. Data for n = 18 subjects are shown below. (Source: Saudek et al., 1989; Pagano & Gauvreau, 1993, p. 208; data stored online in %ideal.*).

 107 119 99 114 120 104 88 114 124 116 101 121 152 100 125 114 95 117

(A) Construct a stem-and-leaf plot of these data. Describe what you see.
(B) Construct a frequency table for these data using 20-unit class intervals. Report frequencies, relative frequencies, and cumulative relative frequencies.
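Grouping into class intervals follows the same logic as an ungrouped frequency table, except that each row counts the values falling inside an interval. The following is a minimal sketch in Python (an assumption; the course uses SPSS). The function name `grouped_frequency_table` is hypothetical, and starting the first interval at 80 is an illustrative choice, not something the exercise specifies.

```python
def grouped_frequency_table(values, width, start):
    """Return rows of (interval_label, freq, rel_freq_pct, cum_rel_freq_pct).

    Assumes integer data, so intervals are labelled like 80-99, 100-119.
    """
    n = len(values)
    rows = []
    cum = 0
    lower = start
    while lower <= max(values):
        upper = lower + width - 1
        f = sum(lower <= v <= upper for v in values)   # count values inside the interval
        cum += f
        rows.append((f"{lower}-{upper}", f, 100 * f / n, 100 * cum / n))
        lower += width
    return rows

# %IDEAL values from exercise 2.5 (n = 18).
pct_ideal = [107, 119, 99, 114, 120, 104, 88, 114, 124, 116,
             101, 121, 152, 100, 125, 114, 95, 117]

rows = grouped_frequency_table(pct_ideal, width=20, start=80)
for label, f, rel, cum_rel in rows:
    print(f"{label:>8} {f:>3} {rel:>6.1f}% {cum_rel:>6.1f}%")
print(f"{'Total':>8} {sum(r[1] for r in rows):>3}")
```

The intervals must not overlap and must cover every observation; the frequency total equalling n is a quick check that both conditions hold.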

2.6 Seizures following bacterial meningitis (Source unknown). The time between exposure to a causative agent and first symptoms is called the induction or incubation period. A study examined induction periods in 13 seizure cases following bacterial meningitis. Data (months) are listed below. Using a stem-value multiplier of ×10, construct a stemplot for these data. Discuss your findings. [Hint: The value of 0.10 has a tens-place of 0 and ones-place of 0, so it shows up as 0|0 on the plot with the ×10 stem-multiplier.]

0.10    0.25    0.50    4    12    12    24    24    31    36    42    55    96
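The hint can be made concrete with a few lines of Python (an assumption; the course uses SPSS). This sketch uses a hypothetical helper, `stem_and_leaf`, and assumes digits past the ones place are truncated rather than rounded, which is one common stemplot convention and is consistent with 0.10 plotting as 0|0.

```python
def stem_and_leaf(value):
    """Split a non-negative value into (stem, leaf) for a x10 stem multiplier."""
    whole = int(value)          # truncate (assumed convention), so 0.50 becomes 0
    return divmod(whole, 10)    # stem = tens digit(s), leaf = ones digit

# Induction periods (months) from exercise 2.6.
months = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]
for m in months:
    stem, leaf = stem_and_leaf(m)
    print(f"{m:>5} -> {stem}|{leaf}")
```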

2.7 Children of physicians (DOCKIDS). The numbers of offspring of 24 physicians at a particular hospital are:

3    2    0    1    4    7    3    2    4    1    0    2    5    6    2    1    2    1    0    0   3    6    2    1

(A) Plot these data as a stemplot. Describe what you see (shape, location, and spread).
(B) Construct a frequency table for these data. What percentage of physicians at this hospital have 3 or fewer children? What percentage have at least 2 children?

2.8 Surgical times for artificial hearts (ART-HEART). Durations of surgeries (in hours) for 15 patients receiving artificial hearts are shown below (Kitchens, 1998, p. 139; data are stored in ../datasets/art-heart.sav.). Create a stem-and-leaf plot of these data. Are there any outliers?

Data (n = 15):

7.0    6.5    3.5    3.8    3.1    2.8     2.5    2.6    2.4    2.1     1.8    2.3    3.1    3.0    2.5

2.9 Grad student ages. Ages of 36 graduate students are listed below and are stored in ../datasets/grad-students.sav. Explore these data with a stemplot and then describe the distribution. Provide two different explanations for the low outlier.

Data for GRAD-STUDENTS exercise (n = 36):

29    25    29    27    27    29    32    30    28    30    28    26    25    25    27    29    25    24    28    28
24    28    25    28    29    26    27    30    28    16    29    30    33    29    32    31

2.10 Income, Poverty, and Health Insurance. The U.S. Census Bureau reported on income, poverty, and health insurance coverage in the United States for the period 2002 to 2004 (DeNavas-Walt et al., 2005). Data for the average percentage of people without health insurance coverage by state for the 3-year period 2002 through 2004 are shown below and are stored in ../datasets/inc-pov-hlthins.sav as the variable NOINS ("no insurance").

(A) Create a stemplot of these data using an axis multiplier of ×10.
(B) Redo the stemplot with double-split stem-values.
(C) Redo the stemplot with quintuple-split stem-values.
(D) Which plot is the most revealing? Interpret the most informative results.

| State | % w/out insurance | State | % w/out insurance | State | % w/out insurance | State | % w/out insurance | State | % w/out insurance |
|---|---|---|---|---|---|---|---|---|---|
| Alabama | 13.5 | Hawaii | 09.9 | Michigan | 11.4 | North Carolina | 16.6 | Utah | 13.4 |
| Alaska | 18.2 | Idaho | 17.3 | Minnesota | 08.5 | North Dakota | 11.0 | Vermont | 10.5 |
| Arizona | 17.0 | Illinois | 14.2 | Mississippi | 17.2 | Ohio | 11.8 | Virginia | 13.6 |
| Arkansas | 16.7 | Indiana | 13.7 | Missouri | 11.7 | Oklahoma | 19.2 | Washington | 14.2 |
| California | 18.4 | Iowa | 10.1 | Montana | 17.9 | Oregon | 16.1 | West Virginia | 15.9 |
| Colorado | 16.8 | Kansas | 10.8 | Nebraska | 11.0 | Pennsylvania | 11.5 | Wisconsin | 10.4 |
| Connecticut | 10.9 | Kentucky | 13.9 | Nevada | 19.1 | Rhode Island | 10.5 | Wyoming | 15.9 |
| Delaware | 11.8 | Louisiana | 18.8 | New Hampshire | 10.6 | South Carolina | 13.8 | | |
| Dist. of Columbia | 13.5 | Maine | 10.6 | New Jersey | 14.4 | South Dakota | 11.9 | | |
| Florida | 18.5 | Maryland | 14.0 | New Mexico | 21.4 | Tennessee | 12.7 | | |
| Georgia | 16.6 | Massachusetts | 10.8 | New York | 15.0 | Texas | 25.1 | | |

2.11 UNICEF low birth weight data (UNICEF). A weight at birth of less than 2,500 grams (about 5.5 pounds) qualifies as "low birth weight" according to standard conventions. Low birth-weight rates (per 100 births) for the year 1991 for 109 countries are stored in unicef.sav (UNICEF & Grant, J. P, 1991; Pagano and Gauvreau, 1993, p. 55). There are 129 records in the data file, but 20 values are missing; there are n = 109 valid data points.

(A) Download  unicef.sav and create a stemplot of the low birth weight rates.
(B) Determine the low birth weight rate for the US. (Sort the data by country and find the value for the USA. The menu commands are Data > Sort cases > Sort by.)
(C) Create a frequency table for low birth weights.  Where does the USA stand in this table?

2.12 Growth in the U.S. Hispanic population (PER-HISP). The 2000 census documented rapid growth of the Hispanic population in the United States. The percent of residents in the 50 United States who identified themselves in the 2000 census as Spanish, Hispanic, or Latino is shown below and is stored in per-hisp.sav  (2000 US Census).

(A) Create a stemplot of these data using single stem-values. [Use an axis multiplier of ×10.]
(B) Create a stemplot using double-split stem-values.
(C) Which of the plots do you prefer?

| State | Percent | State | Percent | State | Percent |
|---|---|---|---|---|---|
| Alabama | 1.5 | Louisiana | 2.4 | Ohio | 1.9 |
| Alaska | 4.1 | Maine | 0.7 | Oklahoma | 5.2 |
| Arizona | 25.3 | Maryland | 4.3 | Oregon | 8.0 |
| Arkansas | 2.8 | Massachusetts | 6.8 | Pennsylvania | 3.2 |
| California | 32.4 | Michigan | 3.3 | Rhode Island | 8.7 |
| Colorado | 17.1 | Minnesota | 2.9 | South Carolina | 2.4 |
| Connecticut | 9.4 | Mississippi | 1.3 | South Dakota | 1.4 |
| Delaware | 4.8 | Missouri | 2.1 | Tennessee | 2.0 |
| Florida | 16.8 | Montana | 2.0 | Texas | 32.0 |
| Georgia | 5.3 | Nebraska | 5.5 | Utah | 9.0 |
| Hawaii | 7.2 | Nevada | 19.7 | Vermont | 0.9 |
| Idaho | 7.9 | New Hampshire | 1.7 | Virginia | 4.7 |
| Illinois | 10.7 | New Jersey | 13.3 | Washington | 7.2 |
| Indiana | 3.5 | New Mexico | 42.1 | West Virginia | 0.7 |
| Iowa | 2.8 | New York | 15.1 | Wisconsin | 3.6 |
| Kansas | 7.0 | North Carolina | 4.7 | Wyoming | 6.4 |
| Kentucky | 1.5 | North Dakota | 1.2 | | |

2.13 East Boston Respiratory Disease Survey (FEV). Download fev.sav and open the file in SPSS. Create a stemplot of the AGE variable (Analyze > Descriptive Statistics > Explore). Go to the output window and navigate to the stem-and-leaf plot. Notice that each leaf in the plot represents 2 cases. Also notice frequencies are reported to the left of the plot. Describe the shape, location, and spread of the age distribution.

2.14 Student weights (BODY-WEIGHT). The data set body-weight.sav contains weights of 53 students (in pounds). Download the dataset and plot the results as a stemplot. Describe the distribution. (Data from presentation slides prepared by J. Mays for Moore's text, 2004, Chapter 2).
