2: Frequency distributions 2/9/16

Review questions

How many leaves do you plot on a stemplot?
How many stem values should you use when drawing a stemplot?
Why would you use split stem-values on a stemplot?
What is the depth of an observation?
What is the purpose of the stem-multiplier (e.g., ×100)?
Distributions with long tails toward the higher numbers are said to have a ______________ skew.
The three elements of a distribution are shape, location, and ___________.
This is the middle point of a distribution.
The median has a depth of ___________.
This is a value that does not fit in with the general pattern of a distribution.
Define the following terms: frequency, relative frequency, cumulative relative frequency.
What's a distribution?
What's the full name for "stemplot"?
What do you look for on a stemplot?
Why do the numbers of a stemplot have to align vertically and horizontally?
What do I do if each value has more significant digits than can fit on a Tukey stemplot?

Exercises

2.1 Irish healthcare websites. The Irish Department of Health recommends a reading level at or below 12- to 14-years of age for health information aimed at the public. Reading levels for 46 Irish healthcare websites are shown below (O'Mahoney 1999).

(A) Create a stemplot for these data. (Hints: Use a stem-multiplier of ×1. The tenths place for all the data points is .0.)

(B) Narratively describe the distribution's shape, location, and spread. (Hint: Skews are described by the direction of their tail.)

(C) Create a frequency table for the data. Include columns for frequency, relative frequency, cumulative frequency, and cumulative relative frequency. (Hints: Include table cells with 0 counts. Frequency tables should always include a total row at the bottom. The frequencies should sum to n, and the relative frequencies should sum to 100%. There is no total for cumulative frequency or cumulative relative frequency.)

(D) What percentage of the websites met the recommended level by having a reading level of 14 or below? (Hint: Use the cumulative relative frequency information in your frequency table to answer this question.)

Data for exercise 2.1. Reading levels of 46 Irish healthcare websites. Data are available in SPSS format via this link: http://www.sjsu.edu/faculty/gerstman/datasets/irishweb.sav
08	10	11	11	12	13	13	13	13	14
14	15	15	15	15	15	15	15	16	16
16	16	17	17	17	17	17	17	17	17
17	17	17	17	17	17	17	17	17	17
17	17	17	17	17	17
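A hand-built frequency table for exercise 2.1 can be checked with a short script. The Python sketch below is one possible way to do this and is not part of the original exercise: the function name frequency_table is hypothetical, and the data are typed in from the listing above rather than read from irishweb.sav. It includes zero-count rows and accumulates the running totals.

```python
from collections import Counter

# Reading levels typed in from the data listing for exercise 2.1 (n = 46).
levels = [8, 10, 11, 11, 12, 13, 13, 13, 13, 14,
          14, 15, 15, 15, 15, 15, 15, 15, 16, 16,
          16, 16] + [17] * 24


def frequency_table(values):
    """Return rows of (value, freq, rel. freq %, cum. freq, cum. rel. freq %)."""
    counts = Counter(values)
    n = len(values)
    rows, cum = [], 0
    # Walk every integer from min to max so that zero-count cells appear.
    for v in range(min(values), max(values) + 1):
        f = counts.get(v, 0)
        cum += f
        rows.append((v, f, 100 * f / n, cum, 100 * cum / n))
    return rows


table = frequency_table(levels)
print("Level Freq   Rel%  Cum   Cum%")
for v, f, rel, cf, crf in table:
    print(f"{v:>5} {f:>4} {rel:6.1f} {cf:>4} {crf:6.1f}")
# Only the frequency and relative frequency columns get a total.
print(f"Total {len(levels):>4} {100.0:6.1f}")
```

The cumulative relative frequency column is the one to read when answering part (D).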

2.2 Poverty in eastern states. Poverty is associated with many health determinants and outcomes. The table below shows the percentage of people living below the poverty line in the 26 states east of the Mississippi for the year 2000 (Delaker, 2001). Make a stemplot of these data and then describe the distribution narratively.

Data for exercise 2.2. Percentage of people living below poverty level in 26 states east of the Mississippi, 2000. Data in SPSS format may be downloaded from this link: pov-east-2000.sav
Alabama	14.6	Indiana	08.2	Mississippi	15.5	Pennsylvania	09.9	West Virginia	15.8
Connecticut	07.6	Kentucky	12.5	New Hampshire	07.4	Rhode Is.	10.0	Wisconsin	08.8
Delaware	09.8	Maine	09.8	New Jersey	08.1	S. Carolina	11.9
Florida	12.1	Maryland	07.3	New York	14.7	Tennessee	13.3
Georgia	12.6	Massachusetts	10.2	N. Carolina	13.2	Vermont	10.1
Illinois	10.5	Michigan	10.2	Ohio	11.1	Virginia	08.1

2.3 Hospital duration data. A study by Townsend and co-workers (1979) looked at the duration of hospital stays (in days). A sample of 25 patients from this study showed the following hospital stay durations (days):

5	10	6	11	5	14	30	11	17	3
9	3	8	8	5	5	7	4	3	7
9	11	11	9	4

(A) Create a stemplot for these data using a stem multiplier of ×10.
(B) Create a stemplot with split stem-values.
(C) Which of the above stemplots does a better job demonstrating the shape of the distribution?
(D) Describe the distribution's shape, location, and spread.
(E) Construct a frequency table of the data using 5-day class intervals (0 – 4 days, 5 – 9 days, etc.).

(F) What percentage of hospital stays were less than or equal to 5 days?

(G) What percentage were less than or equal to 15 days?

(H) What percentage were at least 15 days in length? (At least means "that much or more.")
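To compare a hand-drawn stemplot against a computed one, a small script can help. The following Python sketch is an illustration only: the function name stemplot and its split parameter are hypothetical, it assumes non-negative whole numbers and a ×10 stem multiplier, and split must be 1, 2, or 5 so that each line holds an equal share of the ten leaf digits.

```python
from collections import defaultdict

# Hospital stay durations (days) from exercise 2.3 (n = 25).
stays = [5, 10, 6, 11, 5, 14, 30, 11, 17, 3,
         9, 3, 8, 8, 5, 5, 7, 4, 3, 7,
         9, 11, 11, 9, 4]


def stemplot(values, split=1):
    """Return stemplot lines for whole numbers using a x10 stem multiplier.

    split=1 gives one line per stem; split=2 gives two lines per stem
    (leaves 0-4 on the first line, 5-9 on the second); split=5 gives five.
    """
    width = 10 // split                      # leaf digits covered by each line
    bins = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)           # tens digit(s), ones digit
        bins[(stem, leaf // width)].append(str(leaf))
    lo, hi = min(bins), max(bins)
    lines = []
    # Emit every line between the lowest and highest, including empty ones,
    # so gaps in the distribution stay visible.
    for stem in range(lo[0], hi[0] + 1):
        for part in range(split):
            if lo <= (stem, part) <= hi:
                leaves = "".join(bins.get((stem, part), []))
                lines.append(f"{stem}|{leaves}")
    return lines


print("\n".join(stemplot(stays)))            # single stem-values
print()
print("\n".join(stemplot(stays, split=2)))   # split stem-values
```

Printing both versions one above the other makes it easier to judge which plot shows the shape more clearly.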

2.4 Outpatient wait time. The length of time patients wait for attention in doctors' offices is an important consideration in health care. Waiting times (in minutes) for 25 patients at a public health clinic are:

35	22	63	6	49	19	16	31	24	29
23	32	72	13	51	45	77	16	33	55
10	42	28	72	13

(A) Draw a stemplot of the data. Describe the distribution's shape, location, and spread.
(B) From your stemplot, create a frequency table showing frequency counts, relative frequencies, and cumulative percents.
(C) What percentage of wait times were less than 20 minutes?
(D) What percentage were at least 20 minutes?
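Parts (C) and (D) are complements: every wait time is either less than 20 minutes or at least 20 minutes, so the two percentages must sum to 100%. The short Python sketch below checks this with the exercise data; the helper name percent is hypothetical.

```python
# Waiting times (minutes) for the 25 patients in exercise 2.4.
waits = [35, 22, 63, 6, 49, 19, 16, 31, 24, 29, 23, 32, 72,
         13, 51, 45, 77, 16, 33, 55, 10, 42, 28, 72, 13]


def percent(values, condition):
    """Percentage of values for which condition(value) is true."""
    return 100 * sum(1 for v in values if condition(v)) / len(values)


below_20 = percent(waits, lambda m: m < 20)      # "less than 20"
at_least_20 = percent(waits, lambda m: m >= 20)  # "20 or more"

print(f"Less than 20 minutes: {below_20:.0f}%")
print(f"At least 20 minutes:  {at_least_20:.0f}%")
print(f"Sum: {below_20 + at_least_20:.0f}%")
```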

2.5 Body weight expressed as a percentage of ideal (%IDEAL). The variable in this data set is equal to (actual body weight) ÷ (ideal body weight) × 100. A value of 100 represents 100% of ideal body weight, 120 represents 20% above ideal body weight, and so on. Data for n = 18 subjects are shown below. (Source: Saudek et al., 1989; Pagano & Gauvreau, 1993, p. 208; data stored online in %ideal.*).

107	119	99	114	120	104	88	114	124	116
101	121	152	100	125	114	95	117

(A) Construct a stem-and-leaf plot of these data. Describe what you see.
(B) Construct a frequency table for these data using 20-unit class intervals. Report frequencies, relative frequencies, and cumulative relative frequencies.
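A frequency table with class intervals groups values by range instead of listing each value. The Python sketch below shows one way to do the counting. It is an illustration under stated assumptions: the function name grouped_frequency is hypothetical, the values are whole numbers, and the first interval is assumed to start at 80 (the exercise does not say where the intervals begin).

```python
# %IDEAL values for the 18 subjects in exercise 2.5.
ideal = [107, 119, 99, 114, 120, 104, 88, 114, 124, 116,
         101, 121, 152, 100, 125, 114, 95, 117]


def grouped_frequency(values, width, start):
    """Count whole-number values in intervals of the given width.

    Intervals are start..start+width-1, start+width..start+2*width-1, etc.
    `start` must be less than or equal to the smallest value.
    Returns rows of (label, freq, rel. freq %, cum. rel. freq %).
    """
    n = len(values)
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1
    rows, cum = [], 0
    for i, f in enumerate(counts):
        lo = start + i * width
        cum += f
        rows.append((f"{lo}-{lo + width - 1}", f, 100 * f / n, 100 * cum / n))
    return rows


rows = grouped_frequency(ideal, width=20, start=80)  # start=80 is an assumption
for label, f, rel, crf in rows:
    print(f"{label:>8} {f:>3} {rel:6.1f}% {crf:6.1f}%")
print(f"{'Total':>8} {len(ideal):>3}")
```

The same function applies to other class-interval tables by changing width and start, for example 5-day intervals starting at 0.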

2.6 Seizures following bacterial meningitis (Source unknown). The time between exposure to a causative agent and first symptoms is called the induction or incubation period. A study examined induction periods in 13 seizure cases following bacterial meningitis. Data (months) are listed below. Using a stem-value multiplier of ×10, construct a stemplot for these data. Discuss your findings. [Hint: The value of 0.10 has a tens-place of 0 and ones-place of 0, so it shows up as 0|0 on the plot with the ×10 stem-multiplier.]

0.10 0.25 0.50 4 12 12 24 24 31 36 42 55 96
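The hint can be made concrete by splitting each value into its stem (tens place) and leaf (ones place). The Python sketch below does this for the ×10 stem-multiplier. The function name stem_and_leaf_x10 is hypothetical, and dropping the digits after the ones place by truncation (rather than rounding) is an assumption about the convention being used.

```python
# Induction periods (months) from exercise 2.6 (n = 13).
months = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]


def stem_and_leaf_x10(value):
    """Split a non-negative value into (stem, leaf) for a x10 stem multiplier.

    The stem is the tens place and the leaf is the ones place. Digits to
    the right of the ones place are truncated (an assumed convention).
    """
    whole = int(value)          # truncate toward zero: 0.25 -> 0
    return divmod(whole, 10)    # (tens, ones)


for m in months:
    stem, leaf = stem_and_leaf_x10(m)
    print(f"{m:>5} -> {stem}|{leaf}")
```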

2.7 Children of physicians (DOCKIDS). The numbers of offspring of 24 physicians at a particular hospital are:

3 2 0 1 4 7 3 2 4 1 0 2 5 6 2 1 2 1 0 0 3 6 2 1

(A) Plot these data as a stemplot. Describe what you see (shape, location, and spread).
(B) Construct a frequency table for these data. What percentage of physicians at this hospital have 3 or fewer children? What percentage have at least 2 children?

2.8 Surgical times for artificial hearts (ART-HEART). Durations of surgeries (in hours) for 15 patients receiving artificial hearts are shown below (Kitchens, 1998, p. 139; data are stored in ../datasets/art-heart.sav). Create a stem-and-leaf plot of these data. Are there any outliers?

Data (n = 15):

7.0 6.5 3.5 3.8 3.1 2.8 2.5 2.6 2.4 2.1 1.8 2.3 3.1 3.0 2.5

2.9 Grad student ages. Ages of 36 graduate students are listed below and are stored in ../datasets/grad-students.sav. Explore these data with a stemplot and then describe the distribution. Provide two different explanations for the low outlier.

Data for GRAD-STUDENTS exercise (n = 36):

29 25 29 27 27 29 32 30 28 30 28 26 25 25 27 29 25 24 28 28
24 28 25 28 29 26 27 30 28 16 29 30 33 29 32 31

2.10 Income, Poverty, and Health Insurance. The U.S. Census Bureau reported on income, poverty, and health insurance coverage in the United States for the period 2002 to 2004 (DeNavas-Walt et al., 2005). Data for the average percentage of people without health insurance coverage by state for the 3-year period 2002 through 2004 are shown below and are stored in ../datasets/inc-pov-hlthins.sav as the variable NOINS ("no insurance").

(A) Create a stemplot of these data using an axis multiplier of ×10.
(B) Redo the stemplot with double-split stem-values.
(C) Redo the stemplot with quintuple-split stem-values.
(D) Which plot is the most revealing? Interpret the most informative results.

STATE	% w/out insurance	STATE	% w/out insurance	STATE	% w/out insurance	STATE	% w/out insurance	STATE	% w/out insurance
Alabama	13.5	Hawaii	09.9	Michigan	11.4	North Carolina	16.6	Utah	13.4
Alaska	18.2	Idaho	17.3	Minnesota	08.5	North Dakota	11.0	Vermont	10.5
Arizona	17.0	Illinois	14.2	Mississippi	17.2	Ohio	11.8	Virginia	13.6
Arkansas	16.7	Indiana	13.7	Missouri	11.7	Oklahoma	19.2	Washington	14.2
California	18.4	Iowa	10.1	Montana	17.9	Oregon	16.1	West Virginia	15.9
Colorado	16.8	Kansas	10.8	Nebraska	11.0	Pennsylvania	11.5	Wisconsin	10.4
Connecticut	10.9	Kentucky	13.9	Nevada	19.1	Rhode Island	10.5	Wyoming	15.9
Delaware	11.8	Louisiana	18.8	New Hampshire	10.6	South Carolina	13.8
Dist. of Columbia	13.5	Maine	10.6	New Jersey	14.4	South Dakota	11.9
Florida	18.5	Maryland	14.0	New Mexico	21.4	Tennessee	12.7
Georgia	16.6	Massachusetts	10.8	New York	15.0	Texas	25.1

2.11 UNICEF low birth weight data (UNICEF). A weight at birth of less than 2,500 grams (about 5.5 pounds) qualifies as "low birth weight" according to standard conventions. Low birth-weight rates (per 100 births) for the year 1991 for 109 countries are stored in unicef.sav (UNICEF & Grant, J. P, 1991; Pagano and Gauvreau, 1993, p. 55). There are 129 records in the data file, but 20 values are missing; there are n = 109 valid data points.

(A) Download unicef.sav and create a stemplot of the low birth weight rates.
(B) Determine the low birth weight rate for the US. (Sort the data by country and find the value for the USA. The menu commands are Data > Sort cases > Sort by.)
(C) Create a frequency table for low birth weights. Where does the USA stand in this table?

2.12 Growth in the U.S. Hispanic population (PER-HISP). The 2000 census documented rapid growth of the Hispanic population in the United States. The percent of residents in the 50 United States who identified themselves in the 2000 census as Spanish, Hispanic, or Latino is shown below and is stored in per-hisp.sav (2000 US Census).

(A) Create a stemplot of these data using single stem-values. [Use an axis multiplier of ×10.]
(B) Create a stemplot using double-split stem-values.
(C) Which of the plots do you prefer?

STATE	PERCENT	STATE	PERCENT	STATE	PERCENT
Alabama	1.5	Louisiana	2.4	Ohio	1.9
Alaska	4.1	Maine	0.7	Oklahoma	5.2
Arizona	25.3	Maryland	4.3	Oregon	8.0
Arkansas	2.8	Massachusetts	6.8	Pennsylvania	3.2
California	32.4	Michigan	3.3	Rhode Island	8.7
Colorado	17.1	Minnesota	2.9	South Carolina	2.4
Connecticut	9.4	Mississippi	1.3	South Dakota	1.4
Delaware	4.8	Missouri	2.1	Tennessee	2.0
Florida	16.8	Montana	2.0	Texas	32.0
Georgia	5.3	Nebraska	5.5	Utah	9.0
Hawaii	7.2	Nevada	19.7	Vermont	0.9
Idaho	7.9	New Hampshire	1.7	Virginia	4.7
Illinois	10.7	New Jersey	13.3	Washington	7.2
Indiana	3.5	New Mexico	42.1	West Virginia	0.7
Iowa	2.8	New York	15.1	Wisconsin	3.6
Kansas	7.0	North Carolina	4.7	Wyoming	6.4
Kentucky	1.5	North Dakota	1.2

2.13 East Boston Respiratory Disease Survey (FEV). Download fev.sav and open the file in SPSS. Create a stemplot of the AGE variable (Analyze > Descriptive Statistics > Explore). Go to the output window and navigate to the stem-and-leaf plot. Notice that each leaf in the plot represents 2 cases. Also notice frequencies are reported to the left of the plot. Describe the shape, location, and spread of the age distribution.

2.14 Student weights (BODY-WEIGHT). The data set body-weight.sav contains weights of 53 students (in pounds). Download the dataset and plot the results as a stemplot. Describe the distribution. (Data from presentation slides prepared by J. Mays for Moore's text, 2004, Chapter 2).
