Continuous Outcome, Single Group


Background

This chapter teaches you to analyze a continuous outcome from a single group. The term continuous outcome as used here denotes any quantitative measure, including integer, ratio, and ordinal measurements. 

No active control group is present. Thus, any comparisons must be made in relation to an external "norm" or to historical data.

Illustrative data: Data in the file ONEGRP.ZIP represent body weights of 18 diabetics expressed as a percentage of ideal. Thus, a value of 100 represents ideal body weight, a value of 120 represents 120% of ideal body weight (i.e., 20% overweight), and so on (Pagano & Gauvreau, 1993, p. 208). Data are {107, 119, 99, 114, 120, 104, 88, 114, 124, 116, 101, 121, 152, 100, 125, 114, 95, 117}. A stem-and-leaf plot of the distribution is:

 8|8
 9|59
10|0147
11|444679
12|0145
13|
14|
15|2
Key: 15|2 = 152% of ideal body weight (stem = tens, leaf = units)

The plot reveals that all but one value lie between 88 and 125 (distributional spread). The center of the distribution is around 110 (central location). The distribution has one high outside value (152). Apart from this outside value, the data appear to have a negative skew (a tail toward lower values). As the famous New York Yankees catcher Yogi Berra is reputed to have said, "You can observe a lot by watching."

Descriptive Statistics

Each Epi Info session begins by READing (opening) the data set:

EPI6> READ ONEGRP

A one-variable MEANS command is issued to describe the data:

EPI6> MEANS PERIDEAL

The following summary statistics are provided:

      Total        Sum       Mean   Variance    Std Dev    Std Err
         18       2030    112.778    208.065     14.424      3.400

    Minimum     25%ile     Median     75%ile    Maximum       Mode
     88.000    101.000    114.000    120.000    152.000    114.000

Comments:
(1) Always report the distribution's mean and standard deviation. The sample size (reported under Total) should also be reported.
(2) Although Epi Info reports summary statistics to three decimal places, fewer decimals should be reported to avoid giving a false impression of precision. A rule of thumb is to report summary statistics to one more decimal place than the original measurements. For example, since the variable is measured to the nearest whole unit, we would report summary statistics to one decimal place, e.g., mean = 112.8, standard deviation = 14.4 (n = 18).
(3) It is often useful to report a five-point summary of the distribution comprising the distribution's minimum, 25th percentile, median, 75th percentile, and maximum (e.g., 88, 101, 114, 120, 152).
(4) The mode is seldom of interest with small data sets.
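For readers working outside Epi Info, the MEANS summary can be approximated with Python's standard `statistics` module. This is a minimal sketch: the variance and standard deviation use the n - 1 denominator, which matches the values above, and the quartiles use a nearest-rank rule (the value at rank ceil(p × n) in the sorted data). That rule happens to reproduce Epi Info's 25th and 75th percentiles for these data, but we have not confirmed that it is the rule Epi Info uses in general; other programs use interpolating definitions and may report slightly different quartiles. The helper names `nearest_rank` and `describe` are our own.

```python
import math
import statistics

# Percent of ideal body weight for 18 diabetics (values listed in the text).
PERIDEAL = [107, 119, 99, 114, 120, 104, 88, 114, 124,
            116, 101, 121, 152, 100, 125, 114, 95, 117]


def nearest_rank(sorted_x, p):
    """Value at 1-based rank ceil(p * n): one of several percentile rules."""
    k = max(1, math.ceil(p * len(sorted_x)))
    return sorted_x[k - 1]


def describe(x):
    """Summary statistics comparable to the Epi Info MEANS output."""
    n = len(x)
    s = sorted(x)
    sd = statistics.stdev(x)  # sample SD (n - 1 denominator)
    return {
        "n": n,
        "sum": sum(x),
        "mean": statistics.mean(x),
        "variance": statistics.variance(x),
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
        # five-point summary: min, 25th pct, median, 75th pct, max
        "five_point": (s[0], nearest_rank(s, 0.25), statistics.median(x),
                       nearest_rank(s, 0.75), s[-1]),
        "mode": statistics.mode(x),
    }


stats = describe(PERIDEAL)
print(f"n = {stats['n']}, mean = {stats['mean']:.1f}, SD = {stats['sd']:.1f}")
print("five-point summary:", stats["five_point"])
```

Printing with one decimal place follows the rule of thumb in comment (2).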

Confidence Interval for $\mu$

The sample mean is the point estimator of the expected value $\mu$. A $(1-\alpha)100\%$ confidence interval for $\mu$ is calculated with the formula:

$$\text{MEAN} \pm (t_{n-1,\,1-\alpha/2})(\text{Std Err})$$

where $t_{n-1,\,1-\alpha/2}$ is the $(1-\alpha/2)100$th percentile of a t distribution with $n-1$ degrees of freedom. For the illustrative data, the 95% confidence interval for $\mu$ is $112.778 \pm (t_{17,\,0.975})(14.424/\sqrt{18}) = 112.778 \pm (2.11)(3.400) = 112.778 \pm 7.174 = (105.6, 120.0)$.
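The confidence interval can be reproduced without a printed t table. The sketch below assumes only Python's standard library: it integrates the t density numerically with Simpson's rule to obtain the cumulative distribution, inverts that by bisection to find $t_{n-1,\,1-\alpha/2}$, and then applies MEAN ± t × Std Err. The helper names (`t_pdf`, `t_cdf`, `t_quantile`, `mean_ci`) are our own and not part of any library; in practice a statistics package's t quantile function would replace the numerical helpers.

```python
import math
import statistics

# Percent of ideal body weight for 18 diabetics (values listed in the text).
PERIDEAL = [107, 119, 99, 114, 120, 104, 88, 114, 124,
            116, 101, 121, 152, 100, 125, 114, 95, 117]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df, lo=-50.0, hi=50.0, iters=60):
    """Inverse of t_cdf by bisection (t_cdf is increasing in x)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(x, level=0.95):
    """MEAN +/- t(n-1, 1-alpha/2) * Std Err; returns (lower, upper, t)."""
    n = len(x)
    mean = statistics.mean(x)
    std_err = statistics.stdev(x) / math.sqrt(n)
    t_crit = t_quantile(1 - (1 - level) / 2, n - 1)
    return mean - t_crit * std_err, mean + t_crit * std_err, t_crit


lower, upper, t_crit = mean_ci(PERIDEAL)
print(f"t = {t_crit:.3f}; 95% CI = ({lower:.1f}, {upper:.1f})")
# t = 2.110; 95% CI = (105.6, 120.0)
```

Using the unrounded multiplier (about 2.1098) instead of 2.11 does not change the interval at one decimal place.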

p Value (One-Sample t Test)

A one-sample t statistic is used to test $H_0\colon \mu = \mu_0$, where $\mu_0$ represents the expected value under the null hypothesis. For our illustrative example, let us ask whether $\mu$ differs from 100, since 100 represents 100% of ideal body weight. Therefore, $H_0\colon \mu = 100$ versus the two-sided alternative $H_1\colon \mu \ne 100$.

The one-sample t statistic is:

$$t_{\text{stat}} = \frac{\text{MEAN} - \mu_0}{\text{Std Err}}$$

Under the null hypothesis this statistic has a t distribution with $n-1$ degrees of freedom. For the illustrative data, $t_{\text{stat}} = (112.778 - 100)/3.400 = 3.76$ with $df = 18 - 1 = 17$. The two-sided p value is the area under the $t_{17}$ curve in both tails beyond $\pm 3.76$.
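The t statistic and its two-sided p value can be computed directly as a check on the hand calculation. This sketch obtains the t cumulative distribution by integrating the t density with Simpson's rule (a numerical approach chosen here only to stay within Python's standard library; the helper names `t_pdf`, `t_cdf`, and `one_sample_t` are our own) and doubles the upper-tail area because the test is two-sided.

```python
import math
import statistics

# Percent of ideal body weight for 18 diabetics (values listed in the text).
PERIDEAL = [107, 119, 99, 114, 120, 104, 88, 114, 124,
            116, 101, 121, 152, 100, 125, 114, 95, 117]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t(x, mu0):
    """Return (t statistic, df, two-sided p value) for H0: mu = mu0."""
    n = len(x)
    std_err = statistics.stdev(x) / math.sqrt(n)
    t_stat = (statistics.mean(x) - mu0) / std_err
    df = n - 1
    # Two-sided: double the area in the upper tail beyond |t|.
    p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))
    return t_stat, df, p_two_sided


t_stat, df, p = one_sample_t(PERIDEAL, 100)
print(f"t = {t_stat:.3f}, df = {df}, two-sided p = {p:.4f}")
```

For these data the sketch gives t of about 3.758 with 17 degrees of freedom and a two-sided p value of about 0.0016. That is slightly smaller than the 0.00190 printed by Epi Info below; we have not determined how Epi Info 6 computes its p value, but both values lead to the same conclusion at $\alpha = 0.05$.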

To have Epi Info calculate a one-sample t statistic, issue the commands:

EPI6> DEFINE NULLVAL <###.#>
EPI6> LET NULLVAL = <num>
EPI6> DELTA = <varname> - NULLVAL
EPI6> MEANS DELTA

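The first two lines of this program define a variable NULLVAL and set it to the null value for the test. The third line computes the difference between each observed value and the null value. The last line calculates the t statistic and p value for testing whether the mean of DELTA differs from zero, which is equivalent to testing whether the mean of the original variable differs from the null value.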
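Why does testing DELTA against zero answer the question about PERIDEAL? Subtracting a constant shifts the mean by that constant but leaves the standard deviation, and therefore the standard error, unchanged, so (MEAN - $\mu_0$)/(Std Err) is the same either way. The short Python check below illustrates this; `t_statistic` is a helper name of our own, and the list names only mirror the Epi Info variables.

```python
import math
import statistics

# Percent of ideal body weight for 18 diabetics (values listed in the text).
PERIDEAL = [107, 119, 99, 114, 120, 104, 88, 114, 124,
            116, 101, 121, 152, 100, 125, 114, 95, 117]
NULLVAL = 100  # null value mu0 from the illustrative example


def t_statistic(x, mu0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n))."""
    return (statistics.mean(x) - mu0) / (statistics.stdev(x) / math.sqrt(len(x)))


# Epi Info's approach: DELTA = PERIDEAL - NULLVAL, then test mean(DELTA) = 0.
delta = [v - NULLVAL for v in PERIDEAL]

t_direct = t_statistic(PERIDEAL, NULLVAL)  # test PERIDEAL against 100
t_delta = t_statistic(delta, 0)            # test DELTA against 0
print(f"{t_direct:.3f} {t_delta:.3f}")
```

Both statistics print as 3.758, matching the Epi Info output below.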

For the illustrative example the following commands are issued:

EPI6> DEFINE NULLVAL ###
EPI6> LET NULLVAL = 100
EPI6> DELTA = PERIDEAL - NULLVAL
EPI6> MEANS DELTA

Relevant output is:

Student's "t", testing whether mean differs from zero.
T statistic = 3.758, df = 17 p-value = 0.00190

Sample Size and Precision

Let $d$ represent the margin of error (approximately half the width of the 95% confidence interval). To estimate $\mu$ with precision $d$, use a sample of size:

$$n = \frac{4s^2}{d^2}$$

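where $s$ represents the standard deviation of the variable. The formula comes from setting $d \approx 2s/\sqrt{n}$ (2 being a round-number approximation of the 95% multiplier) and solving for $n$. For example, to achieve $d = 5$ for a variable with standard deviation $s = 15$, $n = (4)(15^2)/5^2 = 900/25 = 36$.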
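The sample size formula is simple enough to wrap in a small helper. In this sketch (the function name `sample_size_for_margin` is our own), the result is rounded up to the next whole number because a study cannot enroll a fraction of a subject; the rounding is our addition, and the worked example above happens to come out to exactly 36.

```python
import math


def sample_size_for_margin(s, d):
    """Approximate n giving a 95% margin of error d: n = 4 s^2 / d^2.

    The factor 4 is 2 squared, with 2 standing in for the 95% multiplier.
    Rounded up because n must be a whole number.
    """
    return math.ceil(4 * s ** 2 / d ** 2)


print(sample_size_for_margin(s=15, d=5))  # 36, as in the worked example
print(sample_size_for_margin(s=15, d=3))  # 100: a smaller d needs a larger n
```

Because n grows with $1/d^2$, cutting the margin of error by a factor of 2 requires roughly four times as many subjects.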

Comment: One of the more difficult aspects of using this method is coming up with a reasonable estimate for $s$. Such estimates may come from a pilot study or from previous experience.

Exercises

(1) UNICEF.ZIP: Low Birth Weight Rates Worldwide (Pagano & Gauvreau, 1993, p. 55; United Nations Children's Fund, 1991). A weight at birth of less than 2,500 grams (about 5.5 pounds) is considered a low birth weight. The low birth weight rate in a country is an index of maternal and child health. The variable LOWBW in UNICEF.REC contains low birth weight rates per 100 births for the year 1991 from various countries.
(A) Sort the data by low birth weight rate by issuing the command SORT LOWBW. Then LIST the data to determine which country has the lowest low birth weight rate. Also determine the country with the highest low birth weight rate.
(B) What is the low birth weight rate in the United States? The easiest way to find this information is to sort data in alphabetic order by country (SORT COUNTRY) and then LIST the data to find the record for the United States. Where does the U.S. rank among other countries? (Issue a MEANS LOWBW command and look up the cumulative frequency of the U.S.'s rate. This will represent its approximate percentile rank.)
(C) Plot the data in the form of a histogram. (Comment: The data set is large enough to make grouping it into class intervals unnecessary.) In words, describe the distribution.
(D) Compute and report summary statistics for LOWBW.
(E) Assuming these data represent a random sample of low birth weight rates worldwide, calculate a 95% confidence interval for the expected low birth weight rate.

(2) SEIZURE.ZIP: Seizures Following Bacterial Meningitis (Pagano & Gauvreau, 1993, p. 54; Pomeroy et al., 1990). A study investigated the long-term prognosis of children following bacterial meningitis. The numbers of months between the onset of meningitis and subsequent seizures were: 0.1, 0.25, 0.5, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96.
(A) Create a data file with these data. Call the data set SEIZURE.REC and the variable MONTHS.
(B) Report the five-point summary for these data (MEANS MONTHS).
(C) Group data into class intervals of width 20. Then construct a frequency table based on these groupings.
(D) Construct a histogram based on the 20-unit class intervals.
(E) Previous studies suggest a mean time to seizure of 12 months. Using these data, test whether this mean has changed. In completing this analysis, list the null and alternative hypotheses, and report the t statistic, its degrees of freedom, and the p value. Let $\alpha = 0.05$. State your conclusion.

(3) SERZINC.ZIP: Zinc Levels in 15- to 17-year-old Males (Pagano & Gauvreau, 1993, pp. 32 and 55). The data set SERZINC.REC contains serum zinc values (mcg/dl) for 462 boys between the ages of 15 and 17. Download and unzip this data set and then:
(A) Compute its mean, standard deviation, and sample size.
(B) Group the data into class intervals of width 20 and compile a frequency table of the grouped data. Then create a HISTOGRAM of the grouped data.
(C) Calculate a 95% confidence interval for the population mean.
(D) Test whether the population mean differs significantly from 85 mcg/dl. Let $\alpha = 0.05$. (List all elements of the hypothesis test.)


References

Pagano, M. & Gauvreau, K. (1993). Principles of Biostatistics. Belmont, CA: Duxbury Press.

Pomeroy, S. L., Holmes, S. J., Dodge, P. R., & Feigin, R. D. (1990). Seizures and other neurologic sequelae of bacterial meningitis in children. New England Journal of Medicine, 323, 1651-1656.

Saudek, C. D., Selam, J. L., Pitt, H. A., Waxman, K., Rubio, M., Jeandidier, N., Turner, D., Fischell, R. E., & Charles, M. A. (1989). A preliminary trial of the programmable implantable medication system for insulin delivery. New England Journal of Medicine, 321, 574-579.

United Nations Children's Fund. (1991). The State of the World's Children, 1991. New York: Oxford University Press.