Last update: 5/17/00
Selection of a valid statistical technique depends on a clear understanding of the research question being asked, the way data are collected (sampling methods), and the way variables are measured.
To start, one must understand the nature of the outcome variable. In general, this variable is either continuous or categorical.
You will then want to look at the way groups were sampled. In a fundamental sense, data may be based on a single sample, paired samples, or independent samples.
This, then, provides our framework for selecting an approach. This chart summarizes the most common methods:
Outcome Variable | Sample Type | Predictor Variable | Graphs (examples) | Summary stats | Main Parameter | Most common test |
Continuous | Single | None | histogram, stem-and-leaf plot, boxplot | mean, sd, 5-point summary | mean | one-sample t test |
Continuous | Paired | None | same as above, directed toward paired differences | same as above, directed toward "DELTA" | mean difference (paired) | paired sample t test |
Continuous | Independent | Categorical | side-by-side boxplots or quartile plots | group means and standard deviations | mean difference (independent) | independent t test or ANOVA |
Continuous | Independent | Continuous | scatter plot | N/A | Correlation or Regression coefficients | either ANOVA or t test |
Categorical | Single | None | Usually unnecessary | numerator and denominator counts | proportions | Binomial |
Categorical | Paired | None | "" | discordancy rates | odds ratios | McNemar's |
Categorical | Independent | Categorical | "" | incidences (cohort) or exposure proportions (case-control) | relative risks (cohort) or odds ratios (case-control) | Chi-square or Fisher's |
Categorical | Independent | Continuous | "" | | odds ratio | Logistic regression |
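The chart can also be read as a simple lookup keyed on the three design questions (outcome type, sample type, predictor type). The sketch below transcribes the "Most common test" column into a Python dictionary. The names METHODS and suggest_test are hypothetical and chosen only for illustration; the sketch is a memory aid, not a substitute for thinking through the design.

```python
# Hypothetical lookup keyed on (outcome, sample type, predictor),
# transcribed from the "Most common test" column of the chart.
METHODS = {
    ("continuous", "single", None): "one-sample t test",
    ("continuous", "paired", None): "paired sample t test",
    ("continuous", "independent", "categorical"): "independent t test or ANOVA",
    ("continuous", "independent", "continuous"): "regression or correlation (ANOVA or t test)",
    ("categorical", "single", None): "binomial test",
    ("categorical", "paired", None): "McNemar's test",
    ("categorical", "independent", "categorical"): "chi-square or Fisher's test",
    ("categorical", "independent", "continuous"): "logistic regression",
}


def suggest_test(outcome, sample, predictor=None):
    """Return the chart's most common test for a given study layout."""
    key = (
        outcome.lower(),
        sample.lower(),
        predictor.lower() if predictor else None,
    )
    if key not in METHODS:
        raise ValueError(f"no chart entry for {key}")
    return METHODS[key]


print(suggest_test("continuous", "independent", "categorical"))
print(suggest_test("categorical", "paired"))
```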
For example, if we want to study the relationship between cholesterol (continuous outcome) and type A and B behavior (categorical predictor), we estimate the independent mean difference and test whether it is significant using an independent t test. If we want to study heart attack risk (categorical outcome) and type A and B behavior (categorical predictor), we compare the incidence of heart attacks in the two groups in the form of a relative risk and test the relationship using a chi-square method. If we want to study the relationship between systolic blood pressure (continuous outcome) and age (continuous predictor), we estimate the correlation between these factors and estimate the average change in blood pressure per year of age using linear regression. (And so on.)
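The three estimates named in these examples can be sketched with the Python standard library alone. All numbers below are made up for illustration and do not come from any study. The function names are hypothetical, and the t statistic assumes the equal-variance (pooled) form of the independent t test.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x1, x2):
    """Equal-variance (pooled) independent t statistic and its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, df


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def slope_and_r(x, y):
    """Least-squares slope (average change in y per unit x) and Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up data for illustration only.
chol_type_a = [210, 225, 240, 250, 235]   # cholesterol, type A behavior
chol_type_b = [190, 205, 215, 220, 200]   # cholesterol, type B behavior
t_stat, df = pooled_t(chol_type_a, chol_type_b)

# Heart attacks per 1000 people in each behavior group (hypothetical counts).
rr = relative_risk(30, 1000, 15, 1000)

age = [30, 40, 50, 60, 70]
sbp = [115, 121, 128, 133, 140]           # systolic blood pressure
slope, r = slope_and_r(age, sbp)

print(f"t = {t_stat:.2f} on {df} df")
print(f"relative risk = {rr:.2f}")
print(f"slope = {slope:.2f} per year of age, r = {r:.3f}")
```

The t statistic would still need to be compared with a t distribution on the stated degrees of freedom to obtain a p value; that step is left out here.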
OK. A few parting shots:
(1) Fill in the blank: With a continuous outcome, descriptive statistics are based on sums and averages. With a categorical outcome, descriptive statistics are based on _________________ and ___________________.
ANS: counts and proportions
(2) What is the main test used to determine statistical significance when testing a continuous dependent variable from two independent groups?
ANS: An independent t test or, alternatively, ANOVA
(3) What type of procedure is used to quantify the relationship between a continuous dependent variable and a continuous independent variable?
ANS: Regression or correlation can be used, depending on whether a true independent variable is present (regression) and whether one wants to predict the average change in Y per unit X (regression) or assess correlational "fit" (correlation).
(4) What type of test is normally used to determine whether there is a statistically significant relationship between a categorical dependent variable and a categorical independent variable?
ANS: A chi-square test.
(5) List the (two-sided) null hypotheses used by each of the tests addressed in (2) - (4), above.
ANS:
For question (2), the Independent t test H0: µ1 = µ2
For question (3), regression test H0: β1 = 0; correlation test H0: ρ = 0
For question (4), the chi-square test: H0: no association between row and column variables
(6) List the assumptions required by each of the above tests.
ANS: Using short descriptors,
Independent t test / ANOVA: Independence, Normality, Equal Variance
Regression test: Linearity, Independence, Normality, Equal Variance
Chi-square test: Independence, Expected Values >= 5
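As a rough illustration of the chi-square test and its "Expected Values >= 5" assumption, the sketch below computes the Pearson chi-square statistic (without continuity correction) from a table of counts and reports whether every expected count is at least 5. The function name chi_square_table and the counts are hypothetical.

```python
def chi_square_table(table):
    """Pearson chi-square statistic (no continuity correction) for a table of counts.

    `table` is a list of rows, e.g. [[a, b], [c, d]]. Returns the statistic,
    the expected counts under H0 (no association between rows and columns),
    and whether every expected count is at least 5.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    expected_ok = all(exp >= 5 for exp_row in expected for exp in exp_row)
    return stat, expected, expected_ok


# Made-up counts: rows are the two groups, columns are (event, no event).
stat, expected, expected_ok = chi_square_table([[30, 970], [15, 985]])

# For a 2x2 table (1 degree of freedom) the 5% critical value is about 3.84.
print(f"chi-square = {stat:.2f}, all expected counts >= 5: {expected_ok}")
print("reject H0 at alpha = 0.05" if stat > 3.84 else "do not reject H0")
```

When the expected-count check fails, the chart's alternative for this design is Fisher's test.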
For each study described below, please:
(A) Identify the outcome variable and determine whether it is continuous or categorical.
(B) Determine whether the sampling is single sample, paired sample, or independent samples.
(C) If samples are independent, identify the independent variable and determine whether it is continuous or categorical.
(D) Identify appropriate descriptive and exploratory statistical methods for the problem.
(E) Identify the parameter being estimated and appropriate estimation methods.
(F) List the null and alternative hypotheses, and the name of the most common method used to test the problem.
(G) Identify factors that you would need to determine or assume before you could determine the sample size requirements of such a study.