Before Data are Analyzed

• Study Design • Data Collection

Descriptive Statistics

Basic Statistical Inference

• Two Traditional Forms of Inference • Parameters and Statistics • Estimation • Hypothesis Testing • Power & Sample Size

Reporting Results

• Narrative Summary • How to Report Statistics

References

To analyze and interpret data, one must first understand fundamental statistical principles. Statistical topics are normally covered in introductory courses and texts and cannot be given full justice in this brief chapter. However, a brief review of some principles may prove helpful.

When analyzing data, one must keep clearly in mind the question that prompted the research in the first place. The research question must be articulated clearly, concisely, and accurately, and it must be informed by existing knowledge.

Once the research question has been defined, a study is designed specifically to answer it. This is a key element in determining study success. Some study design features to consider are:

- How will the study outcome be measured? Will measurements be objective (so that things are observed as they are, without bending observations to fit some preconceived world view)? Will measurements be reliable (so that observations can be consistently repeated)?
- How will relations between factors be quantified? What parameter will be estimated?
- How large a sample will be needed to ensure a sufficiently precise answer?
- Will the study be experimental or nonexperimental? (Experimental studies entail an intervention.)
- If the study is experimental, what type of control group will be used? Will the intervention be randomized?
- If the study is nonexperimental, will observations be cross-sectional or longitudinal? Will data be collected prospectively or retrospectively? Will the sample be cross-sectional, cohort, or case-control?

These and other questions must be addressed well before collecting data. An introduction to study design can be found in Dallal (1997).

Consider your data source carefully. Sources of data include medical record abstraction, questionnaires, physical exams, biospecimens, environmental sampling, direct examination, and so on. The data collection form ("instrument") must be carefully calibrated, tested, and maintained. If using a questionnaire, questions must be simple, direct, unambiguous, and non-leading. To encourage accuracy and compliance, survey questionnaires should be brief. When asking questions, *nothing should be taken for granted*.

The **study protocol** must be documented. How will the population be sampled? How will you deal with subjects who refuse to participate or are lost to follow-up? Criteria for managing missing and messy data should be discussed *before* problems are encountered. Once data are collected, how will you prevent data processing errors? Who will be responsible for entering, cleaning, and documenting the data? Who is going to back up the data? Seemingly mundane elements of data processing must be worked out in advance of the study.

Reasonable analyses come only *after* a good description is established. The type of description appropriate to an analysis depends on the nature of the data. At its simplest, qualitative (categorical) data require counts, proportions, rates, and ratios. With quantitative (continuous) data, distributional shape, location, and spread must be described.

The shape of a distribution refers to the configuration of its points when plotted. Useful graphs include the *histogram*, *stem-and-leaf plot*, *dot plot*, and *boxplot*. When assessing shape, consider the data's symmetry, modality, and kurtosis.

The location of a distribution is summarized by its center. The most common statistical measures of central location are the mean, median, and mode.

The spread of a distribution refers to its dispersion (variability) around its center. The most common summary measures of spread are the standard deviation, interquartile range, and range.
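To make these summaries concrete, here is a minimal sketch using only Python's standard library (the sample data and variable names are invented for illustration):

```python
from collections import Counter
import statistics

# Qualitative (categorical) data: describe with counts and proportions.
smoking = ["never", "current", "former", "never", "never", "current"]
counts = Counter(smoking)                       # e.g., counts["never"] == 3
n = len(smoking)
proportions = {k: v / n for k, v in counts.items()}

# Quantitative (continuous) data: describe location and spread.
ages = [61, 68, 54, 59, 70, 66, 63, 72, 58, 65]
mean = statistics.mean(ages)                    # central location
median = statistics.median(ages)                # resistant measure of location
sd = statistics.stdev(ages)                     # spread: sample standard deviation
q1, _, q3 = statistics.quantiles(ages, n=4)     # quartiles
iqr = q3 - q1                                   # spread: interquartile range
data_range = max(ages) - min(ages)              # spread: range
```

Shape would still be assessed graphically (histogram, boxplot, and so on); the numeric summaries above cover location and spread.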

We are also often interested in describing associations between variables. Association refers to the degree to which values "go together." Associations may be positive, negative, or neutral. The measure of association will vary depending on the nature of the data. Examples of associational measures include mean differences (paired and independent), regression coefficients, and risk ratios.
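As an illustration of one such measure, a risk ratio can be computed directly from a 2-by-2 table (the counts below are hypothetical):

```python
# Hypothetical 2-by-2 table of exposure by disease status:
#                cases   non-cases
#   exposed        30        70
#   unexposed      10        90
a, b = 30, 70    # exposed: cases, non-cases
c, d = 10, 90    # unexposed: cases, non-cases

risk_exposed = a / (a + b)                   # 30/100 = 0.30
risk_unexposed = c / (c + d)                 # 10/100 = 0.10
risk_ratio = risk_exposed / risk_unexposed   # 3.0: risk is tripled in the exposed
```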

Statistical inference is the act of generalizing from a sample to a population with a calculated degree of certainty. The importance of inference during data analysis is difficult to overstate: "[F]or everyone who does habitually attempt the difficult task of making sense of figures is, in fact, essaying a logical process of the kind we call inductive, in that he is attempting to draw inferences from the particular to the general; or, as we more usually say in statistics, from the sample to the population" (Fisher, 1935, p. 39).

The two traditional forms of statistical inference are estimation and significance testing. Estimation uses confidence intervals to help locate a parameter. Significance testing provides a statistic called the *P*-value, which is "a rational and well-defined measure of reluctance to accept the hypotheses they test" (Fisher, 1973, p. 47).

As an example, an epidemiologist may want to learn about the prevalence of a condition -- smoking, for instance -- based on the proportion of people who smoke in a sample. In a given sample, the final inference may be "25% of the population smokes." Whether one uses estimation or significance testing depends on the nature of the inference. When "amount" is important (as it nearly always is), estimation is the preferred method of inference. However, sometimes a categorical answer to a question is needed, and testing is appropriate under such circumstances.

Note: Additional forms of statistical inference are possible, e.g., likelihood ratios and Bayesian methods. Coverage of these methods is beyond the scope of this brief introduction.

Regardless of the inferential method used, it is important to keep clearly in mind the distinction between the *parameters* being inferred and the *estimates* used to infer them. Although the two are related, they are not interchangeable.

- Parameters are statistical summaries (e.g., a mean difference) that describe something about the *population*; the population may be real, but in questions of causality it is more often hypothetical. Estimates are statistical summaries (e.g., the sample mean difference) that describe something about the *sample*; the sample is the data.
- The exact value of a parameter is never fully known. In contrast, the value of an estimate is calculated from the data, i.e., known after the study has been completed.
- Parameters are numeric constants. Estimates are to be thought of as random variables.

Statisticians use different symbols to represent estimators and population parameters. For example, the symbol "p hat" is used to
represent a sample proportion (the estimate). In contrast, *p* may be used to represent the parameter ("the population proportion").
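The distinction can be demonstrated with a small simulation (the parameter value and sample size are invented): the parameter *p* stays fixed, while the estimate "p hat" varies from sample to sample.

```python
import random

random.seed(42)
p = 0.25      # the parameter: a fixed constant, unknown in practice
n = 200       # sample size

# Draw five independent samples; p never changes, but p-hat does.
p_hats = []
for _ in range(5):
    sample = [random.random() < p for _ in range(n)]
    p_hats.append(sum(sample) / n)
```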

There are two forms of estimation: point estimation and interval estimation. *Point estimation* provides the single value that is most likely to represent the parameter. For example, the sample proportion ("p hat") is the point estimator of the population proportion (*p*). *Interval estimation* provides an interval that has a calculated likelihood of capturing the parameter. For example, a 95% confidence interval for the population proportion *p* will capture this parameter 95% of the time. That is, if we independently repeated the study an infinite number of times, 95% of our calculated intervals would capture the parameter and 5% would fail to capture it. For any given confidence interval, however, the parameter either *is* or *isn't* captured. A certain amount of random uncertainty is inevitable when working with empirical data. The confidence interval helps quantify this random uncertainty.
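A normal-approximation 95% confidence interval for a proportion can be sketched as follows (the counts are hypothetical; the formula is the familiar point estimate plus or minus 1.96 standard errors):

```python
import math

x, n = 50, 200                             # hypothetical: 50 smokers among 200 sampled
p_hat = x / n                              # point estimate of p: 0.25
se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of the proportion
z = 1.96                                   # critical value for 95% confidence
lower, upper = p_hat - z * se, p_hat + z * se
# Interval: roughly (0.19, 0.31)
```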

So what of significance testing? First, we must note that there exists considerable misunderstanding about this method. Part of the misunderstanding stems from two competing and sometimes contradictory methods: (a) significance testing and (b) hypothesis testing. *Significance testing*, as described by R. A. Fisher, provides a *P*-value, a flexible inductive measure that assesses the credibility of the hypothesis being tested. In contrast, *hypothesis testing*, as described by Neyman and Pearson, provides decision rules about a null and an alternative hypothesis. The extent to which these views are reconcilable is a matter of opinion that goes well beyond the scope of this modest introduction. Readers wishing to learn more about this controversy are referred to Lehmann (1993), Goodman (1993), and Bellhouse (1993). For now, let us simply note that both significance testing and hypothesis testing are widely misunderstood, and that the key statistic in each is the *P*-value.
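As a small worked example of a significance test (the data are invented), an exact one-sided binomial *P*-value for testing H0: *p* = 0.5 can be computed with the standard library:

```python
import math

n, k = 20, 15   # hypothetical data: 15 "successes" in 20 trials
p0 = 0.5        # the null hypothesis value

# P-value: probability, under H0, of a result at least as extreme as observed,
# i.e., P(X >= 15) for X ~ Binomial(20, 0.5).
p_value = sum(math.comb(n, j) * p0**j * (1 - p0)**(n - j)
              for j in range(k, n + 1))
# p_value is about 0.021, a measure of reluctance to accept H0
```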

Abelson, in his excellent book *Statistics as Principled Argument* (1995), suggests that the presentation of statistical results importantly entails rhetoric. The virtues of a good statistician, therefore, involve not only the skills of a good detective but also the skills of a good storyteller. As a good storyteller, it is essential to argue flexibly and in detail for a particular case. Data analysis should *not* be pointlessly formal. Rather, it should make an interesting claim by telling a tale that an informed audience will care about, doing so through an intelligent interpretation of the data.

Reporting and presenting results are important parts of a statistician's job. In general, the statistician should *always use judgment* when reporting statistics, and always report findings in a way that is consistent with what he or she wishes to learn. With this in mind, here are some guidelines for reporting statistics:

- "Describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results. When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals). Avoid sole reliance on statistical hypothesis testing and *p* values, for they fail to convey important quantitative information [-- a *p* value by itself is seldom acceptable] . . . Give numbers of observations. . . . Specify any general-use computer programs used." (International Committee, 1988; Bailar & Mosteller, 1988)
- The number of *decimal places* reported in final statistics is contingent on the precision of the data. Precise data warrant many decimal places; imprecise data do not. For example, an average age in adults need be reported to only one decimal place (e.g., 68.1 years), *not* four (e.g., 68.1276 years). With this said, here are rules of thumb to keep in mind when reporting results.
- For summary statistics (e.g., means, standard deviations), report one digit more than was present in the raw data. For example, if age is recorded to the nearest whole year, report the mean age to the nearest tenth of a year (e.g., mean = 54.3 years).
- For percentages, the nearest whole percent (e.g., 25%) is usually adequate (APA, 1994), although many journals prefer percentages to the nearest tenth of a percent (e.g., 25.4%).
- For test statistics, such as chi-square statistics, *t* statistics, and *F* statistics, use two-decimal-place accuracy (APA, 1994, p. 104). For example, report *t* = 2.56.
- For *p* values, two significant digits will do (Bailar & Mosteller, 1988). For example, report *p* = 0.0062. Notice that leading zeros do *not* count as significant digits.
- Odds ratios and relative risks should be reported to one-decimal-place accuracy (e.g., *OR* = 3.1, not 3.11).
- Do *not* use leading zeros before a decimal point when the number cannot exceed 1 (APA, 1994, p. 104). For example, report α = .05. *Do* use leading zeros before a decimal point when the number can be greater than 1. For example, report mean serum creatinine level = 0.973 mg/dl.
- Always report units of measure. For example, mean serum creatinine = 0.973 *mg/dl*.
- Statistics in text should include sufficient information to permit the reader to corroborate the analysis (APA, 1994, p. 112; Bailar & Mosteller, 1988).
- Each journal has its own reporting standards. For example, San Jose State University requires APA style (1994), whereas the *American Journal of Public Health* requires the Uniform Biomedical Style (International Committee, 1988).
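The rounding rules above can be encoded in small formatting helpers (the function names are invented for illustration):

```python
def format_p(p):
    """Round a P-value to two significant digits (leading zeros don't count)."""
    return f"{p:.2g}"

def format_ratio(r):
    """Report an odds ratio or relative risk to one decimal place."""
    return f"{r:.1f}"

def format_percent(x):
    """Report a proportion as the nearest whole percent."""
    return f"{x:.0%}"

print(format_p(0.00617))      # "0.0062"
print(format_ratio(3.11))     # "3.1"
print(format_percent(0.254))  # "25%"
```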

Abelson R. P. (1995). *Statistics as Principled Argument*. Hillsdale, NJ: Lawrence Erlbaum Associates.

American Psychological Association [APA]. (1994). *Publication Manual *(4th ed.). Washington, DC: Author.

Bailar, J. C. & Mosteller, F. (1988). Guidelines for statistical reporting in articles for medical journals. *Annals of Internal Medicine*,
108, 266 - 273.

Bellhouse, D. R. (1993). Invited commentary: *p* values, hypothesis tests and likelihood. *American Journal of Epidemiology*, 137, 497 - 499.

Cohen, J. (1994). The earth is round (*p* < .05). *American Psychologist*, 49, 997 - 1003.

Dallal, G. E. (1997). *Sample Size Calculations Simplified.* http://www.tufts.edu/~gdallal/SIZE.HTM

Dallal, G. E. (1997). *Some Aspects of Study Design*. http://www.tufts.edu/~gdallal/STUDY.HTM

Fisher, R. A. (1935). The logic of inductive inference. *Journal of the Royal Statistical Society*, 98, 39 - 54.

Fisher, R. A. (1973). *Statistical Methods and Scientific Inference* (3rd ed.). New York: Macmillan.

Goodman, S. N. (1993). *P* values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate.
*American Journal of Epidemiology*, 137, 485 - 496.

International Committee of Medical Journal Editors [International Committee]. (1988). Uniform requirements for manuscripts
submitted to biomedical journals. *Annals of Internal Medicine*, 108: 258 - 265.

Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: one theory or two? *Journal of the American
Statistical Association*, 88, 1242 - 1249.

Tukey, J. W. (1991). The philosophy of multiple comparisons. *Statistical Science*, 6, 100 - 116.