Some Statistical Basics

Before Data are Analyzed
• Study Design • Data Collection
Descriptive Statistics
Basic Statistical Inference
• Two Traditional Forms of Inference • Parameters and Estimates • Estimation • Significance Testing • Power & Sample Size
Reporting Results
• Narrative Summary • How to Report Statistics
References

To analyze and interpret data, one must first understand fundamental statistical principles. Statistical topics are normally covered in introductory courses and texts and cannot be done full justice in this brief chapter. However, a brief review of some principles may prove helpful.

Before Data are Analyzed

Study Design

When analyzing data, one must keep clearly in mind the question that prompted the research in the first place. The research question must be articulated clearly, concisely, and accurately, and it must be well informed.

Once the research question has been defined, a study is designed specifically to answer it. This is the critical element in determining study success. Some study design features to consider are:

How will the study outcome be measured? Will measurements be objective (so that things are observed as they are, without falsifying observations to accord with some preconceived world view)? Will measurements be reliable (so that observations can be consistently repeated)?
How will relations between factors be quantified? What parameter will be estimated?
How large a sample will be needed to ensure a sufficiently precise answer?
Will the study be experimental or nonexperimental? (Experimental studies entail an intervention.)
If the study is experimental, what type of control group will be used? Will the intervention be assigned randomly? Will subjects be blinded to treatment type?
If the study is nonexperimental, will observations be cross-sectional or longitudinal?
If the study is nonexperimental, will data be prospective or retrospective? Will the sample be cross-sectional, cohort, or case-control?
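One of the questions above, how large a sample is needed, can be answered for the simple case of estimating a single proportion with the familiar normal-approximation formula n = z²·p(1 − p)/E². Here E is the desired margin of error (the half-width of the confidence interval). The sketch below assumes simple random sampling and a planning guess for p. The function name and the example values (p = 0.25, E = 0.05) are illustrative only. Other designs, such as comparisons of groups or clustered samples, need different formulas.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p_guess, margin, confidence=0.95):
    """Smallest n such that the normal-approximation CI for a proportion has
    half-width <= margin, given a planning guess for p (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = z ** 2 * p_guess * (1 - p_guess) / margin ** 2
    return math.ceil(n)  # round up: a fractional subject is not possible

# Planning guess of 25% prevalence, margin of +/- 5 percentage points
print(sample_size_for_proportion(0.25, 0.05))  # 289
# Conservative choice: p = 0.5 maximizes p(1 - p)
print(sample_size_for_proportion(0.50, 0.05))  # 385
```

Using p = 0.5 when nothing is known about the prevalence guarantees the margin is met whatever the true value turns out to be, at the cost of a larger sample.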

These and other questions must be addressed well before data are collected. An introduction to study design can be found in Dallal (1997), Some Aspects of Study Design.

Data Collection

Consider your data source carefully. Sources of data include medical record abstraction, questionnaires, physical examinations, biospecimens, environmental sampling, direct examination, etc. The data collection form ("instrument") must be carefully calibrated, tested, and maintained. When a questionnaire is used, its questions must be simple, direct, unambiguous, and non-leading. To encourage accuracy and compliance, survey questionnaires should be brief. When asking questions, nothing should be taken for granted.

The study protocol must be documented. How will the population be sampled? How will you deal with subjects who refuse to participate or are lost to follow-up? Criteria for managing missing and messy data should be agreed on before problems are encountered. Once data are collected, how will you prevent data processing errors? Who will be responsible for entering, cleaning, and documenting the data? Who will back up the data? Seemingly mundane elements of data processing must be worked out in advance of the study.
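Writing cleaning rules down before data arrive makes them easy to apply consistently. As a minimal sketch, the hypothetical rules below assign an allowed range to each variable and treat None as a missing value. They flag problem records instead of silently changing them. The variable names (age, sbp for systolic blood pressure) and the limits are invented for illustration.

```python
# Hypothetical pre-specified rules: variable -> (minimum, maximum) allowed value
RULES = {"age": (18, 100), "sbp": (70, 250)}

def check_record(record, rules=RULES):
    """Return a list of problems found in one record (a dict of variable -> value)."""
    problems = []
    for var, (lo, hi) in rules.items():
        value = record.get(var)
        if value is None:
            problems.append(f"{var}: missing")
        elif not lo <= value <= hi:
            problems.append(f"{var}: {value} outside [{lo}, {hi}]")
    return problems

records = [
    {"id": 1, "age": 54, "sbp": 132},
    {"id": 2, "age": 540, "sbp": 128},   # probable keying error
    {"id": 3, "age": 61},                # sbp not recorded
]
for rec in records:
    print(rec["id"], check_record(rec))
```

Flagged records can then be resolved against the source documents by whoever the protocol makes responsible for cleaning.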

Descriptive Statistics

Reasonable analyses come only after a good description is established. The type of description appropriate to an analysis depends on the nature of the data. At its simplest, qualitative (categorical) data require counts, proportions, rates, and ratios. With quantitative (continuous) data, the shape, location, and spread of the distribution must be described.

The shape of a distribution refers to the configuration of data points when plotted. Useful graphs include histograms, stem-and-leaf plots, dot plots, and boxplots. When assessing shape, consider the data's symmetry, modality (number of peaks), and kurtosis.
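A stem-and-leaf plot can be produced without graphics software. The sketch below assumes non-negative whole numbers. It uses the tens digit as the stem and the units digit as the leaf. The ages are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the distribution remain visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

ages = [23, 25, 25, 29, 31, 34, 38, 41, 45, 52]  # hypothetical data
print("\n".join(stem_and_leaf(ages)))
```

The display shows four leaves on the 20s stem, thinning to one in the 50s. This is a single peak with a longer tail toward older ages (mild right skew).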

The location of a distribution is summarized by its center. The most common statistical measures of central location are the mean, median, and mode.

The spread of a distribution refers to its dispersion (variability) around its center. The most common summary measures of spread are the standard deviation, interquartile range, and range.
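The common measures of location and spread named above are all available in Python's standard `statistics` module. The sketch below applies them to a small set of hypothetical ages. Note that software packages use several slightly different methods to compute quartiles, so the interquartile range can differ a little between programs. The default method of `statistics.quantiles` is used here.

```python
import statistics

ages = [23, 25, 25, 29, 31, 34, 38, 41, 45, 52]  # hypothetical data, whole years

# Location
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Spread
sd = statistics.stdev(ages)                   # sample standard deviation (n - 1 divisor)
q1, _, q3 = statistics.quantiles(ages, n=4)   # quartiles ("exclusive" method by default)
iqr = q3 - q1
data_range = max(ages) - min(ages)

print(f"mean = {mean:.1f}, median = {median}, mode = {mode}")
print(f"SD = {sd:.1f}, IQR = {iqr}, range = {data_range}")
```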

We are also often interested in describing associations between variables. Association refers to the degree to which values "go together." Associations may be positive, negative, or neutral (no association). The measure of association will vary depending on the nature of the data. Examples of associational measures include mean differences (paired and independent), regression coefficients, and risk ratios.
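Three of the measures named above are simple enough to compute directly. The sketch below computes an independent-samples mean difference, a risk ratio from the counts in a 2 × 2 table, and a least-squares regression slope. All data and counts are hypothetical.

```python
from statistics import mean

def mean_difference(group1, group2):
    """Independent-samples mean difference: mean(group1) - mean(group2)."""
    return mean(group1) - mean(group2)

def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

def regression_slope(x, y):
    """Least-squares slope of y on x: sum of cross-products / sum of squares of x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

print(mean_difference([130, 136, 142], [122, 126, 130]))  # hypothetical blood pressures
print(round(risk_ratio(30, 100, 10, 100), 2))             # 30% vs. 10% risk
print(regression_slope([1, 2, 3, 4], [3, 5, 7, 9]))
```

All three examples show positive associations. A mean difference or slope near 0, or a risk ratio near 1, would correspond to the neutral case.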

Basic Statistical Inference

Two Traditional Forms of Statistical Inference

Statistical inference is the act of generalizing from a sample to a population with a calculated degree of certainty. The importance of inference during data analysis is difficult to overstate: "[F]or everyone who does habitually attempt the difficult task of making sense of figures is, in fact, essaying a logical process of the kind we call inductive, in that he is attempting to draw inferences from the particular to the general; or, as we more usually say in statistics, from the sample to population" (Fisher, 1935, p. 39).

The two traditional forms of statistical inference are estimation and significance testing. Estimation uses point estimates and confidence intervals to indicate the likely location of a parameter. Significance testing provides a statistic called the P-value, which is "a rational and well-defined measure of reluctance to accept the hypotheses they test" (Fisher, 1973, p. 47).

For example, an epidemiologist may want to learn about the prevalence of a condition, such as smoking, from the proportion of people who smoke in a sample. In a given sample, the final inference may be "25% of the population smokes" (point estimation). Alternatively, the inference may take the form of a confidence interval, such as "20% to 30%" (interval estimation). Finally, the epidemiologist might want to test whether the smoking rate has changed over time, given that the prevalence was 30% at the start (the value of the parameter under the hypothesis being tested) and the sample estimate is now 25% (significance testing).
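The three forms of inference in the smoking example can be reproduced once a sample size is assumed, because the text does not give one. The sketch below assumes a simple random sample of n = 300 with 75 smokers, so p̂ = 0.25. It uses the normal-approximation (Wald) confidence interval and a one-sample z test of the hypothesis p = 0.30. Exact or score-based methods are often preferred in practice, especially for small samples.

```python
import math
from statistics import NormalDist

n, smokers = 300, 75          # hypothetical sample
p0 = 0.30                     # prevalence under the hypothesis being tested

# Point estimation
p_hat = smokers / n

# Interval estimation: 95% Wald confidence interval
z_crit = NormalDist().inv_cdf(0.975)
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - z_crit * se, p_hat + z_crit * se

# Significance testing: the z statistic uses the standard error under the hypothesis
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided

print(f"point estimate = {p_hat:.2f}")
print(f"95% CI = {ci_low:.3f} to {ci_high:.3f}")
print(f"z = {z:.2f}, two-sided P = {p_value:.3f}")
```

Under these assumptions, the interval runs from about 20% to 30%, matching the example, and the test gives z ≈ −1.89 and P ≈ 0.06.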

Whether one uses estimation or significance testing depends on the nature of the inference. When "amount" is important (as it nearly always is), estimation is the preferred method of inference. However, sometimes a categorical answer to a question is needed. Testing is appropriate under such circumstances.

Note: Additional forms of statistical inference are possible, e.g., likelihood ratios and Bayesian methods. Coverage of these methods is beyond the scope of this brief introduction.

Parameters and Estimates

Regardless of the inferential method used, it is important to keep clearly in mind the distinction between the parameters being inferred and the estimates used to infer them. Although the two are related, they are not interchangeable.

Parameters are statistical summaries (e.g., a mean difference) that describe something about the population; the population may be real, but in questions of causality it is more often hypothetical. Estimates are statistical summaries (e.g., the sample mean difference) that describe something about the sample; the sample is the data.
The exact value of the parameter is never fully known. In contrast, the value of the estimate is calculated from the data, i.e., known after the study has been completed.
Parameters are numeric constants. Estimates are to be thought of as random variables.

Statisticians use different symbols to represent estimators and population parameters. For example, the symbol "p hat" is used to represent a sample proportion (the estimate). In contrast, p may be used to represent the parameter ("the population proportion").
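The distinction between a fixed parameter and a random estimate can be seen by simulation. In the sketch below, the population proportion p is set to an assumed value of 0.25; in a real study it would be unknown. Several independent samples of size 300 are then drawn. The parameter p stays the same while the estimate p̂ changes from sample to sample. The seed, p, and n are arbitrary choices.

```python
import random

random.seed(1)          # arbitrary seed, for reproducibility only
p = 0.25                # parameter: a fixed (here, assumed) constant
n = 300                 # size of each hypothetical sample

def draw_p_hat(p, n):
    """Draw one simple random sample of n Bernoulli(p) outcomes and return p-hat."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

estimates = [draw_p_hat(p, n) for _ in range(5)]
print("parameter p =", p)
print("estimates p-hat =", estimates)
```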

Estimation

There are two forms of estimation: point estimation and interval estimation. Point estimation provides a single value that serves as the best guess of the parameter. For example, a sample proportion (p̂) is the point estimator of the population proportion (p). Interval estimation provides an interval that has a calculated likelihood of capturing the parameter. For example, the procedure that produces a 95% confidence interval for p captures this parameter 95% of the time. That is, if we independently repeated the study an infinite number of times, 95% of our calculated intervals would capture the parameter and 5% would fail to capture it. However, any given confidence interval either does or does not capture the parameter. A certain amount of random uncertainty is inevitable when working with empirical data. The confidence interval helps quantify this random uncertainty.
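The repeated-sampling interpretation of a 95% confidence interval can be checked by simulation. Fix a parameter value, repeat a hypothetical study many times, and count how often the computed interval contains the parameter. The sketch below assumes p = 0.25, n = 300, and the normal-approximation (Wald) interval. The observed coverage should come out close to 95% but not exactly equal to it. This is partly because of simulation noise and partly because the Wald interval is only approximate.

```python
import math
import random
from statistics import NormalDist

def wald_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(p, n, studies, seed=2024):
    """Fraction of simulated studies whose 95% interval captures the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        successes = sum(rng.random() < p for _ in range(n))
        low, high = wald_ci(successes, n)
        hits += low <= p <= high
    return hits / studies

print(coverage(p=0.25, n=300, studies=2000))
```

Each individual interval either contains 0.25 or does not. Only the long-run proportion of intervals that do is near 95%.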

Significance Testing

So what of significance testing? First, we must note that there is considerable misunderstanding about this method. Part of the confusion stems from two competing and sometimes contradictory methods: (a) significance testing and (b) hypothesis testing. Significance testing, as described by R. A. Fisher, provides a P-value, a flexible inductive measure of the credibility of the hypothesis being tested. In contrast, hypothesis testing, as described by Neyman and Pearson, provides decision rules about a null and an alternative hypothesis. The extent to which these views are reconcilable is a matter of opinion that goes well beyond the scope of this modest introduction. Readers wishing to learn more about this controversy are referred to Lehmann (1993), Goodman (1993), and Bellhouse (1993). For now, let us simply note that both significance testing and hypothesis testing are often misunderstood. The key statistic in significance testing is the P-value.
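The contrast between the two approaches can be sketched in a few lines. The sketch assumes a test statistic that is approximately standard normal under the hypothesis being tested. Fisher's significance test reports the P-value itself as a graded measure of evidence. A Neyman-Pearson test instead fixes a significance level α in advance (0.05 here, an assumed conventional choice) and returns only a decision. The z values are hypothetical.

```python
from statistics import NormalDist

def fisher_p_value(z):
    """Two-sided P-value for a standard-normal test statistic (a graded measure)."""
    return 2 * NormalDist().cdf(-abs(z))

def neyman_pearson_decision(z, alpha=0.05):
    """Fixed-level decision rule: reject H0 when |z| falls in the critical region."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return "reject H0" if abs(z) >= critical else "do not reject H0"

for z in (1.89, 1.97, 3.30):
    print(f"z = {z:.2f}: P = {fisher_p_value(z):.4f}; decision at alpha = 0.05: "
          f"{neyman_pearson_decision(z)}")
```

The statistics z = 1.89 and z = 1.97 carry nearly the same evidence (P of about 0.06 and 0.05). Even so, the fixed-level rule places them on opposite sides of the decision boundary.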

Reporting Results

Narrative Summary of Results

Abelson, in his excellent book Statistics as Principled Argument (1995), suggests that the presentation of statistical results importantly entails rhetoric. The virtues of a good statistician, therefore, involve not only the skills of a good detective but also the skills of a good storyteller. A good storyteller argues flexibly and in detail for a particular case. Data analysis should not be pointlessly formal. Rather, it should make an interesting claim by telling a tale that an informed audience will care about, doing so through an intelligent interpretation of data.

How To Report Statistics

Reporting and presenting results are important parts of a statistician's job. In general, the statistician should always use judgement when reporting statistics and should report findings in a way that is consistent with what he or she wishes to learn. With this in mind, here are some guidelines for reporting statistics:

"Describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results. When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals). Avoid sole reliance on statistical hypothesis testing and p values for they fail to convey important quantitative information [-- a p value by itself is seldom acceptable] . . . Give numbers of observations. . . . Specify any general-use computer programs used." (International Committee, 1988; Bailar & Mosteller, 1988).

The number of decimal places reported in final statistics is contingent on the precision of the data. Precise data warrant many decimal places; imprecise data do not. For example, the average age of adults need be reported to only one decimal place (e.g., 68.1 years), not four (e.g., 68.1276 years). With this said, here are some rules of thumb to keep in mind when reporting results:

For summary statistics (e.g., means, standard deviations), report one digit more than was present in the raw data. For example, if age is recorded to the nearest whole year, report the mean age to the nearest tenth of a year (e.g., mean = 54.3 years).
For percentages, the nearest whole percent (e.g., 25%) is usually adequate (APA, 1994), although many journals prefer percentages to the nearest tenth of a percent (e.g., 25.4%).
For test statistics, such as chi-square statistics, t statistics, and F statistics, use two decimal place accuracy (APA, 1994, p. 104). For example, report t statistic = 2.56.
For p values, two significant digits will do (Bailar & Mosteller, 1988). For example, report p = 0.0062. Notice that leading zeros do not count as significant digits.

Odds ratios and relative risks should be reported to one decimal place accuracy (e.g., OR = 3.1, not 3.11).
Do not use leading zeros before a decimal point when the number cannot exceed 1 (APA, 1994, p. 104). For example, report α = .05. Do use leading zeros before a decimal point when the number can be greater than 1. For example, report mean serum creatinine level = 0.973 mg/dl.
Always report units of measure. For example, mean serum creatinine = 0.973 mg/dl.
Statistics in text should include sufficient information to permit the reader to corroborate the analysis (APA, 1994, p. 112; Bailar & Mosteller, 1988).
Each journal has its own reporting standards. For example, San Jose State University requires APA Style (1994) whereas the American Journal of Public Health requires the Uniform Biomedical Style (International Committee, 1988).
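The rules of thumb above can be encoded as small formatting helpers, which helps keep a report consistent. This is a sketch only, and the function names are hypothetical. The leading-zero choice is a parameter because journal styles differ (APA omits the zero for quantities that cannot exceed 1). Rounding relies on Python's built-in string formatting, which rounds the exact binary value of a number, so a value such as 2.675 may round down.

```python
import math

def format_summary(value, raw_decimals):
    """Summary statistic: one more decimal place than the raw data had."""
    return f"{value:.{raw_decimals + 1}f}"

def format_test_statistic(value):
    """Chi-square, t, and F statistics: two decimal places."""
    return f"{value:.2f}"

def format_ratio(value):
    """Odds ratios and relative risks: one decimal place."""
    return f"{value:.1f}"

def format_p(p, leading_zero=False):
    """P-value to two significant digits (leading zeros are not significant)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    decimals = max(0, 1 - math.floor(math.log10(p)))
    text = f"{p:.{decimals}f}"
    if not leading_zero and text.startswith("0."):
        text = text[1:]  # APA style: no leading zero for values that cannot exceed 1
    return text

print("mean age =", format_summary(54.2857, raw_decimals=0), "years")
print("t =", format_test_statistic(2.5649))
print("OR =", format_ratio(3.11))
print("p =", format_p(0.006213))
```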

References

Abelson, R. P. (1995). Statistics as Principled Argument. Hillsdale, NJ: Lawrence Erlbaum Associates.

American Psychological Association [APA]. (1994). Publication Manual (4th ed.). Washington, DC: Author.

Bailar, J. C. & Mosteller, F. (1988). Guidelines for statistical reporting in articles for medical journals. Annals of Internal Medicine, 108, 266 - 273.

Bellhouse, D. R. (1993). Invited commentary: p values, hypothesis tests and likelihood. American Journal of Epidemiology, 137, 497 - 499.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997 - 1003.

Dallal, G. E. (1997). Sample Size Calculations Simplified. http://www.tufts.edu/~gdallal/SIZE.HTM

Dallal, G. E. (1997). Some Aspects of Study Design. http://www.tufts.edu/~gdallal/STUDY.HTM

Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98, 39 - 54.

Fisher, R. A. (1973). Statistical Methods and Scientific Inference (3rd ed.). New York: Macmillan.

Goodman, S. N. (1993). P values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137, 485 - 496.

International Committee of Medical Journal Editors [International Committee]. (1988). Uniform requirements for manuscripts submitted to biomedical journals. Annals of Internal Medicine, 108, 258 - 265.

Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: one theory or two? Journal of the American Statistical Association, 88, 1242 - 1249.

Tukey, J. W. (1991). The philosophy of multiple comparisons. Statistical Science, 6, 100 - 116.