Lecture Notes
Ref - Some Statistical Basics, Data Analysis with Epi Info.
(A) Before Data Are Collected
- Research question ---> Study design ---> Study protocol ---> Measurements (assigning numbers according
to pre-set rules) ---> Operationalized variables ---> Data collection form (e.g., survey questionnaire,
other data form)
- Type of variables
- Categorical (qualitative, nominal) e.g., SEX
- Ordinal (ranked), e.g., Likert scales
- Continuous (quantitative, scale), e.g., AGE
(B) Data Collection
- When doing survey research, avoid selection bias by using chance mechanisms to select your sample
- Avoid information bias by measuring what you purport to measure (i.e., measurements must be precise and
valid; otherwise you are wasting your time or acting as an activist for a cause)
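One way to let a chance mechanism pick the sample is a simple random sample drawn without replacement. The sketch below assumes a hypothetical sampling frame of 500 ID numbers and a sample size of 25; neither figure comes from these notes.

```python
import random

# Hypothetical sampling frame: ID numbers 1..500 (an assumption for illustration)
sampling_frame = list(range(1, 501))

# A fixed seed makes the draw reproducible; the seed value itself is arbitrary
rng = random.Random(267)

# Simple random sample without replacement: every ID has the same chance of selection
sample_ids = rng.sample(sampling_frame, k=25)

print(sorted(sample_ids))
```

Because selection is left to the random number generator, the investigator's preferences cannot influence who ends up in the sample.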
(C) Descriptive Statistics
- Explore the distribution of each variable:
- Shape (e.g., symmetry, kurtosis, modality)
- Central location
- Spread (dispersion)
- Graphs
- Categorical data - bar graphs or pie charts
- Continuous data
- Histograms (moderate to large data sets)
- Stem-&-leaf (small to moderate data sets)
e.g., Data are: {93, 82, 84, 71}
|9|3
|8|24
|7|1
More info on stem-and-leaf plots: http://www.sjsu.edu/faculty/gerstman/StatPrimer/Freq.PDF
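The stem-and-leaf example above can be reproduced with a few lines of Python. This is a minimal sketch that assumes non-negative integers with the tens digit as stem and the ones digit as leaf; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows (largest stem first) for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # tens part is the stem, ones digit is the leaf
        leaves[stem].append(str(leaf))
    return [f"|{stem}|{''.join(leaves[stem])}" for stem in sorted(leaves, reverse=True)]

for row in stem_and_leaf([93, 82, 84, 71]):
    print(row)
# prints:
# |9|3
# |8|24
# |7|1
```

Note that this sketch omits stems that have no leaves; a full plot would normally show them so that gaps in the distribution stay visible.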
- Summary stats
- Central location
- Mean = arithmetic average
- Median = mid-point of ordered array
- Mode = most common value (seldom used)
- Spread
- Sum of squares = sum of squared deviations around the mean
- Variance = sum of squares divided by n - 1 (roughly the average squared deviation)
- Standard deviation = square root of variance
- Interquartile range = Q3 - Q1 (robust measure of spread)
- Coefficient of variation = standard deviation / mean (unit-free measure of spread; seldom used)
- Other points on the distribution (e.g., quartiles, percentiles, z-scores - not covered in HS267)
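The summary statistics listed above can be computed with Python's standard statistics module. The sketch reuses the small data set from the stem-and-leaf example and assumes the sample variance (divisor n - 1). Quartile conventions differ between textbooks and packages, so the interquartile range shown here may not match a hand calculation.

```python
import math
import statistics

data = [93, 82, 84, 71]

mean = statistics.mean(data)        # arithmetic average -> 82.5
median = statistics.median(data)    # mid-point of the ordered array -> 83.0

# Sum of squares: squared deviations around the mean -> 245.0
sum_of_squares = sum((x - mean) ** 2 for x in data)

variance = sum_of_squares / (len(data) - 1)  # sample variance, divisor n - 1
std_dev = math.sqrt(variance)

# Quartiles using the module's default ("exclusive") method; other rules exist
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

coef_var = std_dev / mean  # coefficient of variation (unit-free)

print(f"mean={mean}, median={median}, SS={sum_of_squares}")
print(f"variance={variance:.2f}, sd={std_dev:.2f}, IQR={iqr}, CV={coef_var:.3f}")
```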
(D) Inferential Statistics
- Parameters vs. statistics
- Parameters - from population, hypothetical/unobserved ("counterfactual"), numeric constants,
notation - Greek (e.g., "mu") or hatless (e.g., RR)
- Statistics - from sample, calculated/observed, random variables, notation - Roman or with hats (e.g.,
"x bar")
- Estimation - inferring the most likely value of a parameter
- Point estimate (e.g., "x bar" estimates "mu")
- Interval estimate (e.g., 95% confidence interval for mu)
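As an illustration of point and interval estimation, the sketch below computes "x bar" and a 95% confidence interval for mu from the four-value data set used earlier. It assumes a t-based interval (x bar plus or minus t times the standard error), with the critical value 3.182 for 3 degrees of freedom taken from a t table. The function name mean_confidence_interval is hypothetical.

```python
import math
import statistics

def mean_confidence_interval(data, t_crit):
    """Return (point estimate, lower limit, upper limit) for the population mean."""
    n = len(data)
    x_bar = statistics.mean(data)               # point estimate of mu
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    margin = t_crit * se                        # margin of error
    return x_bar, x_bar - margin, x_bar + margin

data = [93, 82, 84, 71]
T_CRIT_95_DF3 = 3.182  # t table value for 95% confidence with df = n - 1 = 3

x_bar, lower, upper = mean_confidence_interval(data, T_CRIT_95_DF3)
print(f"x bar = {x_bar:.1f}, 95% CI for mu = ({lower:.1f}, {upper:.1f})")
```

With only four observations the interval is wide, which is the honest reflection of how little the sample says about mu.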
- Hypothesis testing
- Frequently used, often misunderstood; do not rely on as sole source of info.
- Two-by-two table of decision (retain or reject the null) by truth (null true or false): correct retention,
incorrect rejection (type I error), correct rejection, incorrect retention (type II error)
- alpha = Pr(type I error)
- beta = Pr(type II error)
- power = 1 - beta
- Goal: minimize alpha, maximize power
- Retention of the null hypothesis does not imply it is true!
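The relationship among alpha, beta, and power can be made concrete with a small calculation. The sketch below assumes a two-sided one-sample z test with known standard deviation, which is a simplification chosen for illustration rather than a method prescribed in these notes. The function name z_test_power and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z test to detect a true mean difference delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # critical value for the chosen alpha
    shift = abs(delta) * math.sqrt(n) / sigma     # true difference in standard-error units
    # Probability that the test statistic falls in either rejection region
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

power = z_test_power(delta=5, sigma=10, n=32)
beta = 1 - power  # Pr(type II error)
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

When delta is 0 the null hypothesis is true and the function returns alpha itself, since any rejection is then a type I error.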
(E) Reporting Results
- Important! - see chapter for specifics
- Use APA reporting style (make free use of manual)
(F) Approach Toward Data Analysis
- Understand research question and how this translates into study design, measurements, and parameter
estimation
- Describe data - graphs and summary stats
- Estimation - point and interval
- Hypothesis test
- Narrative Summary - telling a meaningful story
- Power and sample size (esp. important when results are not statistically significant)
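For the power and sample size step, a common approximation for a two-sided one-sample z test is n = ((z[1 - alpha/2] + z[1 - beta]) * sigma / delta)^2, rounded up. The sketch below assumes that formula and a known sigma; the function name required_sample_size and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample z test to detect a difference delta."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # controls the type I error rate
    z_beta = std_normal.inv_cdf(power)           # controls the type II error rate
    n = ((z_alpha + z_beta) * sigma / abs(delta)) ** 2
    return math.ceil(n)  # round up so the target power is met

print(required_sample_size(delta=5, sigma=10))  # → 32
```

Smaller differences or larger standard deviations drive the required n up quickly, because n grows with the square of sigma / delta.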