Ref - Some Statistical Basics, *Data Analysis with Epi Info*.

- Research question --> Study Design --> Study Protocol --> Measurements (assigning numbers according to preset rules) --> Operationalized variables --> Data collection form (e.g., survey questionnaire, other data form)
- Type of variables
- Categorical (qualitative, nominal) e.g., SEX
- Ordinal (ranked), e.g., Likert scales
- Continuous (quantitative, scale), e.g., AGE

- When doing survey research, avoid selection bias by using chance mechanisms to select your sample
- Avoid information bias by measuring what you purport to measure (i.e., measurements must be precise and valid; otherwise you are wasting your time or acting as an activist for a cause)

- Explore the distribution of each variable:
- Shape (e.g., symmetry, kurtosis, modality)
- Central location
- Spread (dispersion)
- Graphs
- Categorical data - bar graphs or pie charts
- Continuous data
- Histograms (moderate to large data sets)
- Stem-&-leaf (small to moderate data sets)

e.g., Data are: {93, 82, 84, 71}

`|9|3
|8|24
|7|1`

More info on stem-and-leaf plots: http://www.sjsu.edu/faculty/gerstman/StatPrimer/Freq.PDF
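
The stem-and-leaf plot above can be reproduced with a few lines of Python. This is a minimal sketch, assuming non-negative integer data with the tens digit as the stem and the ones digit as the leaf; the function name `stem_and_leaf` is made up for illustration and is not part of Epi Info.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows (highest stem first) for non-negative integers.

    Assumption: stem = tens digit(s), leaf = ones digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)

    rows = []
    # Walk every stem from highest to lowest so empty stems still show up.
    for stem in range(max(leaves), min(leaves) - 1, -1):
        leaf_text = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"|{stem}|{leaf_text}")
    return rows


data = [93, 82, 84, 71]
print("\n".join(stem_and_leaf(data)))
```

Empty stems are printed on purpose, so gaps in the distribution remain visible.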

- Boxplots. More info at http://www.sjsu.edu/faculty/gerstman/StatPrimer/Sumstats.PDF
- Bivariate: scatter plots
- Summary stats
- Central location
- Mean = arithmetic average
- Median = mid-point of ordered array
- Mode = most common value (seldom used)
- Spread
- Sum of squares = sum of squared deviations around the mean
- Variance = sum of squares divided by n - 1 (roughly the average squared deviation)
- Standard deviation = square root of variance
- Interquartile range = Q3 - Q1 (robust measure of spread)
- Coefficient of variation = standard deviation / mean (unit-independent measure of spread; seldom used)
- Other points on the distribution (e.g., quartiles, percentiles, z-scores - not covered in HS267)
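
As an illustration of the summary statistics listed above, the sketch below computes them for the small data set used in the stem-and-leaf example, {93, 82, 84, 71}. It uses only the Python standard library. Two assumptions are worth flagging: the variance uses the sample divisor n - 1, and the quartiles follow the default ("exclusive") method of `statistics.quantiles`, which may differ slightly from the quartile rule used in the course materials.

```python
import math
import statistics

data = [93, 82, 84, 71]
n = len(data)

# Central location
mean = statistics.mean(data)
median = statistics.median(data)

# Spread
sum_of_squares = sum((x - mean) ** 2 for x in data)  # squared deviations around the mean
variance = sum_of_squares / (n - 1)                   # sample variance (divisor n - 1)
sd = math.sqrt(variance)                              # standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)          # quartiles; conventions vary
iqr = q3 - q1                                         # interquartile range
cv = sd / mean                                        # coefficient of variation

print(f"mean={mean}, median={median}")
print(f"SS={sum_of_squares}, variance={variance:.2f}, sd={sd:.2f}")
print(f"IQR={iqr}, CV={cv:.3f}")
```

For these four values the mean is 82.5, the median is 83, and the sum of squares is 245.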

- Parameters vs. statistics
- Parameters - from population, hypothetical/unobserved ("counterfactual"), numeric constants, notation - Greek (e.g., "mu") or hatless (e.g., RR)
- Statistics - from sample, calculated/observed, random variables, notation - Roman or with hats (e.g., "x bar")
- Estimation - predicting most likely value of parameter
- Point estimate (e.g., "x bar" estimates "mu")
- Interval estimate (e.g., 95% confidence interval for mu)
- Hypothesis testing
- Frequently used, often misunderstood; do not rely on it as the sole source of information.
- Two-by-two table of correct retention, incorrect rejection (type I error), correct rejection, incorrect retention (type II error)
- alpha = Pr(type I error)
- beta = Pr(type II error)
- power = 1 - beta
- Goal: minimize alpha, maximize power
- Retention of the null hypothesis does not imply it is true!
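
To make the point and interval estimates above concrete, here is a minimal sketch that computes "x bar" and a 95% confidence interval for mu from the same illustrative data, {93, 82, 84, 71}. It assumes a t-based interval, x bar ± t × s / sqrt(n). Because the Python standard library has no t distribution, the critical value 3.182 (two-sided 95%, 3 degrees of freedom) is hard-coded from a standard t table.

```python
import math
import statistics

data = [93, 82, 84, 71]
n = len(data)

x_bar = statistics.mean(data)   # point estimate of mu
s = statistics.stdev(data)      # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)           # standard error of the mean

# Assumption: critical value taken from a t table (0.975 quantile, df = n - 1 = 3).
T_CRIT = 3.182

margin = T_CRIT * se
ci = (x_bar - margin, x_bar + margin)  # 95% confidence interval for mu

print(f"x bar = {x_bar}")
print(f"95% CI for mu = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With only four observations the interval is wide (about 68.12 to 96.88), which shows how little a small sample pins down mu.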

- Important! - see chapter for specifics
- Use APA reporting style (make free use of manual)

- Understand research question and how this translates into study design, measurements, and parameter estimation
- Describe data - graphs and summary stats
- Estimation - point and interval
- Hypothesis test
- Narrative Summary - telling a meaningful story
- Power and sample size (esp. important when results are not statistically significant)
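
The relationship between alpha, beta, power, and sample size can be sketched numerically. The example below is a simplification, not the course's method: it assumes a two-sided one-sample z test with a known population standard deviation, and the values for `mu0`, `mu1`, `sigma`, and `n` are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z test (sigma assumed known).

    mu0: mean under the null hypothesis
    mu1: true mean under the alternative
    """
    z = NormalDist()                            # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)           # two-sided critical value
    shift = (mu1 - mu0) / (sigma / sqrt(n))     # true difference in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)


# Hypothetical scenario: detect a 5-unit difference when sigma = 15.
for n in (25, 50, 100, 200):
    power = z_test_power(mu0=100, mu1=105, sigma=15, n=n)
    print(f"n={n:>3}  power={power:.3f}  beta={1 - power:.3f}")
```

With alpha held fixed, power rises as n grows. When the true mean equals the null value, the function returns alpha, which is the probability of a type I error.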