Lecture Notes

Ref - Some Statistical Basics, Data Analysis with Epi Info.

(A) Before Data Are Collected

• Research question ---> Study design ---> Study protocol ---> Measurements (assigning numbers according to pre-set rules) ---> Operationalized variables ---> Data collection form (e.g., survey questionnaire, other data form)
• Type of variables
• Categorical (qualitative, nominal) e.g., SEX
• Ordinal (ranked), e.g., Likert scales
• Continuous (quantitative, scale), e.g., AGE

(B) Data Collection

• When doing survey research, avoid selection bias by using chance mechanisms to select your sample
• Avoid information bias by measuring what you purport to measure (i.e., measurements must be precise and valid; otherwise you are either wasting your time or acting as an activist for a cause)
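
As an illustration of a chance mechanism for selecting a sample, here is a minimal sketch of simple random sampling without replacement. The sampling frame of ID strings, the function name, and the seed are all hypothetical; real surveys may need stratified or cluster designs instead.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit has the same chance of selection, which is what
    guards against selection bias. The seed is optional and only
    makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame of 100 subject IDs (not from the notes)
sampling_frame = [f"ID{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(sampling_frame, 10, seed=267)
print(sample)
```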

(C) Descriptive Statistics

• Explore the distribution of each variable:
• Shape (e.g., symmetry, kurtosis, modality)
• Central location
• Spread
• Graphs
• Categorical data - bar graphs or pie charts
• Continuous data
• Histograms (moderate to large data sets)
• Stem-&-leaf (small to moderate data sets)

e.g., Data are: {93, 82, 84, 71}

|9|3
|8|24
|7|1
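
The stem-and-leaf plot above can be reproduced with a short script. This is only a sketch: it assumes non-negative integers, uses the tens digit(s) as the stem and the ones digit as the leaf, and the function name is made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines, highest stem first.

    Assumes non-negative integers; stem = value // 10, leaf = value % 10.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Walk every stem in the range so that empty stems still get a row
    for stem in range(max(leaves), min(leaves) - 1, -1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"|{stem}|{digits}")
    return lines

plot = stem_and_leaf([93, 82, 84, 71])
print("\n".join(plot))
```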

• Summary stats
• Central location
• Mean = arithmetic average
• Median = mid-point of ordered array
• Mode = most common value (seldom used)
• Spread
• Sum of squares = sum of squared deviations around the mean
• Variance = sum of squares divided by n - 1 for a sample (roughly the average squared deviation)
• Standard deviation = square root of variance
• Interquartile range = Q3 - Q1 (robust measure of spread)
• Coefficient of variation = standard deviation / mean (unit-independent measure of spread; seldom used)
• Other points on the distribution (e.g., quartiles, percentiles, z-scores - not covered in HS267)
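
The summary statistics listed above can be computed for the example data {93, 82, 84, 71} as follows. This is a sketch using Python's standard library; it assumes the sample (n - 1) form of the variance, and the quartiles come from the default method of `statistics.quantiles`, which may differ slightly from the hand method taught in class.

```python
import statistics

data = [93, 82, 84, 71]
n = len(data)

# Central location
mean = statistics.mean(data)
median = statistics.median(data)

# Spread
ss = sum((x - mean) ** 2 for x in data)   # sum of squares
variance = ss / (n - 1)                    # sample variance
sd = variance ** 0.5                       # standard deviation
q1, _q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                              # interquartile range
cv = sd / mean                             # coefficient of variation

print(f"mean={mean}, median={median}, SS={ss}")
print(f"variance={variance:.2f}, sd={sd:.2f}, IQR={iqr}, CV={cv:.3f}")
```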

(D) Inferential Statistics

• Parameters vs. statistics
• Parameters - from population, hypothetical/unobserved ("counterfactual"), numeric constants, notation - Greek (e.g., "mu") or hatless (e.g., RR)
• Statistics - from sample, calculated/observed, random variables, notation - Roman or with hats (e.g., "x bar")
• Estimation - predicting most likely value of parameter
• Point estimate (e.g., "x bar" estimates "mu")
• Interval estimate (e.g., 95% confidence interval for mu)
• Hypothesis testing
• Frequently used, often misunderstood; do not rely on it as the sole source of information.
• Two-by-two table of decisions:

               Null true                            Null false
  Retain null  correct retention                    incorrect retention (type II error)
  Reject null  incorrect rejection (type I error)   correct rejection

• alpha = Pr(type I error)
• beta = Pr(type II error)
• power = 1 - beta
• Goal: minimize alpha, maximize power
• Retention of the null hypothesis does not imply it is true!
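
To make point estimation, interval estimation, and a hypothesis test concrete, here is a sketch using the example data {93, 82, 84, 71}. Two things are assumptions and not from the notes: the null value `MU0 = 75` is hypothetical, and the critical value 3.182 is the tabled t value for 3 degrees of freedom at the two-sided 0.05 level, hard-coded because the standard library has no t distribution.

```python
import math
import statistics

data = [93, 82, 84, 71]
n = len(data)

xbar = statistics.mean(data)        # point estimate of mu
s = statistics.stdev(data)          # sample standard deviation
se = s / math.sqrt(n)               # standard error of the mean

T_CRIT = 3.182                      # t table: 0.975 quantile, df = n - 1 = 3

# 95% confidence interval for mu
ci = (xbar - T_CRIT * se, xbar + T_CRIT * se)

# One-sample t test of H0: mu = MU0 (MU0 is a hypothetical null value)
MU0 = 75
t_stat = (xbar - MU0) / se
reject = abs(t_stat) > T_CRIT

print(f"x bar = {xbar}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"t = {t_stat:.2f}, reject H0: {reject}")
```

With only four observations the interval is wide, and the test retains the null hypothesis; that retention reflects limited information and is not evidence that the null is true.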

(E) Reporting Results

• Important! - see chapter for specifics
• Use APA reporting style (make free use of manual)

(F) Approach Toward Data Analysis

• Understand research question and how this translates into study design, measurements, and parameter estimation
• Describe data - graphs and summary stats
• Estimation - point and interval
• Hypothesis test
• Narrative Summary - telling a meaningful story
• Power and sample size (esp. important when results are not statistically significant)
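
For the power and sample size step, a rough sketch is given below. It assumes a two-sided one-sample z test with known standard deviation, which is a simplification; the function names and the example values (difference of 5, sigma of 10) are hypothetical.

```python
import math
from statistics import NormalDist

def power_z(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z test.

    delta: true difference from the null value
    sigma: population standard deviation (assumed known)
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) * math.sqrt(n) / sigma
    # Probability the test statistic lands in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def sample_size_z(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n needed to detect delta with the requested power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical planning values: detect a difference of 5 when sigma = 10
n_needed = sample_size_z(delta=5, sigma=10)
print(n_needed, round(power_z(5, 10, n_needed), 3))
```

When the true difference is zero, the power function returns alpha, which matches alpha = Pr(type I error).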