1: Measurement & sampling  2/1/16

Review Questions

1. A famous statistician once called statistics “the servant of all sciences” (Neyman 1955). What do you think he meant by this statement?
2. How does measurement error differ from processing error?
3. What does the acronym GIGO stand for?
4. Define the following terms: sample, population, simple random sample, sampling fraction, variable, value, observation, measurement.
5. If statistics is not merely a compilation of computational techniques, what then is it?
6. Provide a synonym for categorical variable. Provide a synonym for quantitative variable
7. Describe the structure of a data table.
8. How does sampling with replacement differ from sampling without replacement?
9. What do we call the ratio of the sample size to population size?
10. All the information on a data collection form corresponds to [select best response]:
(a) a value (b) a variable (c) an observation
11. A column of data in a data table corresponds to: [select best response]:
(a) a value (b) a variable (c) an observation].
12. In a simple random sample, every value in the population has an _________ chance of being selected. [Select best response]:
(a) equal (b) unequal
13. I select the first 10 people on an alphabetized list. Explain why this is not an SRS.

Exercises

1.1 Cargo cult science. Read the passage below. Then describe in one page or less the meaning of “cargo cult science.”

The unorthodox scientist Richard Feynmann used the term Cargo Cult Science to refer to pseudoscientific practices that follow the superficial forms and precepts of science but miss the honest, self-critical element that is crucial to rigorous investigation. This term has it basis in a South Seas people that, during World War II, saw airplanes land with goods and materials. They inhabitants of the island wanted these deliveries to continue after the military had left, so they arranged to imitate things they saw associated with the cargo, like the runways lights (in the form of fires), a wooden hut for a “controller” who wore two wooden pieces on his head to emulate headphones, bamboo sticks to imitate antennas, and so on. With the Cargo culture in place, the island inhabitants awaiting airplanes to land. The form was right—on the surface things looked as they had before—but of course things no longer functioned as they had hoped. Airplanes full of cargo failed to bring goods and services.

1.2 Oral contraceptives and breast cancer. A study of 135,000 menopausal women between the ages of 50 and 70 found no difference in the incidence of breast cancer in women who had used oral contraceptives in their past and women who did not. What population is being studied in this investigation? Is the population restricted to the 135,000 study participants?

1.3 Hospital duration data (HDUR). A study that used discharge records of 30 patients from a university hospital in metropolitan Detroit found that 35% of the patients received antibiotics during their hospital stay. Describe the population that this study addresses. Describe the sample.

1.4 Body weight expressed as a percentage of ideal. A study of eighteen 35- to 44-year old male diabetics found that their mean body weight was 13% above the ideal weight. Describe the population and sample for this study.

1.5 Teaching effectiveness. College administrators are concerned about evaluating the effectiveness of classroom instruction. One method used for this purpose is to ask students to rate various facets of their classes. One question asks students to rate the overall effectiveness of their professor on a 5-point scale with 5 representing “highly effective” and 1 representing “highly ineffective.” What measurement scale was used to record this data? Does the measurement meet the criteria for objectivity? Does it meet the criteria for accuracy?

1.6 Dietary fat consumption. In studying dietary fat consumption, a prospective study asks study participants to keep daily logs of their dietary habits. In contrast, retrospective studies must rely on the recall of study participants to remember what and how much they ate. Which method of measuring dietary is more likely to achieve objective and accurate results? Explain you reasoning.

1.7 Duration of hospitalization. Below is a codebook for data on antibiotic use in patients at a community hospital in Pennsylvania. The first 25 observations from the data set are shown below the codebook.

(A) Classify each variable as either quantitative, ordinal, or categorical.
(B) What is the value of the
DUR variable for observation 4?
(C) What is the value of the
AGE variable for observation 24?

Codebook for HDUR dataset

Variable   Description
---------- -----------------------------------
DUR        Duration of hospitalization (days)
AGE        Age (years)
SEX        1 = male 2 = female
TEMP       Body temperature (degrees Fahrenheit)
WBC        White blood cells per 100 ml blood
AB         Antibiotic use: 1 = yes 2 = no
CULT       Blood culture taken 1 = yes 2 = no
SERV       Service: 1 = medical 2 = surgical

First 25 observations from the HDUR dataset

ID  DUR AGE SEX  TEMP WBC AB CULT SERV
--  --- --- --- ----- --- -- ---- ----
1    5  30   2  99.0   8  2    2    1
2   10  73   2  98.0   5  2    1    1
3    6  40   2  99.0  12  2    2    2
4   11  47   2  98.2   4  2    2    2
5    5  25   2  98.5  11  2    2    2
6   14  82   1  96.8   6  1    2    2
7   30  60   1  99.5   8  1    1    1
8   11  56   2  98.6   7  2    2    1
9   17  43   2  98.0   7  2    2    1
10    3  50   1  98.0  12  2    1    2
11    9  59   2  97.6   7  2    1    1
12    3   4   1  97.8   3  2    2    2
13    8  22   2  99.5  11  1    2    2
14    8  33   2  98.4  14  1    1    2
15    5  20   2  98.4  11  2    1    2
16    5  32   1  99.0   9  2    2    2
17    7  36   1  99.2   6  1    2    2
18    4  69   1  98.0   6  2    2    2
19    3  47   1  97.0   5  1    2    1
20    7  22   1  98.2   6  2    2    2
21    9  11   1  98.2  10  2    2    2
22   11  19   1  98.6  14  1    2    2
23   11  67   2  97.6   4  2    2    1
24    9  43   2  98.6   5  2    2    2
25    4  41   2  98.0   5  2    2    1

1.8 Clustering of an adverse drug event (TOXIC). An investigation was prompted when the U. S. Food and Drug Administration received a report of an increased frequency of cerebellar toxicity from the University of Wisconsin Hospital and Clinics (Madison) after the hospital had switched from the product manufactured by the innovator company (Upjohn, Co., Kalamzoo, MI) to a generic product produced by Quad Pharmaceuticals, Inc. (Indianapolis, IN). To address this issue, the FDA sent a team of investigators to complete a chart review. Data on patient and treatment characteristics that may place patients at greater risk for toxic reactions was collected.

(A) Classify each variable's as either quantitative (scale), ordinal, or categorical (nominal).
(B) What is the value of the AGE variable for observation 4?
(C) What is the diagnosis of observation 2?

Codebook for TOXIC dataset

Variable   Description
---------- -------------------------------------------------
AGE        Age at time treatment began (years)
SEX        1 = male; 2 = female
MANUF      Manufacturer of the drug: Smith or Jones
DIAG       Underling diagnosis: 1 = leukemia; 2 = lymphoma
STAGE      Stage of disease: 1 = relapse; 2 = remission
TOX        Did cerebellar toxicity occur?: 1 = yes; 2 = no
DOSE       Dose of drug (grams /meters2)
SCR        Serum creatinine (mg/dl)
WEIGHT     Body weight (kg)

The first 5 observations from TOXIC data set

ID  AGE SEX   MANUF   DIAG STAGE  TOX DOSE    SCR   WEIGHT
---  --- ----- ------ ----- -----  --- ------ ------ ------
1   50     1     J    1   1      1   36.0    0.8   66
2   21     1     J    1   2      2   29.0    1.1   68
3   35     1     J    2   2      2   16.2    0.7   97
4   49     2     S    1   1      2   29.0    0.8   83
5   38     1     J    2   2      1   16.2    1.4   97

1.9 Variable types 1. Classify each of the following measurements as either quantitative (scale), ordinal, or categorical (nominal).

(A) Response to treatment coded: 1= no response, 2 = minor improvement 3 = major improvement, 4 = complete recovery.

(B) Style of a house coded: 1 = split-level, 2 = ranch, 3 = two-story.

(C) Annual income (pre-tax dollars)

(D) Body temperature (degrees Celsius)

(E) Area of a parcel of land (acres)

(F) Population density (people per acre)

(G) Political office held coded: 1 = congressman, 2 = senator, 3 = governor  4 = president.

(H) Political party affiliation coded: 1 = Democrat 2 = Republican, 3 = Independent, 4 = Other

1.10 Variable types 2.  Classify each of the following measurements as quantitative (scale), ordinal, or categorical (nominal).

(A) Forced expiratory volume (liters per second)

(B) Leukemia rate in a geographic region (new cases per 100,000 people)

(C) White blood cells per deciliter of whole blood

(D) Presence of Type II diabetes mellitus (yes or no)

(E) Number of new HIV cases in a region in a given month

(F) Body weight (kilograms)

(G) Number of people exposed to second hand smoke

(H) IQ score

(I) High density lipoprotein level (mg/dl)

(J) Course grade (A, B, C, D, F)

(K) Religious affiliation (Protestant, Catholic, Muslim, Jewish, Atheist, Other)

(L) Blood cholesterol level recorded as either high, borderline, normal, or low

(M) Percentage of persons responding "yes" per every 100 surveyed

(N) Course credit (pass/fail)

(O) Ambient temperature (degrees Fahrenheit)

(P) Height (inches)

(Q) Age (years)

(R) Salary (dollars)

(S) Automobile ownership (yes/no)

(T) Type of life insurance policy (term, endowment, straight-life, other, none)

(U) Political party affiliation (Democrat, Republican, Independent, Other)

(V) Student class designation (freshman, sophomore, junior, senior, graduate student)

(W) Product satisfaction (very satisfied, satisfied, no opinion, unsatisfied, very unsatisfied)

(X) Movie review rating (number of stars: 1, 1½, 2, 2½, 3, 3½, 4)

(Y) Case or non-case

(Z) Occupation (Sales / Teaching / Administration / etc.)

[Keys may or may not be linked at the discretion of il professore]