10: Cross-Tabulated Counts  3/21/07

Review Questions

  1. What symbol is used to denote the proportion in a population? What symbol is used to denote a sample proportion? 
  2. Use your chi-square table to determine the value of these chi-square landmarks: (a) χ²1,.99 (b) χ²1,.999 (c) χ²2,.95 (d) χ²3,.95
  3. The expected frequencies you calculate for chi-square tests assume the null hypothesis is ______ [choices: true or false].
  4. Chi-square tests should be used only when expected values are greater than or equal to ______ [a number].
  5. A chi-square statistic for a 4-by-3 cross-tabulation of frequencies has this many degrees of freedom: ______.
  6. Determine the P value for (a) X²stat = 3.02 with 1 df (b) X²stat = 8.06 with 2 df.
  7. True or false? Chi-square distributions are symmetrical.
  8. State the null hypothesis for testing cross-tabulated counts.
  9. Use the chi-square table to find the area under the χ² curve with 3 degrees of freedom to the right of (a) 4.64 (b) 6.25 (c) 11.34.
  10. Use the chi-square table to find the approximate area to the right of 5.22 on a χ² distribution with 3 degrees of freedom. (Optional: Draw the chi-square curve, show 5.22 on the horizontal axis, shade the area to the right of 5.22, and bracket the shaded region with landmarks from the chi-square table.)
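
Table look-ups like those in questions 6, 9, and 10 can be double-checked numerically. The sketch below is one possible way to do so with only Python's standard library. The function name chi2_right_tail is hypothetical, and the code relies on the closed-form right-tail expressions that hold when the degrees of freedom are a whole number. It is meant as a check on the table, not as a substitute for bracketing the P value by hand.

```python
import math

def chi2_right_tail(x, df):
    """Area to the right of x under a chi-square curve with whole-number df."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
        total = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus
    # exp(-x/2) * sum over k = 1 .. (df-1)/2 of (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Familiar landmark: 3.84 cuts off roughly 5% of the area with 1 df.
print(round(chi2_right_tail(3.84, 1), 3))
```

Comparing the printed area with the bracketing landmarks from the table is a quick way to catch a misread row or column.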

Exercises

10.1 Tobacco use in high school students. A survey used saliva samples from high school students to test for the presence of cotinine. (Cotinine is a by-product of tobacco that reliably marks tobacco exposure.) Cross-tabulated results by gender are: 

          Cotinine+   Cotinine-
Male          39         279
Female        26         310

(A) Calculate the prevalence of tobacco use by gender. [First sum cells to derive marginal counts.] 
(B) Test the difference in proportions for significance with a chi-square test. Show all hypothesis testing steps.
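
The arithmetic behind the chi-square statistic (expected counts from the margins, then the sum of squared deviations divided by expected counts) can be sketched in a few lines of Python. This is a minimal illustration, not a full hypothesis test: the function name chi_square_test is hypothetical, the statistic is the uncorrected Pearson chi-square (no continuity correction), and stating the hypotheses, finding the P value, and drawing a conclusion are still left to you.

```python
def chi_square_test(observed):
    """Return (X2 statistic, df, expected counts) for an R-by-C table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under H0 (no association): row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, expected

# Exercise 10.1 counts: rows are Male, Female; columns are Cotinine+, Cotinine-
cotinine = [[39, 279], [26, 310]]
x2, df, expected = chi_square_test(cotinine)
print(f"X2 = {x2:.2f} with {df} df")
print("smallest expected count:", round(min(min(row) for row in expected), 1))
```

The same function accepts larger tables, so it can also be used to check hand calculations in the later exercises. Printing the smallest expected count is a reminder to verify the sample-size condition before trusting the test.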

10.2 La Méthode Numerique. [Similar to exercise 9.5.4.] Pierre Charles Alexandre Louis is referred to as "the father of medical statistics." His students went on to found public health movements and medical education in England, continental Europe, and the United States. His best known study made careful observations and recordings which cast doubt on bloodletting as a treatment for pneumonia. One of his analyses showed that 18 of 41 (44%) pneumonia patients bled early in the progression of their disease died as a result of their condition. In contrast, 9 of 36 (25%) patients bled late in the course of their illness died. Determine the risk of death in each group and then perform a chi-square test on the data.

(A) Put these data in a 2-by-2 table.
(B) Test the association for significance.

 
PCA Louis (1787 - 1872)
 

10.3 Do seatbelt laws prevent injury? Data were collected on the level of injury for drivers involved in accidents both before and after enactment of seat belt legislation. Cross-tabulated data are shown below. 

Time period       No injury   Minimal injury   Minor injury   Major/fatal injury   Total
After enactment        1281               64             35                    4    1384
Prior to enact.        6596              400            264                   30    7290
Total                  7877              464            299                   34    8674

(A) Calculate the conditional distribution of injuries before and after enactment of the law. What type of association is seen? [One way to see the association is to compare the percentage of accidents that resulted in no injury after enactment with the percentage prior to enactment.]
(B) Test the association for statistical significance.
(C) Advanced/optional: Describe the trend in injury rates and test it for significance.
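
A conditional distribution here is simply each row of counts re-expressed as percentages of that row's total. The short sketch below shows the idea; the helper name row_percentages is hypothetical, and the counts are the ones in the table above with the Total column left out.

```python
def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    return [[100 * count / sum(row) for count in row] for row in table]

labels = ["After enactment", "Prior to enactment"]
# Columns: no injury, minimal injury, minor injury, major/fatal injury
injuries = [[1281, 64, 35, 4], [6596, 400, 264, 30]]

for label, percentages in zip(labels, row_percentages(injuries)):
    print(label, [round(p, 1) for p in percentages])
```

Reading down a column of the output compares the two time periods on the same injury level, which is the comparison part (A) asks for.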

10.4 Cytomegalovirus and coronary restenosis.  [Similar to 9.5.5.] Each year cardiologists open clogged coronary arteries only to have many of these same arteries close again (restenose) following surgery. A study sponsored by the NIH was conducted to help determine whether infection with cytomegalovirus  (CMV) was predictive of restenosis (Zhou et al., 1996). Forty-nine (49) of the subjects showed serological evidence of CMV infection, while 26 showed no such evidence. Cross-tabulated data from the study are: 

 

          Restenosis
          Yes    No    Total
CMV+       21    28      49
CMV-        2    24      26
Total      23    52      75

(A) Calculate the incidence proportion of restenosis in each group. How do they compare?
(B) Test the difference in proportions for significance. Show all hypothesis testing steps. Do the data support the theory that CMV plays a role in restenosis?
(C) Compare the incidence in the form of a risk ratio. In relative terms, what effect did being CMV+ have on the risk of restenosis? 
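
The risk ratio in part (C) is the incidence proportion in the exposed group divided by the incidence proportion in the unexposed group. A minimal sketch of that calculation follows; the function name risk_ratio and its argument names are hypothetical, and the counts come from the table above.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Return (risk in exposed, risk in unexposed, risk ratio)."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return risk_exposed, risk_unexposed, risk_exposed / risk_unexposed

# CMV+ group: 21 of 49 restenosed; CMV- group: 2 of 26 restenosed
risk_pos, risk_neg, rr = risk_ratio(21, 49, 2, 26)
print(f"CMV+ risk = {risk_pos:.3f}, CMV- risk = {risk_neg:.3f}, RR = {rr:.2f}")
```

A risk ratio above 1 means the exposed group had the higher incidence. Note that the function does not guard against a zero count in the unexposed group, which would make the ratio undefined.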

10.5 Vitamin C and the common cold. A double-blind preventive trial was conducted in 103 school-age children. Thirty-six (36) of the 57 children (63%) receiving vitamin C caught colds during a school year. In contrast, 35 of the 46 children (76%) receiving a placebo caught colds. Are these proportions significantly different?

10.6 Seatbelt use. A survey on seat belt use in 25- to 29-year-old males found that 24 of 60 respondents (40%) without a college degree always used seat belts when driving. In contrast, 30 of 40 respondents (75%) with college degrees always wore seat belts. Is this difference statistically significant? Show all hypothesis testing steps.

10.7 Frequency of problems at community mental health centers. Three community mental health centers classified patients into three groups according to the primary problem for which they were seen. Data are shown below and are stored online in ment-hlt.sav. (Data are fictitious but realistic; a similar problem appears in Howell, 1995, p. 375.)

(A) Compare the distribution of problems within centers. Which problem is most common in Center 1? Which is most common in Center 2? ... in Center 3? 
(B) Conduct a chi-square test of association. Report all hypothesis testing steps. Is there a significant difference in problem type?


10.8 Treatment of leprosy. In 1954 W. G. Cochran wrote an important article on the use of chi-square tests called "Some methods for strengthening the common chi-square tests." On page 435 of the article he used as an illustration results from an experiment on the drug treatment of leprosy. Data are shown in the table below. The example considers treatment results in 196 leprosy patients. The row variable represents degree of skin infiltration as a measure of skin damage. The column variable represents the change in overall clinical condition during 48 weeks of treatment and is graded as “marked,” “moderate,” “slight,” and so on. Data are:

Skin infiltration   Marked improve.   Moderate improve.   Slight improve.   Stationary   Worse   Total
High                              7                  15                16           13       1      52
Low                              11                  27                42           53      11     144
Total                            18                  42                58           66      12     196

(A) Calculate the conditional distributions comparing improvement levels based on initial skin infiltration. Is there a clear association between skin infiltration level and tendency to improve?
(B) Test H0: no association by computing an X² test statistic, its degrees of freedom, and the P-value.
(C) What can be inferred from this analysis?
(D) Advanced/optional: Test the data for trend. What can be inferred from this analysis?
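
For the optional trend question, one commonly used approach for ordered categories is the linear-by-linear association statistic, M² = (N − 1)r², which is referred to a chi-square distribution with 1 df. Here r is the correlation between row scores and column scores, weighted by the cell counts. The sketch below is only an illustration under stated assumptions: the function name trend_chi_square is hypothetical, the scores are assumed to be equally spaced (0, 1, 2, ...), and this is not necessarily the procedure Cochran used in his article.

```python
import math

def trend_chi_square(table, row_scores=None, col_scores=None):
    """Return (M2, r) where M2 = (N - 1) * r**2 is tested on 1 df."""
    # Assumption: equally spaced scores 0, 1, 2, ... unless others are supplied.
    row_scores = row_scores or list(range(len(table)))
    col_scores = col_scores or list(range(len(table[0])))
    n = sum(sum(row) for row in table)
    mean_u = sum(u * sum(row) for u, row in zip(row_scores, table)) / n
    mean_v = sum(v * sum(col) for v, col in zip(col_scores, zip(*table))) / n
    s_uv = s_uu = s_vv = 0.0
    for u, row in zip(row_scores, table):
        for v, count in zip(col_scores, row):
            s_uv += count * (u - mean_u) * (v - mean_v)
            s_uu += count * (u - mean_u) ** 2
            s_vv += count * (v - mean_v) ** 2
    r = s_uv / math.sqrt(s_uu * s_vv)
    return (n - 1) * r ** 2, r

# Exercise 10.8 counts: rows are High, Low infiltration;
# columns run from marked improvement to worse.
leprosy = [[7, 15, 16, 13, 1], [11, 27, 42, 53, 11]]
m2, r = trend_chi_square(leprosy)
print(f"M2 = {m2:.2f} on 1 df (r = {r:.3f})")
```

Because the trend statistic concentrates on a single degree of freedom, it can detect a steady shift across ordered categories that the general chi-square test of association spreads over several degrees of freedom. Different score choices can change the result, so the scores used should always be reported.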

 

10.9 Drove when drinking alcohol. The Youth Risk Behavior Surveillance (YRBS) system monitors health behaviors in youth and young adults in the United States. Results from the 2005 survey indicated that, during the 30 days preceding the survey, 9.9% had driven a car or other vehicle when they had been drinking alcohol (Eaton et al., 2006). Counts for selected racial/ethnic groups are shown in the 3-by-2 cross-tabulation below. 

(A) Calculate the proportion of each group that exhibited the behavior of driving when drinking alcohol. Include a margin of error for each proportion. Is there an association between race and this trait? [Note: From unit 9, the large-sample formula for the standard error of a proportion is SE(p-hat) = sqrt(p-hat × q-hat / n), where q-hat = 1 − p-hat. The margin of error for 95% confidence is about twice the standard error.]
(B) Test the association for significance. 

Race       Drink+   Drink-   Total
White*        243     1910    2154
Black*         25      483     508
Hispanic       55      470     526
Total         323     2864    3187

* White non-Hispanic and Black non-Hispanic

[Note: The survey used a complex sampling design. The counts listed are based on the information in Tables 1 & 4 of Eaton et al., 2006, but have been simplified to accommodate the design effect of the complex sample.]
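
The margin-of-error rule quoted in part (A) can be sketched as follows. This is a minimal illustration using the large-sample standard error and the "about twice the standard error" rule from the note; the function name proportion_with_margin is hypothetical, and only the Black non-Hispanic row is worked as an example.

```python
import math

def proportion_with_margin(count, n):
    """Return (p-hat, approximate 95% margin of error) for count out of n."""
    p_hat = count / n
    q_hat = 1 - p_hat
    se = math.sqrt(p_hat * q_hat / n)  # large-sample standard error
    return p_hat, 2 * se               # margin of error is about 2 * SE

# Black non-Hispanic row: 25 of 508 reported the behavior
p_hat, margin = proportion_with_margin(25, 508)
print(f"{100 * p_hat:.1f}% +/- {100 * margin:.1f}%")
```

Repeating the call for the other two rows gives three intervals that can be compared side by side before running the formal test in part (B).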

10.10 Efficacy of echinacea in treating upper respiratory infections (severity of symptoms). A randomized, double-blind, placebo-controlled study evaluated the efficacy and safety of the herbal remedy Echinacea purpurea in treating upper respiratory tract infections in 2- to 11-year-old children. Each time a child had an upper respiratory tract infection, treatment with either echinacea or a placebo was given for the duration of the illness. Parental assessments of the severity of the illness and treatment status (blinded) are cross-tabulated below (Taylor et al., 2003; Moore & McCabe, 2006, p. 628):

 

 

            Parental assessment
            Mild   Moderate   Severe   Total
Echinacea    153        128       48     329
Placebo      170        157       40     367
Total        323        285       88     696

(A) Determine the conditional distributions of outcomes in the echinacea and placebo groups. Discuss your results. Is there an association?
(B) Test the association for significance.

10.11 Anger and heart disease (hard outcome, normotensives). A study looked at whether people who angered easily were more likely to develop coronary heart disease (CHD) than those who were less easily angered. The Spielberger Trait Anger Scale test was administered to 8474 individuals and scores were classified into three categories (low, moderate, and high anger). The cohort was followed for up to 72 months (median follow-up period 53 months) for the "hard" coronary events of acute myocardial infarctions and coronary fatalities (Williams et al., 2000). Data for normotensive individuals are:

Anger      CHD+   CHD-   Total
Low          31   3079    3110
Moderate     63   4668    4731
High         18    615     633
Total       112   8362    8474

 

(A) Calculate incidence proportions of CHD in each of the groups. Describe the observed association.
(B) Test the association between anger level and CHD for significance. Report all hypothesis testing steps. 
(C) In one or two concise sentences, summarize the results.

[Comment: Practical problems due to confounding by extraneous factors pose a threat to this type of research. The researchers noted that individuals with high anger-trait scores were slightly younger, more likely to be men, more likely to have less than a high school education, more likely to be smokers and drinkers, had slightly lower HDL cholesterol levels, and had higher waist-to-hip ratios compared with participants who were moderate or low scorers. A Cox regression model was used to adjust for age, race/ethnicity, level of education, sex, waist-to-hip ratio, drinking status, smoking status, plasma LDL and HDL, and diabetes in the original study (Williams et al., 2000). However, one is never certain that such adjustment procedures are complete.]

10.12 Anger and heart disease (hard outcome, hypertensives). Here are the data from the study described in Exercise 10.11 for the hard cardiac outcomes among hypertensives.

Anger      CHD+   CHD-   Total
Low          60   1651    1711
Moderate     71   2363    2434
High         13    354     367
Total       144   4368    4512

(A) Calculate incidence proportions of CHD in each of the groups. Describe the observed association.
(B) Test the association between anger level and CHD for significance. Report all hypothesis testing steps. 
(C) In one or two concise sentences, summarize the results.
