19: Data management

2/6/06

Review Questions [Review questions are based on Lab, Lecture, and Reader.]

(A) Differentiate between measurement error and processing error.
(B) List four types of data entry errors.
(C) List four methods that can be used to mitigate data entry problems.
(D) What is the function of an EpiData QES file?
(E) What (filename) extension is used to identify an EpiData data file?
(F) What type of information is contained in code books? (Be specific.)
(G) Why should you keep backup copies of data files off site?
(H) List elements of data management.
(I) What extension is used to identify a permanent SPSS data file?
(J) Describe the nature of flat text (.txt) data files.
(K) What extension is used to identify SPSS syntax (command) files?
(L) What is "controlled data entry"? 
(M) What is the most fool-proof method of creating a variable name in an EpiData QES file? 
(N) What is the maximum number of characters for an EpiData variable name? . . . for an SPSS variable name?
(O) Suppose you need to store data with values that range from -9 to 9. What EpiData variable code would you use to create this variable?
(P) Identify two types of data controls created with CHK files.
(Q) When doing double entry and validation, why is it best to use separate data entry people for your two files?
(R) What does it mean when a line in an SPSS syntax file begins with an "*"?

Exercises require you to create EpiData and SPSS files. When creating these files, use filenames, variable names, and coding schemes exactly as specified. For each exercise, create and submit the following files:

(A) *.qes (questionnaire file)
(B) *.chk (checks file)
(C) *.rec (EpiData record file)
(D) *2.rec (duplicate file for double entry & validation procedure)
(E) *.not (code book / notes)
(F) *.sps (SPSS syntax)
(G) *.txt (text data)
(H) *.sav (permanent "saved" SPSS data)

Print a hard copy of your validation report, and keep backup copies of your files.
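The idea behind double entry and validation can be sketched in a few lines of Python. This is only an illustration of the comparison, not a substitute for the validation report you produce in EpiData. The function name compare_entries and the sample records are hypothetical and are not taken from any of the exercise data sets.

```python
def compare_entries(first, second, key="ID"):
    """Compare two independently entered data sets record by record.

    Each data set is a list of dicts. Returns a list of tuples
    (key value, field name, value in first, value in second).
    """
    second_by_key = {rec[key]: rec for rec in second}
    first_keys = {rec[key] for rec in first}
    mismatches = []
    for rec in first:
        other = second_by_key.get(rec[key])
        if other is None:
            # Record was entered in the first file only.
            mismatches.append((rec[key], None, "present", "missing"))
            continue
        for field, value in rec.items():
            if other.get(field) != value:
                mismatches.append((rec[key], field, value, other.get(field)))
    for rec in second:
        if rec[key] not in first_keys:
            # Record was entered in the second file only.
            mismatches.append((rec[key], None, "missing", "present"))
    return mismatches


# Hypothetical records, invented for illustration only.
entry_1 = [{"ID": 1, "CHOL": 233, "BEHAV": "A"}, {"ID": 2, "CHOL": 291, "BEHAV": "B"}]
entry_2 = [{"ID": 1, "CHOL": 233, "BEHAV": "A"}, {"ID": 2, "CHOL": 219, "BEHAV": "B"}]
print(compare_entries(entry_1, entry_2))
```

Each mismatch should be resolved by going back to the original source document, not by guessing which entry is correct.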

19.1 Western Collaborative Group Study (WCGS). Data are a subset of a dataset from the WCGS on cardiovascular risk factors as reported by Selvin (1991, p. 4). The data are available by clicking here. Use the file naming convention wcgs*.* to name files. Save files to your hard drive with backup to a floppy. Use the following variable names and codes for your data:

VarName  Type     Length  Code  Description (Use labels for pre-coded data)
ID       numeric  2       ##    identification number (as specified)
CHOL     numeric  3       ###   serum cholesterol (mg/dl)
BEHAV    text     1       <A>   behavior type; Values: A or B
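As a rough illustration of what controlled data entry enforces, the WCGS code book above can be written as a set of patterns and applied to each record. This is a minimal Python sketch, not EpiData's own checking mechanism. The names WCGS_CODEBOOK and check_record are hypothetical, and the sketch assumes that ## and ### accept up to two and three digits respectively.

```python
import re

# Patterns derived from the code book above (assumed interpretation:
# ## = up to 2 digits, ### = up to 3 digits, <A> restricted to A or B).
WCGS_CODEBOOK = {
    "ID": re.compile(r"\d{1,2}"),
    "CHOL": re.compile(r"\d{1,3}"),
    "BEHAV": re.compile(r"[AB]"),
}


def check_record(record, codebook=WCGS_CODEBOOK):
    """Return the names of variables whose values are missing or malformed."""
    problems = []
    for name, pattern in codebook.items():
        value = record.get(name)
        if value is None or not pattern.fullmatch(value.strip()):
            problems.append(name)
    return problems


# Hypothetical raw values as typed at the keyboard.
print(check_record({"ID": "1", "CHOL": "233", "BEHAV": "A"}))
print(check_record({"ID": "1", "CHOL": "2330", "BEHAV": "C"}))
```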

19.2 Hospital duration data (HDUR). Data are from a study by Townsend et al. (1979) that looked at antibiotic utilization in general hospitals in Pennsylvania. A sample of these data is reported in Rosner (1990, p. 36) and is available by clicking here. Print the data table in the link and create data files for these data. Use the naming convention lastname_hdur*.* for each file (e.g., gerstman_hdur.qes). The files should create the following variables and labels.

VarName  Type     Length  Code   Description
ID       numeric  3       ###    identification number (as specified)
DUR      numeric  2       ##     duration of hospitalization (days)
AGE      numeric  2       ##     age (years)
SEX      numeric  1       #      sex; Value labels: 1 = male, 2 = female
TEMP     numeric  5       ###.#  maximum body temp (degrees F)
WBC      numeric  2       ##     white blood cell count (x100 per dL)
AB       numeric  1       #      in-hospital antibiotic use; Value labels: 1 = yes, 2 = no
CULT     numeric  1       #      whether a blood culture was taken; Value labels: 1 = yes, 2 = no
SERV     numeric  1       #      admitting service; Value labels: 1 = medical, 2 = surgical
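The value labels in the HDUR code book are simply a mapping from stored codes to readable text. A short Python sketch of that mapping follows. The names HDUR_VALUE_LABELS and apply_labels are hypothetical, and the sample record is invented for illustration. Codes with no label are left unchanged so that out-of-range entries remain visible.

```python
# Value labels copied from the HDUR code book above.
HDUR_VALUE_LABELS = {
    "SEX": {1: "male", 2: "female"},
    "AB": {1: "yes", 2: "no"},
    "CULT": {1: "yes", 2: "no"},
    "SERV": {1: "medical", 2: "surgical"},
}


def apply_labels(record, labels=HDUR_VALUE_LABELS):
    """Return a copy of the record with coded values replaced by labels."""
    labelled = dict(record)
    for name, mapping in labels.items():
        if name in labelled:
            # Unknown codes fall through unchanged.
            labelled[name] = mapping.get(labelled[name], labelled[name])
    return labelled


# Hypothetical record; SERV = 9 is deliberately outside the code book.
print(apply_labels({"ID": 1, "SEX": 2, "AB": 1, "SERV": 9}))
```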

19.3 Cerebellar toxicity data, sample (TOX-SAMP). Data are the first 20 records from the toxicity study by Jolson et al. (1992). Click here for the data listing. See the HS267 Lab Manual for detailed instructions on how to create, check, validate, document, export, and import these data. Use the following variable names and codes for your data:

VarName  Type     Length  Code     Units and Codes
ID       numeric  5       <IDNUM>  identification number (applied automatically)
AGE      numeric  2       ##       age (years)
SEX      numeric  1       #        sex; Value labels: 1 = male, 2 = female
MANUF    text     1       <A>      drug manufacturer; Value labels: J = Jones, S = Smith
DIAG     numeric  1       #        diagnosis (type of cancer); Value labels: 1 = leukemia, 2 = lymphoma
STAGE    numeric  1       #        clinical stage; Value labels: 1 = relapse, 2 = remission
TOX      logical  1       #        cerebellar toxicity; Value labels: 1 = yes, 2 = no
DOSE     numeric  4       ##.#     drug dosage (gms / M2)
SCR      numeric  3       #.#      serum creatinine (mg/dl)
WEIGHT   numeric  3       ###      body weight (kgs.)
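The export and import step passes the data through a flat text file. The Python sketch below shows one possible round trip using tab-delimited text with a header row. The actual export settings are described in the Lab Manual, so the delimiter, the header row, the function names to_flat_text and from_flat_text, and the sample record are all assumptions made for illustration.

```python
import csv
import io

# Variable names from the TOX-SAMP code book above.
FIELDS = ["ID", "AGE", "SEX", "MANUF", "DIAG", "STAGE", "TOX", "DOSE", "SCR", "WEIGHT"]


def to_flat_text(records):
    """Write records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_flat_text(text):
    """Read tab-delimited text back into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical record, not taken from the Jolson et al. data.
sample = {"ID": "1", "AGE": "45", "SEX": "1", "MANUF": "J", "DIAG": "1",
          "STAGE": "2", "TOX": "2", "DOSE": "36.0", "SCR": "1.1", "WEIGHT": "70"}
text = to_flat_text([sample])
print(text)
print(from_flat_text(text) == [sample])
```

A flat text file carries no type information, so every value comes back as a string. The code book is what tells the importing program which variables are numeric and which are text.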
