19: Data management (Key)

Review Questions

(A) Measurement error occurs when data are inherently inaccurate or mismeasured. This is more or less synonymous with information bias, a systematic error in the data. Processing errors arise after the data have been collected, such as might occur in writing down data, entering data into the computer, or damage to data files after data have been entered.
(B) Types of data entry errors: transpositions, copying errors, coding errors, rounding errors, consistency errors, range errors.
(C) Ways to minimize data entry errors: (1) manual checking (i.e., comparing entered data against original sources), (2) range and consistency checking (e.g., via check codes), (3) screening data for outliers during analysis, (4) double entry and validation.
(D) QES files are used to define variables and set up your data entry screen.
(E) REC files
(F) Codebooks include: (1) historical information about the data (e.g., the date the initial data file was created and/or last modified), (2) variable names, (3) variable structure (e.g., lengths and types), (4) codes used for data, (5) labels and units of measure.
(G) Because natural and other disasters (e.g., fires, floods, theft) may occur on site, destroying all files.
(H) Planning data needs, data collection, data entry, data validation and checking, file backup, data documentation
(I) .sav
(J) Flat files contain alphanumeric characters that correspond to data values. Information about which columns correspond to which variables, along with variable names, variable labels, and other data identifiers, is not contained in the flat file.
(K) .sps
(L) Controlled data entry is data entry that accepts only data that meet certain criteria.
(M) Enclose the variable name in {curly brackets}.
(N) 10 . . . 8
(O) ##
(P) Value labels; range restrictions; (also "jumps" and "must enters")
(Q) Because data entry personnel tend to make similar entry errors (e.g., the same tpye of tpyos) upon repetition.
(R) To identify them as comments.
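
Two of the methods in answer (C), range checking and double entry with validation, can be illustrated with a minimal Python sketch. This is not how EpiData implements them; the field names, allowed ranges, and sample records below are hypothetical and chosen only for illustration.

```python
# Hypothetical allowed ranges; a real study would take these from its codebook.
RANGES = {"age": (0, 120), "sex": (1, 2)}


def range_errors(record, ranges=RANGES):
    """Return the names of fields that are missing or outside their allowed range."""
    bad = []
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            bad.append(field)
    return bad


def double_entry_mismatches(first, second, key="id"):
    """Compare two independent entries of the same data, matched on `key`.

    Returns one (id, field, first_value, second_value) tuple per disagreement.
    """
    second_by_id = {rec[key]: rec for rec in second}
    mismatches = []
    for rec in first:
        other = second_by_id.get(rec[key])
        if other is None:
            # The record was never entered the second time.
            mismatches.append((rec[key], None, "present", "missing"))
            continue
        for field in rec:
            if field != key and rec[field] != other.get(field):
                mismatches.append((rec[key], field, rec[field], other.get(field)))
    return mismatches


# Hypothetical data: the second entry transposes the digits of one age.
entry_1 = [{"id": 1, "age": 34, "sex": 1}, {"id": 2, "age": 52, "sex": 2}]
entry_2 = [{"id": 1, "age": 43, "sex": 1}, {"id": 2, "age": 52, "sex": 2}]

print(range_errors({"id": 3, "age": 250, "sex": 2}))  # ['age']
print(double_entry_mismatches(entry_1, entry_2))      # [(1, 'age', 34, 43)]
```

Note that the transposition (34 entered as 43) stays within the allowed range, so a range check alone would not catch it; only the comparison of the two entries does.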

Example of a grading key for a data management exercise

[1] *.qes (questionnaire file)
[2] *.chk (controls and checks for data file)
[3] *.rec (EpiData record file)
[4] *2.rec (duplicate file for double entry & validation)
[5] *.not (code book and other data notes)
[6] *.sps (SPSS syntax file) and *.txt (flat data file)
[7] *.sav (SPSS permanent data file)
[8 - 10] data accuracy
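
Item [6] pairs the flat data file with a syntax file because, as answer (J) notes, the flat file holds only values. The following Python sketch shows the idea under the assumption of a fixed-width layout; the column positions, variable names, and sample lines are hypothetical and would normally come from the codebook or syntax file.

```python
# Hypothetical layout: (variable name, start column, end column), zero-based, end exclusive.
LAYOUT = [("id", 0, 3), ("age", 3, 6), ("sex", 6, 7)]


def parse_flat_line(line, layout=LAYOUT):
    """Slice one fixed-width line into named integer fields."""
    return {name: int(line[start:end]) for name, start, end in layout}


# Two hypothetical flat-file lines: values only, no variable names or labels.
flat_text = "001 341\n002 522\n"

records = [parse_flat_line(line) for line in flat_text.splitlines()]
print(records)
# [{'id': 1, 'age': 34, 'sex': 1}, {'id': 2, 'age': 52, 'sex': 2}]
```

Without the layout, the string "001 341" is uninterpretable, which is why the variable definitions must travel with the flat file.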