Epi Info Version 6 Basics

Operating Environment
• Version 6 and DOS • Downloading Epi Info • Running Epi Info
Data Entry
• Questionnaire Files • Record Files
Data Checking
• Types of Errors • Double Entry and Validation
Data Conversion, Documentation, and Storage
• Exchanging Data with Other Programs (IMPORT and EXPORT) • Data Documentation Files • Data Backup • Data Security
Introduction to ANALYSIS
• Background • Opening Files • Routing Output • Viewing and Printing File Contents • Variable Names • Batch Processing
Review Questions
Exercises
References

Operating Environment

Version 6 and DOS

Version 6 of Epi Info (Epi6) is an MS-DOS application that also runs under Windows. It was developed in the 1990s by the Centers for Disease Control and the World Health Organization. To run Epi6, you need to know a little about the operating system under which it runs: DOS.

DOS controls the input and output ("I/O") of information to and from the computer's central processing unit (CPU). I've found that it helps to understand, at least at a very basic level, the "plumbing" that goes on beneath the surface. Just think of what happens when you type at the keyboard: DOS sends (outputs) this information to the monitor ("console") for display. If you want to save these keystrokes for later use, they must be sent to a disk, from which they can later be restored. Alternatively, if hard copy is needed, they must be output (en masse) to the printer. This very basic notion of input and output is central to data processing logic. We might even classify devices by their primary function. The keyboard, mouse, and CD-ROM are primarily input devices. The monitor and printer are output devices. Disks, network cards, modems, etc., send things to and from the CPU: they are both input and output devices. It pays to keep things very simple when working with computers.

One of the most important functions of an operating system is to help manage files. DOS files have two-part names: the filename proper, composed of eight or fewer characters, and a three-letter extension. A dot separates the filename and extension (e.g., FILENAME.EXT). DOS files are stored in directories (analogous to folders), which are organized in a hierarchical "tree." The location of a file within this tree is its "path." The tree on each disk drive starts with a "root" directory from which other directories branch off. Backslashes (\) separate directories along the path. For example, \DIR2\SUBDIR1\FILENAME.EXT shows the path \DIR2\SUBDIR1\ ending in the file FILENAME.EXT. Separate directory trees reside on local disks (on your computer) and on remote servers (on the network).
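
As a rough illustration of these naming rules, the following Python sketch splits the example path into its parts and tests a name against the eight-plus-three pattern. The helper name is_8_3 is hypothetical, and the check is deliberately simplified: it accepts only letters, digits, and underscores, whereas DOS permits some other characters as well.

```python
import re
from pathlib import PureWindowsPath

# Simplified 8.3 pattern: 1 to 8 characters, then an optional dot and
# a 1 to 3 character extension.
# Assumption: only letters, digits, and underscores are accepted here.
_NAME_8_3 = re.compile(r"[A-Za-z0-9_]{1,8}(\.[A-Za-z0-9_]{1,3})?")


def is_8_3(name: str) -> bool:
    """Return True if name fits the simplified eight-plus-three pattern."""
    return _NAME_8_3.fullmatch(name) is not None


# PureWindowsPath understands backslash-separated paths on any platform.
path = PureWindowsPath(r"\DIR2\SUBDIR1\FILENAME.EXT")

print(path.parent)   # the directory portion of the path
print(path.stem)     # the filename proper
print(path.suffix)   # the extension, including the dot
print(is_8_3(path.name))
print(is_8_3("QUESTIONNAIRE.QES"))  # filename proper is longer than 8 characters
```

A check like this can be handy before copying files to a DOS system, where longer names would not be accepted as typed.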

Disk drives come in removable (e.g., floppy disk) and fixed ("hard") forms. Drives are labeled with letters followed by a colon. For example, the floppy disk drive is drive A:. The primary hard disk drive is (usually) C:. Networked nodes may have drive letters or descriptive names (e.g., Department Server 1).

Software looks for files in default drives and paths ("defaults") unless otherwise specified. Epi Info usually defaults to C:\EPI6 unless this has been overridden by the system manager. To override the default temporarily, the user specifies the full drive path when issuing a command. For example, to read a file off the floppy drive, the user specifies READ A:\MYFILE.

File management is an important part of computing and analysis. Basic file management functions include the ability to copy, rename, move, and back up files. DOS has native commands for these functions (e.g., COPY, DEL, RENAME), Windows 3.x uses a separate File Manager, and Windows 9x/NT uses the Windows Explorer. You must learn how to use these file managers if you are to be an effective computer user.

Downloading Epi Info

Epi Info can be downloaded from www.cdc.gov/epo/epi/epiinfo.htm. Installation instructions are posted on the Web site and in Chapter 4 of the Epi Info manual (Dean et al., 1994).

Running Epi Info

The method used to start Epi Info will vary from installation to installation. On a DOS system, the program is usually started by typing "EPI6" (without the quotes) at the DOS prompt. In Windows, some system managers will install an EPI6 icon on the desktop or Start Bar. (The lab at San Jose State has the Epi Info icon installed on the Start Bar as item 6.)

Once started, separate program modules are accessed through Epi Info's main Program Menu. Key programs on this menu include:

EPED: the EPidemiologic text EDitor
ENTER: for data entry
ANALYSIS: for statistical analysis and reporting
IMPORT: for importing data files from other database programs
EXPORT: for exporting data to other database and statistical programs
STATCALC: a statistical calculator
EPITABLE: an alternative statistical calculator

Each of these programs has specific functions and works on specific types of files.

Data Entry

Questionnaire (QES) Files

Before creating a data file in Epi Info, you must first create a questionnaire (QES) file. This QES file will serve as the template for your data entry screen and database. QES files can be created with any ASCII text editor, such as Epi Info's text editor (EPED) or Windows NOTEPAD. However, their names must always end with the file extension .QES.

Instructions on using Epi Info's editor, EPED, are found in Chapters 6, 7, and 33 of the Epi Info manual (Dean et al., 1994). Windows NOTEPAD is pretty much "point and click," but may try to append a TXT extension to the end of the filename, especially when used under WinNT. I therefore recommend using EPED for creating QES files in our lab.

Regardless of what editor program you use to create QES files, these files should contain:

(1) Survey questions and instructions
(2) Variable names
(3) Variable indicator codes

Survey questions and instructions should be brief, clear, non-leading, and unambiguous. Variable names must start with a letter and can contain no more than 10 characters. Your variable names should be descriptive and, to avoid confusion, should be enclosed in {curly brackets}.

Variable indicator codes define the type and length of the variable being created. The most common variable indicator codes are:

Indicator Code    Data Type
--------------    -----------------------------
#                 Numeric integer
#.#               Numeric real
_                 Alphanumeric
<A>               Alphanumeric, upper case only
<Y>               "Yes / no" (binary)
<mm/dd/yyyy>      Date, American
<dd/mm/yyyy>      Date, European

Indicator codes must be typed exactly as shown, because the code determines the variable's type. With numeric and alphanumeric variables, the number of characters in the code determines the variable's length. For example, the code # will create a numeric variable one byte in length (i.e., able to store the digits 0 through 9). The code ## will create a two-byte numeric variable, capable of storing values from -9 to 99 (inclusive).
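
The arithmetic behind these ranges can be sketched in a few lines of Python. This is only an illustration, not part of Epi Info: the function name numeric_field_range is hypothetical, and it assumes that a minus sign occupies one character of the field, which is consistent with the -9 to 99 range given above.

```python
def numeric_field_range(code: str) -> tuple[int, int]:
    """Return the (smallest, largest) whole number that fits in a field of '#' characters."""
    if not code or set(code) != {"#"}:
        raise ValueError("expected a code made up only of '#' characters")
    width = len(code)
    # Largest value: every character position holds the digit 9.
    largest = 10 ** width - 1
    # Assumption: a minus sign uses one character, so a one-character
    # field has no room for a negative number.
    smallest = -(10 ** (width - 1) - 1) if width > 1 else 0
    return smallest, largest


for code in ("#", "##", "###"):
    print(code, numeric_field_range(code))
```

Running the numbers this way before designing a questionnaire helps confirm that a field is wide enough for the largest (or most negative) value expected.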

Now for some examples. To create an alphanumeric variable called LASTNAME, you could type:

What is your {lastname}? __________________________

This variable will store an alphanumeric string that is up to 26 bytes in length.

The question:

What is your {age}? ###

creates a numeric variable called AGE that is 3 bytes in length.

The item:

{Date}: <mm/dd/yyyy>

creates a Y2K compliant American date field called DATE.
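
To make the relationship between a questionnaire line and the resulting variable more concrete, here is a minimal Python sketch that pulls the variable name and indicator code out of lines like those above. It is not how Epi Info itself reads QES files. The function name describe_field and the regular expression are assumptions made for illustration, and the sketch handles only a name in curly brackets followed by a single code on the same line.

```python
import re

# Assumption: one {name} in curly brackets, followed later on the same
# line by exactly one indicator code.
FIELD = re.compile(
    r"\{(?P<name>[A-Za-z][A-Za-z0-9]*)\}"   # variable name in curly brackets
    r".*?"                                  # any text between name and code
    r"(?P<code>#+(?:\.#+)?|_+|<[^>]+>)"     # numeric, alphanumeric, or <...> code
)


def describe_field(line: str):
    """Return the name, code, and width found on a questionnaire line, or None."""
    match = FIELD.search(line)
    if match is None:
        return None
    name = match.group("name").upper()
    if len(name) > 10:
        raise ValueError(f"variable name {name!r} is longer than 10 characters")
    code = match.group("code")
    # For # and _ codes, the number of characters gives the field length.
    # No length is reported for <...> codes in this sketch.
    width = len(code) if code[0] in "#_" else None
    return {"name": name, "code": code, "width": width}


print(describe_field("What is your {age}? ###"))
print(describe_field("{Date}: <mm/dd/yyyy>"))
```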

The questionnaire should be carefully designed and tested before it is put into use.

Record (REC) Files

QES files are used to create REC files, which, in turn, are used to store data. To create an empty REC file:

Start the ENTER program from the Programs menu
Type the name of the REC file you want to create into the first field on the screen
Press the Tab key to get to the next field
Select option 2 to create a new data file
Enter the name of the QES file which will form the basis of the REC file
Press the Tab key
Type "Y"

A REC file that looks like the QES file used to create it will now appear on screen. Data are entered into the blank fields on this screen by the user. After entering each record, the user types Y to store the data permanently.

Function keys are used to navigate existing REC files. A function-key menu appears at the bottom of the data entry screen. The <Ctrl-F> key combination is used to find a particular record, <F6> is used to delete a record, <F7> travels up one record (think "7 up"), <F8> travels down one record, and <F10> quits the ENTER program. Chapter 8 in the Epi Info manual (Dean et al., 1994) contains additional instructions on how to use ENTER.

Data Checking

Types of Errors

Data errors may be classified as either response errors or processing errors (Bennett et al., 1996). Response errors refer to differences between the "true" state of affairs and what appears on the questionnaire. These can be viewed as problems in measurement, and as such are discussed elsewhere. Processing errors are errors that occur during data handling.

Bennett et al. (1996) suggest the following classification of processing errors:

Transpositions (e.g., 19 becomes 91 during data entry)
Copying errors (e.g., 0 (zero) becomes O during data entry)
Coding errors (e.g., a racial group gets improperly coded because of changes in the coding scheme)
Routing errors (e.g., the interviewer asks the wrong question or asks questions in the wrong order)
Consistency errors (contradictory responses are collected, such as the reporting of sensitivity to penicillin after the respondent has entered "N" in response to the question "Do you have any allergies?")
Range errors (responses outside of the range of plausible answers, such as a reported age of 900 instead of 90)

The most effective way to deal with these errors is to identify the stage at which they occur and reduce them at their source. This may involve:

Manual checks during data collection (e.g., checks for completeness, legible handwriting, coding errors)
Range and consistency checks during data entry (e.g., using Epi Info's CHECK program: see Epi Info Manual chapter 10)
Checks for "outliers" during analysis (e.g., "male hysterectomies")
Double entry and validation procedures, as described below.
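
The logic of range and consistency checks can be sketched in Python as follows. This is an illustration of the idea only, not of Epi Info's CHECK program. The function name check_record, the variable names ALLERGIES and PENICILLIN, and the plausible age limits of 0 to 120 are all assumptions chosen to mirror the examples above.

```python
def check_record(record: dict) -> list[str]:
    """Return a list of range and consistency problems found in one record."""
    problems = []

    # Range check. Assumption: plausible ages run from 0 to 120 years.
    age = record.get("AGE")
    if age is not None and not 0 <= age <= 120:
        problems.append(f"range error: AGE={age} is outside 0-120")

    # Consistency check: penicillin sensitivity contradicts "no allergies".
    if record.get("ALLERGIES") == "N" and record.get("PENICILLIN") == "Y":
        problems.append("consistency error: PENICILLIN is Y but ALLERGIES is N")

    return problems


print(check_record({"AGE": 900, "ALLERGIES": "N", "PENICILLIN": "Y"}))
print(check_record({"AGE": 90, "ALLERGIES": "Y", "PENICILLIN": "Y"}))
```

Checks like these are most useful when applied at data entry, so that the error is caught while the original questionnaire is still at hand.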

Double Entry and Validation

With double entry and validation, two separate (but presumably identical) data files are created by independent operators. Epi Info's VALIDATE program is then used to check for inconsistencies in these files.

The steps of double entry and validation are:

Two empty REC files having identical structures but different filenames (e.g., SAMPLE.REC and SAMPLE2.REC) are created.
Data are independently entered into these two files.
Epi Info's VALIDATE program is run to perform a side-by-side comparison of the files.
Discrepancies are reported by VALIDATE.
The user refers back to the original data to determine the source of the inconsistency.
Corrections are made in the master data file, as necessary.

Use of VALIDATE is documented in Chapter 19 of the Epi Info manual.
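
The side-by-side comparison at the heart of this procedure can be sketched in Python. The sketch below is not the VALIDATE program and makes no claim about how VALIDATE works internally. It assumes each file has already been read into a list of records (one dictionary per record, in the same order), and the function name compare_files is hypothetical.

```python
def compare_files(first: list[dict], second: list[dict]) -> list[tuple]:
    """Compare two independently entered files record by record.

    Returns tuples of (record number, field, value in first, value in second).
    A record number of 0 flags a difference in the number of records.
    """
    discrepancies = []
    if len(first) != len(second):
        discrepancies.append((0, "record count", len(first), len(second)))
    # Assumption: records were entered in the same order in both files.
    for number, (a, b) in enumerate(zip(first, second), start=1):
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((number, field, a.get(field), b.get(field)))
    return discrepancies


entry1 = [{"ID": 1, "AGE": 19}, {"ID": 2, "AGE": 42}]
entry2 = [{"ID": 1, "AGE": 91}, {"ID": 2, "AGE": 42}]  # transposition in record 1
for item in compare_files(entry1, entry2):
    print(item)
```

Note that the comparison only reports that the two entries disagree. Deciding which value is correct still requires going back to the original data.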

Data Conversion, Documentation, and Storage

Exchanging Data with Other Programs (IMPORT and EXPORT)

Researchers often need to share data with colleagues using database and statistical programs other than Epi Info. Since most of these data programs use proprietary file formats, a minor obstacle is encountered when trying to exchange data among different programs. Fortunately, Epi Info has data translators (EXPORT and IMPORT) to assist in getting around this potential obstacle. Although Epi Info manual Chapters 16 and 17 (Dean et al., 1994) explain these programs, a bit of background may prove helpful.

The trick in exchanging data is to find a file format that is common to both programs. dBASE (DBF) file formats are often good choices for this purpose, since these files contain variable-defining and other information. However, when this option is not available, we might resort to a more generic form of data: the fixed-length record format.

Fixed-length field record files are also called flat files and card files. To create a fixed-length field file from a REC file, open the EXPORT program, select a file, and then select "FIXED" as the output format. Clicking the OK button will create a fixed-length field file with the CAR extension to denote its type. (This stands for "card file," a naming convention that dates back to the time when data were stored on punched paper cards.) An example of a fixed-length field file is:

 3521FYN01/09/89Y
 3742MYY10/21/89Y
 43 5MNY01/12/90Y
14311FYN02/17/89Y
32130MYY12/28/89Y
32950MYY12/29/89N
33728MNN08/19/89Y
49227.NN08/31/89N
49424MYY08/19/89Y
54652.YY10/13/89Y

Notice that without additional documentation, the user cannot decipher variable names, variable types, variable lengths, units of measure, coding practices, and so on. Therefore, card files must be accompanied by data documentation information.

Data Documentation Files

The goal of a data documentation file is to provide enough information so that persons unfamiliar with the data file can interpret the file's content. Data documentation should therefore include details about the data file's:

Structure (e.g., name of the data file, number of records in the data, length of each record, etc.)
Variables (e.g., variable names, variable values, encoding methods)
History (e.g., creation date, modification dates and methods)
Storage information (e.g., media type, media location)
Any additional information that would simplify future use of the data file

I find it best to document data files with text files kept in the same directory as the data file itself. This data documentation file may be created with any text editor (e.g., Windows NOTEPAD) and should be identified with the same name as the data file but with the DD extension (e.g., FILE.DD). A DD file for the flat file listed above might look like this:

Dataset: SAMPLE.CAR stored on floppy
Size: 10 records of 17 bytes each

VarName    Type    Len (Columns) Description
---------- ------- ------------- -------------------------------
ID         Integer 3 (1-3)       Identification number
AGE        Integer 2 (4-5)       Age (years)
SEX        ALPHA   1 (6)         Sex: M, F, .
HIV        Yes/no  1 (7)         HIV positive: Y, N, .
KAPOSISARC Yes/no  1 (8)         Kaposi Sarcoma status: Y, N, .
REPORTDATE Date    8 (9-16)      Report date: mm/dd/yy (Y2K prob)
OPPORTUNIS Yes/no  1 (17)        Opportunistic infection: Y, N, .

Notice how this data documentation table allows the user to decipher the flat file. For example, we now see that the first three characters in each line (record) refer to the subject's identification number. Accordingly, the first record is associated with identification number 35. The next two characters store the subject's age. Thus, the first subject is 21 years of age. (And so on.)
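
The same deciphering can be done mechanically. The Python sketch below slices one record into named fields using the column positions from the DD table above. The names LAYOUT and parse_record are hypothetical, and the sketch assumes that the identification number is right-justified in columns 1 to 3, so that the first record begins with a blank.

```python
# (name, first column, last column), counted from 1 and inclusive,
# taken from the data documentation table above.
LAYOUT = [
    ("ID", 1, 3),
    ("AGE", 4, 5),
    ("SEX", 6, 6),
    ("HIV", 7, 7),
    ("KAPOSISARC", 8, 8),
    ("REPORTDATE", 9, 16),
    ("OPPORTUNIS", 17, 17),
]


def parse_record(line: str) -> dict:
    """Split one 17-character fixed-length record into named fields."""
    # Python slices count from 0 and exclude the end, hence start - 1.
    return {name: line[start - 1:end].strip() for name, start, end in LAYOUT}


# Assumption: the first record starts with a blank because ID 35
# occupies only two of its three columns.
first = parse_record(" 3521FYN01/09/89Y")
print(first)
```

Without the layout, the same 17 characters are uninterpretable, which is the practical argument for always shipping a DD file with a card file.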

Data documentation files are especially important when working with fixed-length field files, but are also important when working with data of all forms.

Data Backup

To paraphrase a well-known computing saying,

There are two kinds of computer users. Those that have lost a major chunk of data, and those who are going to lose a major chunk of data.

Because disasters such as fire, theft, and earthquakes do happen, it is important to perform data backup procedures routinely. All elements of a project -- including data files, data documentation files, software settings, computer programs, word processing documents, and so on -- should be backed up, with copies of backups kept on-site and off. Backup procedures may entail manual or automated copying of files to removable media (e.g., floppy disks, Zip disks, tape) or may involve backing up files over a network.

Backup procedures should be thoroughly tested to ensure that archived files remain uncorrupted and can be easily restored. Procedures should be written up so that personnel unfamiliar with backup and restore methods could follow them if the primary data manager is unavailable. The entire process should be kept simple for success.
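
One simple way to test that a backup copy is uncorrupted is to compare a checksum of the copy with a checksum of the original. The Python sketch below shows the idea using only the standard library. The function names sha256_of and backup_and_verify are hypothetical, and the paths in the usage comment are placeholders rather than a recommended directory layout.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and confirm that the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target


# Example use (placeholder paths):
# backup_and_verify(Path("SAMPLE.REC"), Path("BACKUP"))
```

A matching checksum shows only that the copy was good when it was made. Restoring a file from the backup from time to time remains the real test.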

Data Security

Epidemiologists and other health researchers need to be aware of the ethics of working with the private nature of research files. This is especially important when data contain personal identifiers and confidential medical information. It is therefore each researcher's duty to make him or herself aware of local, national, and international laws (which may vary) and follow ethical guidelines governing use of population-based data. Researchers must make it their responsibility to protect study subjects' privacy.

Many potential legal and ethical problems can be allayed by using anonymous data files (i.e., data containing information about individuals but without personal identifiers). It is not always clear, however, when data are fully anonymous. For example, in studying rare events in an identified population, it is conceivable that an unscrupulous user could use supplementary information to re-identify individuals. Although the objective of protecting individual identity in such instances is clear, it is not always clear how far the epidemiologist's responsibility extends in protecting identities under such circumstances.

Introduction to ANALYSIS

Background

Epi Info's ANALYSIS program is used to manage, print, summarize, and analyze data. After starting ANALYSIS from the Programs menu, your computer screen will be divided into an Output Window (above) and Command Window (below). A Status Line showing the name of the active data set and free memory appears above the Output Window, and a Function Key Menu is below the Command Window.

ANALYSIS itself is command driven. There are commands for file processing (e.g., READ), output control (e.g., NEWPAGE), variable manipulation (e.g., SELECT), and statistical analysis (e.g., MEANS). A list of commands is accessed by pressing <F2>. Commands are either typed at the command prompt (EPI6>) or sent to the command line by highlighting them on the <F2> menu and pressing Enter.

Commands can be processed one at a time ("interactive processing") or in a preprogrammed series ("batch mode"). Interactive command processing has the advantage of immediate feedback; batch processing is useful when complex logic is involved and when tasks are performed repeatedly. For now, let us focus on interactive processing.

Opening Files

Before performing operations, the user must open the data file with a READ command, used as follows:

EPI6> READ <x:\path\filename.ext>

where <x:\path\filename.ext> represents the drive, path, and filename of the data set you wish to open. For example, to open SAMPLE.REC in the root directory of drive A, issue the command:

EPI6> READ A:\SAMPLE

Comment: When working with dBASE files, the .DBF extension must be specified (e.g., READ SAMPLE.DBF).

Data sets are active only during the current session. Quitting ANALYSIS or reissuing a READ command starts the session anew.

Routing Output

By default, ANALYSIS sends output to the screen. To send output to the printer, issue the command:

EPI6> ROUTE PRINTER

Comment: The <F5> key acts as an on/off switch for routing to the printer.

Output can also be routed to a file by issuing the command:

EPI6> ROUTE <x:\path\filename.ext>

For example, to route output to a file named SAMPLE.OUT, issue the command:

EPI6> ROUTE SAMPLE.OUT

Viewing and Printing File Contents

Data are viewed with the LIST, BROWSE, and UPDATE commands. LIST provides a line listing of the active data, BROWSE allows the user to scroll through the data on the screen, and UPDATE provides a spreadsheet-like environment for browsing and editing.

To produce printed output ("hard-copy"), output is LISTed after ROUTEing output to the printer by using the following commands:

EPI6> ROUTE PRINTER
EPI6> LIST

By default, LIST prints as many variables as will fit across the screen's width. To list a subset of variables, follow the LIST command with the names of the variables you want to print. For example, to list the variables ID, AGE, and HIV, issue the commands:

EPI6> READ SAMPLE
EPI6> LIST ID AGE HIV

Here's an example of one such listing:

REC   ID  AGE  HIV
---  ---  ---  ---
  1   35   21  Y
  2   37   42  Y
  3   43    5  N
  4  143   11  Y
  5  321   30  Y
  6  329   50  Y
  7  337   28  N
  8  492   27  N
  9  494   24  Y
 10  546   52  Y

Comment: Notice that the Epi Info RECord number is automatically attached to each line. This option can be turned off by preceding the LIST command with the SET LISTREC = OFF command. (The SET command controls many Epi Info settings.)

Variable Names

Another useful command is the VARIABLES command. This command provides variable names and other useful information about variables that are in the active data set. An example of output from VARIABLES is:

Dataset: C:\DATA\ANALYSIS\SAMPLE.REC
Size: 10 records of 17 bytes each.
Free memory: 355248 bytes

Name       Type    Len   Name       Type    Len   Name       Type    Len
---------- ------- ---   ---------- ------- ---   ---------- ------- ---
ID         Integer   3   AGE        Integer   2   SEX        ALPHA     1
HIV        Yes/no    1   KAPOSISARC Yes/no    1   REPORTDATE Date      8
OPPORTUNIS Yes/no    1

Variable names can also be ascertained by pressing <F3> while in ANALYSIS. Variables can be selected from this list by highlighting them and pressing Enter.

Batch Processing

Until this point we have used ANALYSIS one command at a time. We may also submit commands in groups by creating and then submitting user-created program (.PGM) files.

PGM files are created with a text editor. For example, the user may type the program:

READ A:\SAMPLE.REC
ROUTE PRINTER
LIST

This program will read SAMPLE.REC, ROUTE future output to the printer, and then send a line listing of the data to the printer. Let us assume that this file is saved under the filename SAMPLE.PGM.

The program is run in ANALYSIS by issuing the command:

EPI6> RUN SAMPLE.PGM

Output is then created as if the program steps were executed one at a time.
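
Because a PGM file is nothing more than a text file with one command per line, it can also be generated by a script. The Python sketch below writes the three-command program shown above to a file. The function name write_pgm is hypothetical, and the use of DOS-style line endings (carriage return plus line feed) is an assumption about what a DOS program will read most comfortably.

```python
from pathlib import Path


def write_pgm(path: Path, commands: list[str]) -> None:
    """Write ANALYSIS commands to a program file, one command per line."""
    # Assumption: DOS text files end each line with CR+LF.
    # newline="" stops Python from translating the line endings again.
    with open(path, "w", encoding="ascii", newline="") as handle:
        for command in commands:
            handle.write(command + "\r\n")


COMMANDS = [
    r"READ A:\SAMPLE.REC",
    "ROUTE PRINTER",
    "LIST",
]

# Example use: write_pgm(Path("SAMPLE.PGM"), COMMANDS)
```

For a three-line program a text editor is quicker, but a script like this can be useful when the same series of commands must be produced for many data files.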

Although we have not covered many of the powerful data manipulation and analysis features of ANALYSIS, this should give you enough background to get started.

Review Questions

1. What program is used to create QES files?
2. What functions do QES files serve?
3. What program is used to create REC files?
4. What function do REC files serve?
5. Describe DOS file naming conventions.
6. Describe Epi Info variable naming conventions.
7. How does Epi Info assign variable names?
8. What type of Epi Info variables are used to store numeric data?
9. What type of Epi Info variables can be used to store categorical (qualitative) data?
10. What is a "card file?"
11. What Epi Info program is used to create program (.PGM) files?
12. What Epi Info programs can open (i.e., work with) REC files?

ANSWERS TO REVIEW QUESTIONS

1. EPED or any other ASCII text editor.
2. QES files structure data entry screens and REC files.
3. ENTER
4. They store data.
5. DOS files use filenames of up to eight characters followed by a three-letter extension. The extension identifies the file's type.
6. Epi Info variable names must begin with a letter and may be no more than 10 characters in length.
7. Unless the variable name is identified within {curly brackets}, Epi Info will take the 10 characters preceding a variable indicator as the variable's name.
8. Numeric integer (#) and numeric real (#.#).
9. Text (_), upper case text (<A>), or numeric (#) if data are numerically encoded (e.g., 1 = male, 2 = female).
10. Card files are fixed-length field files that contain numerical and text information only, with no variable defining information.
11. EPED or any other ASCII text editor.
12. ENTER, ANALYSIS, and EXPORT

Exercises

(1) TOXICITY: Toxicity Associated With Chemotherapy

A chemotherapeutic agent used to treat adults with leukemia and lymphoma is manufactured by two companies: Smith, Inc. and Jones, Inc. With the best of care, this drug is associated with a serious form of cerebellar toxicity that occurs in approximately 8% of patients. Recently, a tertiary care hospital has switched its source of the drug from the Smith company to the Jones company. Several clinicians now report a possible increase in the occurrence of cerebellar toxicity. To address this concern, you complete a chart review of patients with data collected on the following variables: ID (identification number), AGE (years), SEX (1 = M, 2 = F), MANUFacturer (S = Smith, J = Jones), DIAGnosis (1 = leukemia, 2 = lymphoma), STAGE of disease (1 = relapse, 2 = remission), TOXICity (1 = yes, 2 = no), DOSE (g/m²), serum creatinine level (SCR, mg/dl) and WEIGHT (kilograms). Data are shown in the table below this exercise.

(A) Create your primary data file. Create an EpiInfo or EpiData file with these data. Assign the variable names exactly as indicated. Name the primary data file TOXICITY.REC. (Suggestion: Work in pairs: have one partner read the data from the printout while the other partner enters the data.)

(B) Validation file. Create a duplicate file. Name this secondary file TOXICIT2.REC. Switch roles: have partner 2 enter the data this time around. Then run the Validate program, comparing the two files side-by-side. Note any errors; make corrections in the primary file.

(C) Documentation. Create a codebook for your data file. (EpiData has two different automated codebook procedures.) Print a hard copy of your codebook for future reference.

(D) Export and Import. Export the data to either a flat card file, dBASE file, or SPSS file, depending on the wishes of your instructor. Then import the data into your statistical analysis program.

(E) Print. Print a hard copy of your data table.

(F) Backup. There are two types of computer users. Those that have lost data and those that are going to lose data. Therefore, keep a back-up copy of all your data and codebook files, and keep these files in a separate "off site" location.

REC  AGE  SEX  MANUF  DIAG  STAGE  TOX  DOSE  SCR  WEIGHT
---  ---  ---  -----  ----  -----  ---  ----  ---  ------
  1   50    1    J       1      1    1  36.0  0.8      66
  2   21    1    J       1      2    2  29.0  1.1      68
  3   35    1    J       2      2    2  16.2  0.7      97
  4   49    2    S       1      1    2  29.0  0.8      83
  5   38    1    J       2      2    1  16.2  1.4      97
  6   42    1    S       2      2    2  18.0  1.0      82
  7   17    1    J       1      2    2  17.4  1.0      64
  8   20    1    S       2      2    2  17.4  1.0      73
  9   49    2    J       1      1    2  37.2  0.7     103
 10   41    2    J       1      2    2  18.6  0.9      58
 11   20    1    S       2      2    2  18.0  1.1     113
 12   55    1    S       1      1    2  36.0  0.8      87
 13   44    2    J       1      1    1  22.4  1.2      59
 14   23    1    S       2      2    2  39.6  0.8      83
 15   64    2    S       1      1    2  30.0  0.9      69
 16   65    1    S       1      1    1  23.2  1.7     106
 17   23    2    S       1      2    2  16.8  0.9      66
 18   44    1    S       1      2    2  17.4  1.0      84
 19   29    2    S       2      1    2  18.0  0.7      56
 20   32    1    S       1      2    2  18.0  1.0      84
 21   18    1    S       2      2    2  17.4  0.9      70
 22   22    1    S       1      1    1  26.1  1.7      69
 23   43    2    J       2      2    2  18.0  0.8      63
 24   39    2    S       1      2    2  18.0  0.9      55
 25   38    2    J       1      1    1  16.0  1.0     112
 26   43    2    J       1      1    1  33.0  1.5      63
 27   42    2    J       1      2    2  18.0  0.7      57
 28   66    1    J       1      1    1  30.0  1.3      88
 29   61    2    S       2      1    2  34.8  1.2      67
 30   29    1    S       1      1    2  36.0  1.2     115
 31   47    2    J       1      2    1  18.6  0.8      66
 32   41    1    S       1      2    2  14.4  0.9     117
 33   31    1    S       2      2    2  13.8  0.8     144
 34   46    1    S       1      1    2  34.8  0.9     123
 35   50    2    J       1      2    2  24.0  0.8      68
 36   18    1    S       2      2    2  18.0  0.9      66
 37   33    1    S       1      1    2  36.0  1.1      80
 38   50    1    S       1      1    2  37.2  0.9     107
 39   24    2    S       1      2    2  18.0  0.6      50
 40   25    1    J       2      2    2  16.8  0.8      93
 41   67    1    J       1      1    1  31.9  1.0      93
 42   65    2    S       1      1    2  24.0  0.7      70
 43   29    1    J       1      2    2  18.0  0.9      79
 44   59    2    J       1      1    1  27.9  2.0      60
 45   46    2    J       1      2    2  18.0  0.5      56
 46   64    1    S       2      1    2   8.0  2.0      60
 47   62    2    J       1      1    1  18.0  1.3      85
 48   21    1    J       2      2    2  17.4  0.7      63
 49   47    2    S       1      1    2  36.0  1.0      80
 50   64    2    J       1      1    1  30.0  1.2      68
 51   49    2    S       1      1    2  29.0  0.8      45
 52   31    1    J       2      2    2  18.0  1.4      71
 53   24    2    S       2      2    2  24.0  0.9      66
 54   28    1    S       1      2    2  18.0  1.1      73
 55   42    2    S       1      2    2  18.0  1.0     102
 56   52    1    S       1      1    2  18.0  1.1      82
 57   52    1    S       1      1    1  32.4  1.2      91
 58   47    1    S       1      2    2  17.4  1.0      88
 59   35    2    J       1      2    2  18.0  0.7      55

Source: Jolson, H. M., Bosco, L., Bufton, M. G., Gerstman, B. B., Rinsler, S. S., and Williams, E. (1992). Cerebellar toxicity associated with high dose cytarabine therapy. Journal of the National Cancer Institute, 84, 500 - 505.

(2) HDUR: Duration of Hospital Stays

Data source: Townsend et al., 1979, as reported in Rosner, 1990, p. 36.

Data represent a sample from a larger hospital study on antibiotic usage and other factors as collected on hospital summaries. Information on the following variables is available: DURation of hospitalization (days), AGE (years), SEX (M = male, F = female), body TEMPerature (degrees Fahrenheit), white blood cell count (WBC: 100 / dl), in-hospital antibiotic use (AB: use <Y> variable type), whether a blood CULTure was taken (1 = yes, 2 = no), and admitting SERVice (1 = medical, 2 = surgical). Data are:

DUR  AGE  SEX  TEMP  WBC  AB  CULT  SERV
---  ---  ---  ----  ---  --  ----  ----
  5   30    F  99.0    8   N     2     1
 10   73    F  98.0    5   N     1     1
  6   40    F  99.0   12   N     2     2
 11   47    F  98.2    4   N     2     2
  5   25    F  98.5   11   N     2     2
 14   82    M  96.8    6   Y     2     2
 30   60    M  99.5    8   Y     1     1
 11   56    F  98.6    7   N     2     1
 17   43    F  98.0    7   N     2     1
  3   50    M  98.0   12   N     1     2
  9   59    F  97.6    7   N     1     1
  3    4    M  97.8    3   N     2     2
  8   22    F  99.5   11   Y     2     2
  8   33    F  98.4   14   Y     1     2
  5   20    F  98.4   11   N     1     2
  5   32    M  99.0    9   N     2     2
  7   36    M  99.2    6   Y     2     2
  4   69    M  98.0    6   N     2     2
  3   47    M  97.0    5   Y     2     1
  7   22    M  98.2    6   N     2     2
  9   11    M  98.2   10   N     2     2
 11   19    M  98.6   14   Y     2     2
 11   67    F  97.6    4   N     2     1
  9   43    F  98.6    5   N     2     2
  4   41    F  98.0    5   N     2     1

(A) Create an Epi Info file containing these data. Make certain variables are properly named and formatted. Name the file HDUR.REC, and turn it in on a floppy disk. (Suggestion: Have a partner read the data from the data sheet while you enter data at the keyboard. This will cut down on data entry errors.)

(B) Use a double entry and validation technique to validate your data file. Name this secondary file HDUR2.REC. (Suggestion: Switch partner roles, so data are entered independently. This will cut down on data processing errors.) Make corrections in the primary file (HDUR.REC), as necessary.

(C) Export the data to a fixed-length field file (HDUR.CAR). Accompany this card file with a data documentation code-book file (HDUR.DD).

(D) Export the data set to a dBASE III file (HDUR.DBF). Include this file on your data disk.

(E) In the ANALYSIS program, print a hard copy (LIST) of the data.

(F) Make backup copies of all files and hand them in on a floppy disk, one disk per team.

(3) WCGS: Western Collaborative Group Study

Data source: Selvin, 1991, p. 41.

Data from a study on type A behavior and cholesterol are as follows:

ID        CHOL       BEHAV
--        -----      ------
1          233          A
2          291          A
3          312          A
4          250          A
5          246          A
6          197          A
7          268          A
8          224          A
9          239          A
10         239          A
11         254          A
12         276          A
13         234          A
14         181          A
15         248          A
16         252          A
17         202          A
18         218          A
19         212          A
20         325          A
21         344          B
22         185          B
23         263          B
24         246          B
25         224          B
26         212          B
27         188          B
28         250          B
29         148          B
30         169          B
31         226          B
32         175          B
33         242          B
34         252          B
35         153          B
36         183          B
37         137          B
38         202          B
39         194          B
40         213          B

(A) Create an Epi Info file with these data. Call the data file WCGS.REC. Validate the data and turn it in on a floppy disk.

(B) Export the data file to a fixed-length field file and document this file on disk. Turn in the files WCGS.CAR and WCGS.DD on floppy.

(4) %IDEAL: Diabetic Body Weight

Source: Pagano, M, and Gauvreau, K. (1993). Principles of Biostatistics. Belmont, CA: Duxbury Press.

Data represent body weight expressed as a percentage of a hypothetical ideal. (Hypothetically ideal body weight is based on height, age, and sex.) For example, someone who weighs 150 lbs. and has a hypothetical ideal body weight of 160 lbs. would have a PERIDEAL value of 150 lbs. ÷ 160 lbs. × 100 = 94 (rounded to the nearest whole number). Data are: 107, 119, 99, 114, 120, 104, 88, 114, 124, 116, 101, 121, 152, 100, 125, 114, 95, 117.

(A) Create a REC file with these data.
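
The calculation in this example can be written out as a short Python sketch. The function name percent_of_ideal is hypothetical, and rounding to the nearest whole number is an assumption based on the whole-number values in the data.

```python
def percent_of_ideal(actual: float, ideal: float) -> int:
    """Return body weight as a whole-number percentage of ideal body weight."""
    if ideal <= 0:
        raise ValueError("ideal body weight must be positive")
    # 150 / 160 * 100 = 93.75, which rounds to 94.
    return round(actual / ideal * 100)


print(percent_of_ideal(150, 160))
```

Both weights must be in the same units, since only their ratio enters the calculation.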

(B) List the data in hard-copy form.

References

Bennett, S., Myatt, M., Jolley, D., & Radalowicz, A. (1996). Data Management for Surveys and Trials. A Practical Primer Using Epi Info. Llanidloes, Powys, Great Britain: Brixton Books.

Dean, A. G., Dean, J. A., Coulombier, D., Brendel, K. A., Smith, D. C., Burton, A. H., Dicker, R. C., Sullivan, K., Fagan, R. F., and Arner, T. G. (1994). Epi Info Version 6: A Word Processing, Database, and Statistics Program for Epidemiology on Microcomputers. Atlanta, GA: Centers for Disease Control.

Rosner, B. (1990). Fundamentals of Biostatistics (3rd ed.). Boston: PWS-Kent Publishing.

Selvin, S. (1991). Statistical Analysis of Epidemiologic Data. New York: Oxford University Press.

Townsend, T., Shapiro, M., Rosner, B., & Kass, E. H. (1979). Use of antimicrobial drugs in general hospitals I. Description of population and definition of methods. Journal of Infectious Diseases, 139, 688-697.
