Please note the assumptions that went into the creation of this document: it is assumed the user has access to Stata software and has some basic Stata experience. Also, please note that this procedure is based on Roper Center datasets only and we recommend the use of the Stata “do file” for the best results.
The Best Way to Approach This Task
- Review the study documentation (also referred to as the codebook) first to identify the questions you want to analyze in Stata. The codebook fully describes the dataset. It includes information about the survey such as the study number, title, name of the survey organization that conducted the study, the sponsor (if applicable), the field dates, type of sample(s), sample size, type of interview, weight information, and number of records per respondent. The study methodology and any usage notes may also appear. The codebook/questionnaire includes the question numbers, question text, and responses (with their codes and labels), as well as the card and column location of each variable. Some studies also include a separate list of variables with their card and column locations. The back of the codebook includes a dump of the data, referred to as an x-ray, which shows the unweighted number of cases in each punch for each column.
- Download both the documentation and data file at the same time.
- Based on the number of records per respondent listed on the codebook cover page, determine the appropriate Stata “do file” needed (single record or multi-record).
- Use the examples and rules provided to create the required Stata “do file.”
- Weighting – If the codebook indicates there is a weight variable, run the analysis weighted so that the responses are representative of the population surveyed and replicate the results published by the survey organization.
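The weighting step above can be sketched in Stata as follows. This is a minimal illustration, not a Roper Center example: the five data rows, the variable name weight, and the choice of aweight are all assumptions. The codebook for a given study states which variable holds the weight and how it should be applied.

```stata
* Sketch only: the data rows and the variable name "weight" are hypothetical.
clear
input Q08 weight
1 0.8
1 1.3
2 0.9
2 1.0
8 1.5
end

* Unweighted frequencies
tabulate Q08

* Weighted frequencies, treating the codebook weight as an analytic weight
tabulate Q08 [aweight = weight]
```

Comparing the two tables shows how the weight shifts the response percentages. The weighted table is the one to compare against the figures published by the survey organization.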
We will focus on determining the appropriate Stata “do file” needed and creating the required Stata “do file.”
ASCII Data Files
ASCII data files are often referred to as “text” files or “plain text” files. They contain no formatting information, just rows of characters. The “mapping” information for the characters comes from the codebook discussed in the first step above.
- Single Record – each respondent’s answers are recorded on one line or row
- Multi-Record – each respondent’s answers are recorded on more than one line or row
Is the ASCII Single Record or Multi-Record?
Refer to the codebook cover page where the number of records per respondent is listed. If this number is “1” the file is a single record file, otherwise it is a multi-record file.
Bringing ASCII Data into Stata
Stata “do file”
Since no structure is included in an ASCII data file, a Stata “do file” must be created to instruct the software on where to go to get particular variables (questions).
Stata Commands That Every Stata “do file” Should Have
- clear – Removes any data currently held in Stata’s memory
- set more off – Makes sure Stata runs all commands in the “do file” without pausing for output
- infix – Tells Stata to read a fixed-format ASCII data file
- Variable names – Listed after infix; each entry assigns a name to a variable and gives its column location in the raw text (ASCII) file
- using – A keyword, placed after the variable list, followed by the path to the raw data (ASCII) file
- Variable labels – The label variable command (label var for short) assigns a descriptive label to a variable in the dataset
- Value labels – The label define command creates a set of response labels, and label values attaches that set to a variable
Example of a Single Record Stata “do file”
*this program reads a single record data file into Stata
clear
set more off
infix Q08 50-51 Sex 98-99 using "c:\temp\abcw887.dat"
label var Q08 "FBI Monitoring"
label define Q08l 1 "Support" 2 "Oppose" 8 "DK" 9 "NA-Refused"
label values Q08 Q08l
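One way to confirm that the column locations were mapped correctly is to tabulate a variable without weights and compare the counts with the x-ray at the back of the codebook. The sketch below is self-contained so it can be run without a Roper Center file: it writes a tiny fixed-width file and reads it back. The file name demo_single.dat, the column positions, and the three data rows are hypothetical.

```stata
* Sketch only: file name, column positions, and data rows are hypothetical.
clear
set more off

* Write a three-respondent fixed-width file:
* columns 1-2 hold Q08, column 4 holds Sex.
tempname fh
file open `fh' using "demo_single.dat", write text replace
file write `fh' " 1 2" _n
file write `fh' " 2 1" _n
file write `fh' " 9 2" _n
file close `fh'

* Read the file back and attach the value labels.
infix Q08 1-2 Sex 4 using "demo_single.dat"
label define Q08l 1 "Support" 2 "Oppose" 8 "DK" 9 "NA-Refused"
label values Q08 Q08l

* Inspect the variables, then list unweighted counts (including missing
* values) for comparison with the codebook x-ray.
describe
tabulate Q08, missing
```

If the counts do not match the x-ray, the most likely cause is a wrong column range in the infix line.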
Stata Syntax Rules
- Command syntax in Stata is case sensitive: commands are written in lowercase, and variable names must match the case in which they were created
- Comments should start with an asterisk (*)
- The Stata “do file” extension is .do
- Stata programs can be written and edited in a basic text editor (Notepad or WordPad)
Run the “do file”
There are two ways to run a “do file.” One is to open the file in Stata’s “do file” editor, highlight all commands, and click “Do.” This is helpful for making sure there are no errors in the “do file.” The most straightforward way, once you have a “do file” free of errors, is to open Stata, select File > Do... from the pull-down menu, and then choose your “do file” to run it. This will run the commands in your “do file” and read your dataset into Stata.
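A “do file” can also be run by typing the do command in Stata’s Command window. The path and file name below are hypothetical and should be replaced with the location of your own file.

```stata
* Hypothetical path and file name: substitute the location of your own do file.
do "c:\temp\read_abcw887.do"
```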
Example of Multi-Record Stata “do file”
*this program reads a multi-record data file into Stata
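As an illustration of the multi-record case, here is a minimal sketch, not the Roper Center’s own example. It assumes two records (cards) per respondent and uses infix’s “# lines” and “#:” specifiers to state which record each variable is read from. The file name demo_multi.dat, the variable names, the column positions, and the data rows are all hypothetical. The sketch writes its own small file so that it can be run as is.

```stata
* Sketch only: file name, variable names, columns, and data are hypothetical.
clear
set more off

* Write a file with two records per respondent.
* Columns 1-3: respondent id; column 4: record (card) number.
* Record 1 holds Q1 in column 6; record 2 holds Age in columns 6-7.
tempname fh
file open `fh' using "demo_multi.dat", write text replace
file write `fh' "0011 1" _n
file write `fh' "0012 45" _n
file write `fh' "0021 2" _n
file write `fh' "0022 31" _n
file close `fh'

* "2 lines" says each respondent spans two rows;
* "1:" and "2:" say which row the following variables come from.
infix 2 lines 1: id 1-3 Q1 6 2: Age 6-7 using "demo_multi.dat"

label var Q1 "Example question"
label define Q1l 1 "Yes" 2 "No"
label values Q1 Q1l

list
```

This approach assumes every respondent has exactly the same number of records, in the same order. The number of records per respondent comes from the codebook cover page, and the record (card) for each variable comes from its card and column location.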
Tips & Troubleshooting
- If you make a mistake in your “do file”, Stata will execute every command up until the mistake and produce an error message indicating which command has the error.
- If this is your first attempt at writing a Stata “do file”, run the file after 1-2 questions to make error identification easier. Once the file is error-free you can add additional questions and run the file again, continuing the process until all questions have been included.
- The general principles outlined here for Stata apply to SAS and SPSS as well. The syntax will be different, but the principles are the same.
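- Sometimes you may run out of memory when using a large data file. In Stata 11 and earlier, add the command “set mem 1000m” near the top of the “do file” to allow Stata to use roughly 1 gigabyte of memory (the amount you can request depends on how much memory your system has). Stata 12 and later manage memory automatically and ignore this command.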
Complete Stata “do files” with ASCII Data File and Codebook
- Single Record Stata “do file” and Dataset Abstract
- ABC News/Washington Post Poll # 1991-9142: Thomas Vote Delay Poll #1, October 8, 1991
- Study # USABCWASH1991-9142
- Multi-Record Stata “do file” and Dataset Abstract
- Gallup/CNN/USA Today Poll # 1998-9808026: Anti-Terrorist Air Strikes, August 20, 1998
- Study # USAIPOCNUS1998-9808026
Additional Stata Resources
Research Technologies at Indiana University (http://www.indiana.edu/~statmath/stat/stata/index.html)
UCLA Academic Technology Services (http://www.ats.ucla.edu/stat/Stata/)