A Comparison Study of ACS If-Then-Else, NIM, and DISCRETE Edit and Imputation Systems Using ACS Data

In any statistical survey, the information gathered may contain inconsistent, incorrect, or missing data. These erroneous data need to be revised or filled in prior to data tabulation and retrieval, and the revisions should not affect the statistical inferences drawn from the data. Missing data, as well as some inconsistent or incorrect data, are easy to identify; other errors are not. For those not easily identified, a set of edit rules is needed to specify whether a data record is erroneous. Computer editing is one of the important steps in this systematic revision process. The edit rules are traditionally implemented in computer code as if-then-else structures, and many statistical agencies have chosen to adopt these methods. The disadvantages of the if-then-else structures are that they may not be straightforward to develop and that the computer code implementing them may be difficult to write. In addition, if there are slight changes in the edit rules or the survey form, the software may not be reusable, so that thousands of lines of code must be rewritten and debugged.

In this paper, we compare the if-then-else (ITE hereafter) rules with alternative approaches that have the potential to improve the quality of survey data. The alternative approaches are the DISCRETE edit system of the U.S. Bureau of the Census, which is based on the Fellegi-Holt model, and NIM of Statistics Canada. We use the 1999 American Community Survey (ACS) data from 26 states for the comparisons. The ITE rules used are described in the 1999 ACS Edit and Allocation Specifications for Basic Population Variables, which include sex, age, household relationship, marital status, race, and Hispanic origin. Only the first four variables are included in this study.

The DISCRETE edit system (Winkler and Petkunas [1996]) is designed for general edits of discrete data.
It utilizes the Fellegi-Holt model of editing and contains two major components: edit generation and error localization. Fellegi and Holt [1976] provided an underlying basis for developing an alternative implementation of a computer edit system. Their methods have the virtues that the logical consistency of the entire