Proceedings of Statistics Canada Symposium 2001
Achieving Data Quality in a Statistical Agency: a Methodological Perspective

OPTIMIZATION TECHNIQUES FOR EDIT VALIDATION AND DATA IMPUTATION

The paper is concerned with the problem of automatically detecting and correcting inconsistent or out-of-range data within a general statistical data collection process. The proposed approach handles both qualitative and quantitative values. Our aim is also to overcome the computational limits of the Fellegi-Holt approach while retaining its positive features. As customary, data records must respect a set of rules (edits) in order to be declared correct. By encoding the rules with linear inequalities, we develop mathematical models for the problems of interest. As a first relevant point, the set of rules itself is checked for inconsistency or redundancy by solving a sequence of feasibility problems. As a second relevant point, imputation is performed by solving a sequence of set covering problems.
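To make the two building blocks concrete, the following is a minimal sketch, not the authors' implementation: it assumes the open-source PuLP modelling library, a toy record with fields age, income and retired, and a single illustrative edit rule. It first tests the consistency of the linear-inequality encoding of the rules by solving a feasibility problem, then localizes the fields to impute for a failing record by solving a small set covering problem (each violated edit must be covered by at least one changed field).

```python
# Illustrative sketch only: rule names, field bounds, weights and the toy
# record are assumptions, not taken from the paper.
import pulp

# --- Step 1: encode edits as linear inequalities and check their consistency ---
# Example edits for a record (age, income, retired):
#   (r1)  0 <= age <= 120        (variable bounds)
#   (r2)  income >= 0            (variable bound)
#   (r3)  retired = 1  =>  age >= 55, written linearly as  age >= 55 * retired
feasibility = pulp.LpProblem("edit_consistency", pulp.LpMinimize)
age = pulp.LpVariable("age", lowBound=0, upBound=120)
income = pulp.LpVariable("income", lowBound=0)
retired = pulp.LpVariable("retired", cat="Binary")
feasibility += 0 * age                  # dummy objective: pure feasibility problem
feasibility += age >= 55 * retired      # edit r3 as a linear inequality
feasibility.solve(pulp.PULP_CBC_CMD(msg=False))
print("rule set consistent:", pulp.LpStatus[feasibility.status] == "Optimal")

# --- Step 2: error localization for a failing record as a set covering problem ---
# The reported record violates r3 (retired but younger than 55); each violated
# edit must be "covered" by changing at least one of the fields it involves,
# and we change a minimum-weight set of fields.
record = {"age": 30, "income": 40000, "retired": 1}
failed_edits = {"r3": ["age", "retired"]}          # violated edits and their fields
weights = {"age": 1.0, "income": 1.0, "retired": 1.0}

cover = pulp.LpProblem("error_localization", pulp.LpMinimize)
y = {f: pulp.LpVariable(f"change_{f}", cat="Binary") for f in record}
cover += pulp.lpSum(weights[f] * y[f] for f in record)         # minimal-change objective
for edit, fields in failed_edits.items():
    cover += pulp.lpSum(y[f] for f in fields) >= 1             # cover each violated edit
cover.solve(pulp.PULP_CBC_CMD(msg=False))
print("fields to impute:", [f for f in record if y[f].value() > 0.5])
```

In this toy instance the covering problem returns a single field to change (either age or retired, both with weight 1); in the full Fellegi-Holt setting the covering conditions involve implied edits as well, which is precisely the computational burden the sequential approach described above seeks to avoid.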