IMPUTING MISSING VALUES IN TWO-WAY CONTINGENCY TABLES USING LINEAR PROGRAMMING AND MARKOV CHAIN MONTE CARLO

Observations on two categorical variables are to be cross-classified in a contingency table B. Some units are classified by both row and column, comprising a subtable A of B. However, due to nonsampling error, the remaining units are classified only by row, or only by column, or not at all. The problem is to impute classifications for the remaining units, that is, to add them to the appropriate counts of A, so as to obtain an unbiased estimate B̂ of the true but unobserved table B. A classical approach to related problems is iterative proportional fitting (IPF), which produces MLEs for the entries of B based on known marginals. Here, the relevant B-marginals are unknown. Moreover, under IPF some MLEs may be smaller than the original entries of A, sampling zeros remain fixed at zero, imputation variance is ignored, and the MLEs are likely to be noninteger. We present an alternative approach, based on linear programming and MCMC, that overcomes these drawbacks and, in addition, can constrain imputations to lie within specified sampling variability. Our approach reveals a new method for MCMC simulation in an important class of multi-dimensional contingency tables.
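To make the IPF drawbacks concrete, the following is a minimal sketch of classical IPF on a small two-way table; it is not the method proposed here. The function name `ipf`, the 2×2 table, and the target margins are hypothetical and chosen only for illustration; in the setting described above the B-marginals would not be known.

```python
def ipf(seed, row_targets, col_targets, iters=500, tol=1e-12):
    """Classical iterative proportional fitting of a two-way table.

    Alternately rescales rows and columns of `seed` until the row and
    column sums match the target margins (assumed known and consistent).
    """
    table = [[float(x) for x in row] for row in seed]
    for _ in range(iters):
        # Row step: scale each row to its target total.
        for i, row in enumerate(table):
            s = sum(row)
            if s > 0:
                table[i] = [x * row_targets[i] / s for x in row]
        # Column step: scale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once rows match as well.
        err = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if err < tol:
            break
    return table


# Hypothetical fully classified subtable A (20 units), with one sampling zero.
A = [[10, 0],
     [5, 5]]

# Hypothetical margins of B (30 units); in practice these are unknown.
fitted = ipf(A, row_targets=[14, 16], col_targets=[18, 12])
```

In this illustration the zero cell stays at zero, because IPF only rescales existing entries, and the fitted value in the second row, first column converges to 4, below the observed count of 5. An imputation that only adds units to the counts of A could not produce either outcome.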