Inference with Imputed Conditional Means

Abstract

In this article we present analytic techniques for inference from a dataset in which missing values have been replaced by predictive means derived from an imputation model. The derivations are based on asymptotic expansions of point estimators and their associated variance estimators, and the resulting formulas can be viewed as first-order approximations to standard multiple-imputation procedures with an infinite number of imputations for the missing values. Where applicable, our method may require substantially less computational effort than creating and managing a multiply imputed database. Moreover, the resulting inferences can be more precise than those derived from multiple imputation, because they are free of the simulation error that comes from drawing a finite number of random imputations. Our techniques use components of the standard complete-data analysis, along with two summary measures from the fitted imputation model. If the imputation and analysis phases are carried out by the same person or organization, the method provides a quick assessment of the variability due to missing data. If a data producer supplies the imputed dataset to outside analysts, the necessary summary measures can be released along with it, enabling the analysts to apply the method themselves. We emphasize situations with iid samples, univariate missing data, and complete-data point estimators that are smooth functions of means, but we also discuss extensions to more complicated situations. We illustrate properties of our methods in several examples, including an application to a large dataset on fatal accidents maintained by the National Highway Traffic Safety Administration.
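The contrast drawn above can be illustrated on a toy problem. The Python sketch below is not the authors' method: their variance formulas, built from the complete-data analysis and two summary measures of the fitted imputation model, are not given in this abstract and are not reproduced here. Instead, under assumed conditions (a hypothetical iid sample, y missing at random given an always-observed x, a normal linear imputation model, and the sample mean of y as the estimand), it compares (a) filling each missing y with its predicted conditional mean and naively applying the complete-data variance s^2/n, with (b) standard multiple imputation (MI) from the same model, combined by Rubin's rules T = W + (1 + 1/m) B, where W is the average within-imputation variance and B the between-imputation variance. All function names, parameter values, and the simulated data are illustrative assumptions.

```python
import math
import random


def simulate(n=2000, seed=1):
    """Hypothetical data: x ~ N(0, 1), y = 1 + 2x + e with e ~ N(0, 1).
    y is missing with probability logistic(x), so missingness depends
    only on the always-observed x (missing at random)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        p_miss = 1.0 / (1.0 + math.exp(-x))
        xs.append(x)
        ys.append(None if rng.random() < p_miss else y)
    return xs, ys


def sample_mean_and_variance(values):
    """Sample mean and its usual complete-data variance s^2 / n."""
    n = len(values)
    mu = math.fsum(values) / n
    s2 = math.fsum((v - mu) ** 2 for v in values) / (n - 1)
    return mu, s2 / n


def fit_imputation_model(xs, ys):
    """OLS of y on (x - xbar_r) using respondents only.
    Centering makes the intercept and slope estimates uncorrelated."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n_r = len(pairs)
    xbar = math.fsum(x for x, _ in pairs) / n_r
    ybar = math.fsum(y for _, y in pairs) / n_r
    sxx = math.fsum((x - xbar) ** 2 for x, _ in pairs)
    slope = math.fsum((x - xbar) * (y - ybar) for x, y in pairs) / sxx
    rss = math.fsum((y - ybar - slope * (x - xbar)) ** 2 for x, y in pairs)
    return ybar, slope, xbar, sxx, rss / (n_r - 2), n_r


def conditional_mean_analysis(xs, ys):
    """Fill each missing y with its predicted mean, then run the
    complete-data analysis as if nothing had been imputed."""
    alpha, slope, xbar, _, _, _ = fit_imputation_model(xs, ys)
    filled = [y if y is not None else alpha + slope * (x - xbar)
              for x, y in zip(xs, ys)]
    return sample_mean_and_variance(filled)


def multiple_imputation(xs, ys, m, seed=2):
    """Normal linear-regression MI combined with Rubin's rules:
    T = W + (1 + 1/m) * B."""
    rng = random.Random(seed)
    alpha, slope, xbar, sxx, s2, n_r = fit_imputation_model(xs, ys)
    means, within = [], []
    for _ in range(m):
        # Posterior draws under the standard noninformative prior:
        # sigma^2 = (n_r - 2) s^2 / chi^2_{n_r - 2}, then (alpha, slope) given sigma^2.
        sigma2 = (n_r - 2) * s2 / rng.gammavariate((n_r - 2) / 2.0, 2.0)
        sigma = math.sqrt(sigma2)
        a_star = rng.gauss(alpha, math.sqrt(sigma2 / n_r))
        b_star = rng.gauss(slope, math.sqrt(sigma2 / sxx))
        # Impute a random draw (not the mean) for each missing y.
        filled = [y if y is not None
                  else a_star + b_star * (x - xbar) + rng.gauss(0.0, sigma)
                  for x, y in zip(xs, ys)]
        q, u = sample_mean_and_variance(filled)
        means.append(q)
        within.append(u)
    q_bar = math.fsum(means) / m
    w = math.fsum(within) / m
    b = math.fsum((q - q_bar) ** 2 for q in means) / (m - 1)
    return q_bar, w + (1.0 + 1.0 / m) * b


if __name__ == "__main__":
    xs, ys = simulate()
    est_cm, var_naive = conditional_mean_analysis(xs, ys)
    est_mi, var_mi = multiple_imputation(xs, ys, m=100)
    print(f"conditional means: estimate {est_cm:.3f}, naive SE {math.sqrt(var_naive):.3f}")
    print(f"MI, m = 100:       estimate {est_mi:.3f}, Rubin SE {math.sqrt(var_mi):.3f}")
```

Because the sample mean is linear in the imputed values, the conditional-mean point estimate is the limit of the MI point estimate as m grows, so the two estimates agree closely. The naive variance, however, ignores both the residual variability that mean imputation suppresses and the uncertainty in the imputation model's parameters, so it falls below the Rubin total. Assessing that gap without simulation is the problem the article's method addresses.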
