The broad role of multiple imputation in statistical science

Nearly a quarter century ago, the basic idea of multiple imputation was proposed as a way to deal with missing values due to nonresponse in sample surveys. Since then, the essential formulation has been proposed for use in a remarkably broad range of empirical problems: standard social science and biomedical applications involving missing data in surveys and experiments; nonstandard survey and experimental applications, such as preserving confidentiality in public-use surveys and dealing with noncompliance and “censoring due to death” in clinical trials; common “hard science” applications, such as dealing with below-threshold chemometric measurements; and other scientific or medical applications, such as imaging brains for tumors and exploring the genetics of schizophrenia. The purpose of this presentation is to provide some links to this broad range of applications and to indicate the associated computing requirements, primarily using examples in which I am currently involved.
