Improved methods for estimating fraction of missing information in multiple imputation

Abstract Multiple imputation (MI) has become the most popular approach for handling missing data. Closely associated with MI, the fraction of missing information (FMI) is an important parameter for diagnosing the impact of missing data. Currently γm, the sample value of FMI estimated from an MI with a finite number of imputations m, is used as the estimate of γ0, the population value of FMI. This estimation method, however, has never been adequately justified or evaluated. In this paper, we demonstrate quantitatively that E(γm) decreases as m increases, with E(γm) > γ0 for any finite m. As a result, γm overestimates γ0 in expectation. We propose three improved FMI estimation methods. The major conclusions are substantiated by the results of MI trials using data from the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey, USA.
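To make the quantity γm concrete, the following is a minimal sketch of how a sample FMI is commonly computed from m imputed-data analyses using Rubin's combining rules. It assumes the large-sample form γm = (1 + 1/m)B / T, where B is the between-imputation variance and T the total variance; the abstract does not state which variant the paper uses, and the function name `fmi_estimate` and the example numbers are hypothetical. It does not implement the three improved estimators proposed in the paper.

```python
from statistics import mean, variance


def fmi_estimate(estimates, variances):
    """Sample FMI (gamma_m) from m point estimates and their variances.

    Assumes the large-sample form gamma_m = (1 + 1/m) * B / T from
    Rubin's combining rules; this is an illustrative choice, not
    necessarily the exact definition used in the paper.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")

    u_bar = mean(variances)        # within-imputation variance (average)
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return (1 + 1 / m) * b / t


# Hypothetical results from m = 5 imputed data sets.
q = [10.0, 12.0, 11.0, 13.0, 9.0]   # point estimates
u = [4.0, 4.0, 4.0, 4.0, 4.0]       # their estimated variances
gamma_m = fmi_estimate(q, u)
print(round(gamma_m, 4))
```

With these made-up inputs, B = 2.5 and T = 4 + 1.2 × 2.5 = 7, so γm = 3/7. The sketch only shows how γm depends on m and on B and T for a single MI run.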
