General and feasible tests with multiply-imputed datasets

Multiple imputation (MI) is a technique designed especially for handling missing data in public-use datasets. It allows analysts to perform incomplete-data inference straightforwardly by using several already imputed datasets released by the dataset owners. However, the existing MI tests require either a restrictive assumption on the missing-data mechanism, known as equal odds of missing information (EOMI), or an infinite number of imputations. Some of them also require analysts to have access to restrictive or nonstandard computer subroutines. Moreover, the existing MI testing procedures cover only Wald’s tests and likelihood ratio tests but not Rao’s score tests, so they are not general enough. In addition, the MI Wald’s tests and MI likelihood ratio tests are not procedurally identical, so analysts need to resort to distinct algorithms for implementation. In this paper, we propose a general MI procedure, called stacked multiple imputation (SMI), for performing Wald’s tests, likelihood ratio tests and Rao’s score tests by a unified algorithm. SMI requires neither EOMI nor an infinite number of imputations. It is particularly feasible for analysts, as they just need to use a complete-data testing device for performing the corresponding incomplete-data test.
