Multiple imputation using chained equations for missing data in TIMSS: a case study

In this paper, we apply multiple imputation with chained equations to data from the 2007 cycle of the TIMSS database. Specifically, we imputed missing values in the student background data file for Tunisia, one of the countries participating in TIMSS 2007, using the chained equations approach of Van Buuren, Boshuizen, and Knook (Statistics in Medicine 18:681-694, 1999). We specified the imputation model so that it was congenial with the analysis model, and we ran several diagnostics to assess whether the imputations were reasonable. Our analysis of the multiply imputed data confirmed that the strength of multiple imputation lies in yielding smaller standard errors and narrower confidence intervals while allowing the analyst to work with the entire dataset.
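The general idea behind chained equations and the pooling of multiply imputed results can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes every variable is continuous and imputed with a normal linear regression, it does not draw the regression parameters from their posterior (so it is simpler than a proper multiple imputation), and the function names `chained_imputation` and `pool_rubin` as well as the simulated data are hypothetical.

```python
import numpy as np

def chained_imputation(X, n_iter=10, rng=None):
    """Return one completed copy of X (NaN = missing) via chained regressions."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Initialise missing cells with the observed mean of their column.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Regress column j on all other columns, fitted on rows where j is observed.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            obs = ~miss_j
            beta, *_ = np.linalg.lstsq(design[obs], filled[obs, j], rcond=None)
            resid = filled[obs, j] - design[obs] @ beta
            dof = max(int(obs.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Stochastic imputation: prediction plus a draw of residual noise.
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            filled[miss_j, j] = design[miss_j] @ beta + noise
    return filled

def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()                 # pooled point estimate
    within = variances.mean()                # average within-imputation variance
    between = estimates.var(ddof=1)          # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, np.sqrt(total)

# Simulated example (assumed data, not TIMSS): y depends on z, 30% of y is missing.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
y = 0.8 * z + rng.normal(scale=0.5, size=n)
data = np.column_stack([z, y])
data[rng.random(n) < 0.3, 1] = np.nan

completed = [chained_imputation(data, rng=rng) for _ in range(5)]
means = [c[:, 1].mean() for c in completed]
mean_vars = [c[:, 1].var(ddof=1) / n for c in completed]
pooled_mean, pooled_se = pool_rubin(means, mean_vars)
print(pooled_mean, pooled_se)
```

Each completed dataset is analysed separately and only the estimates are combined, so the pooled standard error reflects both the sampling variability and the uncertainty due to the missing values.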

[1]  Joseph L Schafer,et al.  Analysis of Incomplete Multivariate Data , 1997 .

[2]  Donald B. Rubin,et al.  Multiple imputations in sample surveys , 1978 .

[3]  Alexander M. Mood,et al.  Equality of Educational Opportunity. , 1967 .

[4]  M. Kenward,et al.  Brief comments on computational issues with multiple imputation , 2008 .

[5]  J. Graham,et al.  How Many Imputations are Really Needed? Some Practical Clarifications of Multiple Imputation Theory , 2007, Prevention Science.

[6]  Theo Stijnen,et al.  Using the outcome for imputation of missing predictor values was preferred. , 2006, Journal of clinical epidemiology.

[7]  M. Kenward,et al.  Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls , 2009, BMJ : British Medical Journal.

[8]  Paul Zhang Multiple Imputation: Theory and Method , 2003 .

[9]  J. Schafer,et al.  A comparison of inclusive and restrictive strategies in modern missing data procedures. , 2001, Psychological methods.

[10]  Irina Bondarenko,et al.  Diagnostics for Multiple Imputations , 2007 .

[11]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[13]  R. Little A Test of Missing Completely at Random for Multivariate Data with Missing Values , 1988 .

[14]  Elizabeth A Stuart,et al.  Multiple Imputation with Large Data Sets: a Case Study of the Children's Mental Health Initiative , 2009, American Journal of Epidemiology.

[15]  Andrew Gelman,et al.  Diagnostics for multivariate imputations , 2007 .

[16]  D. Rubin,et al.  Multiple Imputation for Nonresponse in Surveys , 1989 .

[17]  S. van Buuren,et al.  Flexible multiple imputation by chained equations of the AVO-95 Survey , 1999 .

[18]  J. Schafer Multiple imputation: a primer , 1999, Statistical methods in medical research.

[19]  P. Allison Multiple Imputation for Missing Data , 2000 .

[20]  Patrick Royston,et al.  Multiple imputation using chained equations: Issues and guidance for practice , 2011, Statistics in medicine.

[21]  Matthias von Davier,et al.  International Large-Scale Assessment Data , 2010 .

[22]  A. Acock Working With Missing Values , 2005 .

[23]  Stef van Buuren,et al.  MICE: Multivariate Imputation by Chained Equations in R , 2011 .

[24]  Jennifer Dixon,et al.  Modern Alternatives for Dealing with Missing Data in Special Education Research , 2006 .

[25]  D. Rubin INFERENCE AND MISSING DATA , 1975 .

[26]  D. Rubin,et al.  Statistical Analysis with Missing Data , 1988 .

[27]  L. Frank,et al.  Combining the complete-data and nonresponse models for drawing imputations under MAR , 2012 .

[28]  J L Schafer,et al.  Multiple Imputation for Multivariate Missing-Data Problems: A Data Analyst's Perspective. , 1998, Multivariate behavioral research.

[29]  Oscar Kempthorne,et al.  A comparison of the χ² and likelihood ratio tests for composite alternatives , 1972 .

[30]  A. Gelman,et al.  Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box , 2011 .

[31]  David Reboussin,et al.  The science of web-based clinical trial management , 2005, Clinical trials.

[32]  D. Rubin,et al.  Fully conditional specification in multivariate imputation , 2006 .

[34]  J. Graham,et al.  Missing data analysis: making it work in the real world. , 2009, Annual review of psychology.

[35]  Xiao-Li Meng,et al.  Multiple-Imputation Inferences with Uncongenial Sources of Input , 1994 .

[37]  John B Carlin,et al.  Multiple imputation for missing data: fully conditional specification versus multivariate normal imputation. , 2010, American journal of epidemiology.

[38]  Rainer Muche,et al.  Software for the Handling and Imputation of Missing Data - An Overview , 2012 .

[39]  Patrick Royston,et al.  Multiple Imputation by Chained Equations (MICE): Implementation in Stata , 2011 .

[40]  John Van Hoewyk,et al.  A multivariate technique for multiply imputing missing values using a sequence of regression models , 2001 .

[41]  Søren Feodor Nielsen,et al.  Proper and Improper Multiple Imputation , 2003 .

[42]  Brent R. Moulton Random group effects and the precision of regression estimates , 1986 .

[43]  Craig K. Enders,et al.  Missing Data in Educational Research: A Review of Reporting Practices and Suggestions for Improvement , 2004 .

[44]  Roderick J. A. Little Regression with Missing X's: A Review , 1992 .

[45]  S. van Buuren Multiple imputation of discrete and continuous data by fully conditional specification , 2007, Statistical methods in medical research.

[46]  J. Schafer,et al.  Missing data: our view of the state of the art. , 2002, Psychological methods.

[47]  J. Schafer Multiple Imputation in Multivariate Problems When the Imputation and Analysis Models Differ , 2003 .

[48]  Patrick Royston,et al.  Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables , 2010, Comput. Stat. Data Anal..

[49]  Roderick J. A. Little,et al.  Statistical Analysis with Missing Data , 2002 .

[50]  D. Rubin Formalizing Subjective Notions about the Effect of Nonrespondents in Sample Surveys , 1977 .

[51]  Todd E. Bodner,et al.  What Improves with Increased Missing Data Imputations? , 2008 .

[52]  Alexander Hehmeyer,et al.  Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys , 2013 .

[53]  Trivellore E Raghunathan,et al.  Use of multiple imputation to correct for nonresponse bias in a survey of urologic symptoms among African-American men. , 2002, American journal of epidemiology.

[54]  D. Rubin Multiple Imputation After 18+ Years , 1996 .

[55]  H. Boshuizen,et al.  Multiple imputation of missing blood pressure covariates in survival analysis. , 1999, Statistics in medicine.

[56]  Ken P Kleinman,et al.  Much Ado About Nothing , 2007, The American statistician.