Multiple imputation with large data sets: a case study of the Children's Mental Health Initiative.

Multiple imputation is an effective method for dealing with missing data, and it is becoming increasingly common in many fields. However, the method remains uncommon in epidemiology, perhaps in part because few studies have examined the practical questions of how to implement multiple imputation in large data sets used for diverse purposes. This paper addresses that gap by focusing on the practicalities of, and diagnostics for, multiple imputation in large data sets. It primarily discusses multiple imputation by chained equations, which iterates through the data, imputing one variable at a time conditional on the others. Illustrative data were derived from 9,186 youths participating in the national evaluation of the Community Mental Health Services for Children and Their Families Program, a US federally funded program designed to develop and enhance community-based systems of care to meet the needs of children with serious emotional disturbances and their families. Multiple imputation was used to ensure that data analysis samples reflect the full population of youth participating in this program. The case study serves as an illustration to assist researchers in implementing multiple imputation in their own data.
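
As a rough illustration of the chained-equations idea, the Python sketch below cycles through the variables and re-imputes each one from a regression on all the others. It is not the procedure used in the paper. It assumes NumPy, continuous variables only, and ordinary linear regression as the conditional model; the function names `chained_imputation` and `multiply_impute` are hypothetical. It also simplifies proper multiple imputation by adding residual noise without drawing the regression coefficients from their posterior, so it will tend to understate imputation uncertainty.

```python
import numpy as np

def chained_imputation(X, n_iter=10, rng=None):
    """Return one completed copy of X (2-D float array, NaN = missing).

    Illustrative only: linear models, continuous variables, no parameter draws.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Starting values: replace each missing entry with its column's observed mean.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            obs = ~miss_j

            # Predictors: intercept plus all other variables, using current fills.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])

            # Fit the conditional model on rows where variable j was observed.
            beta, *_ = np.linalg.lstsq(design[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - design[obs] @ beta
            dof = max(obs.sum() - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)

            # Impute: regression prediction plus random residual noise.
            noise = rng.normal(0.0, sigma, miss_j.sum())
            filled[miss_j, j] = design[miss_j] @ beta + noise
    return filled

def multiply_impute(X, m=5, seed=0):
    """Create m completed data sets from one shared random stream."""
    rng = np.random.default_rng(seed)
    return [chained_imputation(X, rng=rng) for _ in range(m)]
```

Calling `multiply_impute(X, m=5)` on an array with `NaN` entries returns five completed arrays in which the observed values are unchanged and only the imputed entries differ. Each completed data set would then be analyzed separately and the results combined.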
