Multiple imputation of discrete and continuous data by fully conditional specification

The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and include any knowledge about the process that generated the missing data. Two approaches for imputing multivariate data exist: joint modeling (JM) and fully conditional specification (FCS). JM is based on parametric statistical theory and leads to imputation procedures whose statistical properties are known. JM is theoretically sound, but the joint model may lack the flexibility needed to represent typical data features, potentially leading to bias. FCS is a semi-parametric and flexible alternative that specifies the multivariate model by a series of conditional models, one for each incomplete variable. FCS provides tremendous flexibility and is easy to apply, but its statistical properties are difficult to establish. Simulation work shows that FCS behaves very well in ...
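The idea of specifying one conditional model per incomplete variable can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `fcs_impute` and its parameters are hypothetical, every variable is assumed to be continuous and imputed by a normal linear regression on all other variables, and each column is assumed to have at least one observed value. For brevity the sketch adds residual noise but does not draw the regression parameters from their posterior, so it understates the uncertainty that a proper multiple imputation procedure would carry.

```python
import numpy as np

def fcs_impute(X, n_iter=10, rng=None):
    """Return one completed copy of X (2-D array, NaN = missing).

    Illustrative only: every incomplete column is imputed from a linear
    regression on all other columns, cycling through the columns n_iter times.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Starting values: random draws from the observed values of each column.
    for j in range(X.shape[1]):
        mis = missing[:, j]
        if mis.any():
            filled[mis, j] = rng.choice(X[~mis, j], size=mis.sum())

    incomplete = np.flatnonzero(missing.any(axis=0))
    for _ in range(n_iter):
        for j in incomplete:
            mis = missing[:, j]
            # Predictors: intercept plus all other columns, as currently completed.
            A = np.column_stack([np.ones(len(X)), np.delete(filled, j, axis=1)])
            A_obs, y_obs = A[~mis], X[~mis, j]

            # Fit the conditional model on rows where column j is observed.
            beta, *_ = np.linalg.lstsq(A_obs, y_obs, rcond=None)
            resid = y_obs - A_obs @ beta
            dof = max(len(y_obs) - A.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)

            # Replace the missing entries by predictions plus residual noise.
            filled[mis, j] = A[mis] @ beta + rng.normal(0.0, sigma, size=mis.sum())
    return filled
```

Calling the function several times with different seeds, for example `[fcs_impute(X, rng=seed) for seed in range(5)]`, gives several completed data sets whose analyses could then be pooled. Discrete variables would need a different conditional model (for instance logistic regression), which this sketch leaves out.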
