Iterative stepwise regression imputation using standard and robust methods

Imputation of missing values is one of the major tasks of data pre-processing in many areas. When data from official statistics are to be imputed, several additional challenges almost always arise, such as large data sets, data sets consisting of a mixture of different variable types, or outliers in the data. The aim is to propose an automatic algorithm called IRMI for iterative model-based imputation using robust methods, which accounts for the challenges mentioned, and to provide a software tool in R. This algorithm is compared to the algorithm IVEWARE, which is the "recommended software" for imputations in international and national statistical institutions. Using artificial data and real data sets from official statistics and other fields, the advantages of IRMI over IVEWARE, especially with respect to robustness, are demonstrated.
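The abstract does not spell out the steps of IRMI, so the following is only a minimal sketch of the general idea behind iterative regression imputation, not the authors' algorithm. It assumes purely continuous variables and uses ordinary least squares in each step; a robust variant would replace that fit with a robust regression estimator. The function name `iterative_regression_impute` and its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def iterative_regression_impute(X, n_iter=10, tol=1e-6):
    """Fill NaNs by repeatedly regressing each incomplete column on the others.

    Illustrative sketch only: plain least squares, continuous data,
    no handling of categorical or semi-continuous variables.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Initialise every missing cell with the median of its column.
    col_medians = np.nanmedian(X, axis=0)
    filled[missing] = np.take(col_medians, np.where(missing)[1])

    for _ in range(n_iter):
        previous = filled.copy()
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # nothing to impute in this column
            # Predictors: intercept plus all other columns (current fill-ins).
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            # Fit on rows where column j was observed. A robust method
            # would swap this least-squares fit for a robust estimator.
            coef, *_ = np.linalg.lstsq(
                design[~miss_j], filled[~miss_j, j], rcond=None
            )
            # Overwrite the missing cells with the model predictions.
            filled[miss_j, j] = design[miss_j] @ coef
        # Stop once the imputed values no longer change noticeably.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

Each pass updates the imputed cells of one variable at a time using the current values of all other variables, and the passes repeat until the imputations stabilise. Because least squares is sensitive to outliers, a single extreme observation can distort every imputed value in this sketch, which is the weakness that motivates using robust methods instead.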
