Optimal Linear Imputation with a Convergence Guarantee

Real-world datasets, especially high-dimensional ones, commonly contain missing entries. Since most machine learning, data analysis, and statistical methods cannot handle missing values gracefully, these values must be filled in before such methods are applied. It is no surprise, therefore, that there has been long-standing interest in methods for the imputation of missing values. One recent, popular, and effective approach, the IRMI stepwise regression imputation method, models each feature as a linear combination of all other features. A linear regression model is computed for each real-valued feature on the basis of all other features in the dataset, and its predictions are used as imputation values. However, this iterative formulation lacks a convergence guarantee. Here we propose a closely related method, stated as a single optimization problem, together with a block coordinate-descent solution that is guaranteed to converge to a local minimum. Experimental results on both synthetic and benchmark datasets are comparable to those of IRMI whenever IRMI converges. In the experiments described here, however, IRMI often diverges, while our method performs markedly better than the other methods compared.
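To make the regression-based idea concrete, the following is a minimal sketch of a generic iterative linear imputation loop, written under our own assumptions. It is neither the IRMI implementation nor the optimization method proposed here: the function name `iterative_linear_impute`, the column-mean initialization, and the fixed number of sweeps `n_iters` are all illustrative choices. Like the iterative scheme discussed above, this loop carries no convergence guarantee.

```python
import numpy as np

def iterative_linear_impute(X, n_iters=20):
    """Fill NaNs by repeatedly regressing each column on all the others.

    Hypothetical helper for illustration only; not IRMI and not the
    paper's block coordinate-descent method.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Assumption: start from the mean of the observed values in each column.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    n, d = X.shape
    for _ in range(n_iters):  # fixed number of sweeps, no convergence test
        for j in range(d):
            rows = missing[:, j]
            if not rows.any():
                continue  # nothing to impute in this column
            # Design matrix: all other (currently filled) features plus intercept.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([others, np.ones(n)])
            # Fit least squares on the rows where feature j is observed.
            coef, *_ = np.linalg.lstsq(A[~rows], filled[~rows, j], rcond=None)
            # Use the model's predictions as the imputed values.
            filled[rows, j] = A[rows] @ coef
    return filled
```

Observed entries are never modified; only the positions that were NaN in the input are overwritten on each sweep.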
