A multi-scale approach for data imputation

A common preprocessing task in machine learning is to complete missing data entries in order to form a full dataset. When the input data are high-dimensional, the rows and columns are often correlated. In this work, we construct a multi-scale model that is based on the dual row-column geometry of the dataset and apply it to imputation, which is carried out within the model construction. Experimental results demonstrate the efficiency of our approach on a publicly available dataset.
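
The abstract does not spell out the construction, so the following is only a minimal sketch of what a multi-scale, row-column imputation could look like, not the authors' algorithm. It assumes a Laplacian-pyramid-style scheme: Gaussian kernels of decreasing width are fitted to the residuals of the observed entries, once using distances between rows and once using distances between columns, and the two estimates are averaged. The function names (`pyramid_impute_rows`, `multiscale_impute`), the mean-fill initialization, the halving of the kernel width, and the simple averaging step are all illustrative assumptions.

```python
import numpy as np


def pairwise_sq_dists(A):
    """Squared Euclidean distances between the rows of A."""
    sq = np.sum(A * A, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off


def pyramid_impute_rows(X, observed, n_levels=6):
    """Fill missing entries column by column with a kernel pyramid over rows.

    Assumption: row-to-row distances are computed on a column-mean-filled
    copy of X. `observed` is a boolean mask, True where X holds a real value.
    """
    counts = np.maximum(observed.sum(axis=0), 1)
    col_means = np.where(observed, X, 0.0).sum(axis=0) / counts
    filled = np.where(observed, X, col_means)
    D = pairwise_sq_dists(filled)
    sigma0 = D.max() if D.max() > 0 else 1.0  # coarsest (squared) scale
    out = filled.copy()
    for j in range(X.shape[1]):
        train = observed[:, j]
        if train.all() or not train.any():
            continue  # nothing to impute, or nothing to learn from
        target = X[train, j]
        approx_train = np.zeros(int(train.sum()))
        approx_all = np.zeros(X.shape[0])
        sigma = sigma0
        for _ in range(n_levels):
            # Row-normalized Gaussian kernel from every row to the observed rows.
            K = np.exp(-D[:, train] / sigma)
            K /= np.maximum(K.sum(axis=1, keepdims=True), np.finfo(float).tiny)
            # Smooth the current residual and add it as a finer-scale correction.
            step = K @ (target - approx_train)
            approx_all += step
            approx_train += step[train]
            sigma /= 2.0  # move to a finer scale
        out[~train, j] = approx_all[~train]
    return out


def multiscale_impute(X, observed, n_levels=6):
    """Average the row-geometry and column-geometry estimates (an assumption)."""
    by_rows = pyramid_impute_rows(X, observed, n_levels)
    by_cols = pyramid_impute_rows(X.T, observed.T, n_levels).T
    return 0.5 * (by_rows + by_cols)
```

A call such as `multiscale_impute(X_with_nans, ~np.isnan(X_with_nans))` returns a full matrix in which observed entries are left unchanged and only the missing ones are replaced.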
