Two-directional Laplacian pyramids with application to data imputation

Modeling and analyzing high-dimensional data has become a common task in various fields and applications. Often, it is of interest to learn a function that is defined on the data and then to extend its values to newly arrived data points. The Laplacian pyramids approach uses kernels of decreasing widths to learn a given dataset, and a function defined over it, in a multi-scale manner. Extension of the function to new data points may then be easily performed. In this work, we extend the Laplacian pyramids technique to model the data by considering two-directional connections. In practice, kernels of decreasing widths are constructed on the row space and on the column space of the given dataset, and at each step of the algorithm the data is approximated by considering the connections in both directions. Moreover, the method does not require solving a minimization problem as other common imputation techniques do, thus avoiding the risk of a non-converging process. The method presented in this paper is general and may be adapted to imputation tasks. The numerical results demonstrate the ability of the algorithm to deal with a large number of missing data values. In addition, in most cases, the proposed method generates lower errors compared to existing imputation methods applied to benchmark datasets.
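
For concreteness, below is a minimal Python sketch of the standard (one-directional) Laplacian-pyramid fit-and-extend scheme that the abstract builds on, followed by one plausible two-directional smoothing step. The Gaussian kernel, the width-halving schedule, and the names lp_fit, lp_extend, and two_directional_smooth are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def _kernel(A, B, eps):
    """Row-normalized Gaussian kernel between the rows of A and the rows of B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    W = np.exp(-np.maximum(d2, 0.0) / eps)
    return W / W.sum(axis=1, keepdims=True)

def lp_fit(X, f, eps0=10.0, n_levels=6):
    """Multi-scale approximation of f on the points X with kernels of decreasing width.

    Returns the per-level residuals and kernel widths needed for later extension."""
    residual = f.astype(float)              # d_0 = f
    residuals, widths = [], []
    for level in range(n_levels):
        eps = eps0 / (2 ** level)            # halve the kernel width at each level
        residuals.append(residual.copy())
        widths.append(eps)
        s = _kernel(X, X, eps) @ residual    # smooth the current residual
        residual = residual - s              # d_{l+1} = d_l - P_l d_l
    return residuals, widths

def lp_extend(X_train, residuals, widths, X_new):
    """Extend the learned function values to newly arrived points X_new."""
    f_new = np.zeros((X_new.shape[0],) + residuals[0].shape[1:])
    for d, eps in zip(residuals, widths):
        f_new += _kernel(X_new, X_train, eps) @ d
    return f_new

def two_directional_smooth(D, eps_rows, eps_cols):
    """One plausible two-directional smoothing step (an assumption, not the paper's
    exact update): smooth the residual matrix D with a kernel built on its row space
    and a kernel built on its column space."""
    P_rows = _kernel(D, D, eps_rows)         # connections between rows (samples)
    P_cols = _kernel(D.T, D.T, eps_cols)     # connections between columns (features)
    return P_rows @ D @ P_cols.T

# Example: learn sin(x) on scattered 1-D points and extend it to new points.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(200, 1))
f = np.sin(X[:, 0])
res, eps = lp_fit(X, f, eps0=5.0, n_levels=8)
X_new = rng.uniform(0.0, 2.0 * np.pi, size=(20, 1))
print(np.max(np.abs(lp_extend(X, res, eps, X_new) - np.sin(X_new[:, 0]))))
```

In an imputation setting, the same fit-and-extend logic would be applied with the observed entries of the data matrix playing the role of f, with the residual at each level smoothed in both the row and column directions as sketched in two_directional_smooth.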
