Co-manifold learning with missing data

Representation learning is typically applied to only one mode of a data matrix, either its rows or columns. Yet in many applications, there is an underlying geometry to both the rows and the columns. We propose utilizing this coupled structure to perform co-manifold learning: uncovering the underlying geometry of both the rows and the columns of a given matrix, with a focus on the missing data setting. Our unsupervised approach consists of three components. We first solve a family of optimization problems to estimate a complete matrix at multiple scales of smoothness. We then use this collection of smooth matrix estimates to compute pairwise distances on the rows and columns based on a new multi-scale metric that implicitly introduces a coupling between the rows and the columns. Finally, we construct row and column representations from these multi-scale metrics. We demonstrate that our approach outperforms competing methods in both data visualization and clustering.
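
The three components above can be illustrated with a short, self-contained sketch. This is a minimal illustration under stated assumptions, not the paper's actual algorithm: the family of optimization problems is replaced by a crude interpolation between a mean-imputed matrix and its additive row/column structure, the multi-scale metric by an unweighted sum of per-scale Euclidean distances, and the final representations by a basic Laplacian-eigenmap embedding. All function names (smooth_estimates, multiscale_distances, eigenmap) are hypothetical.

```python
# Hedged sketch of the three-step pipeline: multi-scale smoothing of a
# partially observed matrix, a multi-scale metric on rows/columns, and
# spectral embeddings. Surrogates stand in for the paper's components.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def smooth_estimates(X, mask, scales):
    """Return completed matrices at increasing levels of smoothness (crude surrogate)."""
    X0 = np.where(mask, X, X[mask].mean())            # mean-impute the missing entries
    row = X0.mean(axis=1, keepdims=True)
    col = X0.mean(axis=0, keepdims=True)
    coarse = row + col - X0.mean()                    # additive row/column structure (coarsest scale)
    return [(1.0 - t) * X0 + t * coarse for t in scales]

def multiscale_distances(estimates, axis=0):
    """Sum pairwise distances over all scales (rows if axis=0, columns if axis=1)."""
    D = 0.0
    for U in estimates:
        V = U if axis == 0 else U.T
        D = D + squareform(pdist(V))
    return D

def eigenmap(D, n_components=2, eps=None):
    """Laplacian-eigenmap style embedding computed from a distance matrix."""
    if eps is None:
        eps = np.median(D[D > 0]) ** 2                # kernel bandwidth heuristic
    W = np.exp(-D ** 2 / eps)
    L = np.diag(W.sum(axis=1)) - W                    # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:n_components + 1]                # skip the constant eigenvector

# Usage: coupled row and column embeddings of a partially observed matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 30))
mask = rng.random(X.shape) > 0.2                      # roughly 20% of entries treated as missing
est = smooth_estimates(X, mask, scales=[0.0, 0.5, 1.0])
row_embedding = eigenmap(multiscale_distances(est, axis=0))
col_embedding = eigenmap(multiscale_distances(est, axis=1))
```

Because both the row and column distances are computed from the same collection of smoothed estimates, the two embeddings are coupled: smoothing along one mode informs the geometry recovered on the other.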
