Topologically Regularized Data Embeddings

Unsupervised feature learning often finds low-dimensional embeddings that capture the structure of complex data. For tasks where prior expert topological knowledge is available, incorporating it into the learned representation may lead to higher-quality embeddings. For example, it may help to embed the data into a given number of clusters, or to account for noise that prevents one from deriving the distribution of the data over the model directly, so that this distribution can then be learned more effectively. However, a general tool for integrating different kinds of prior topological knowledge into embeddings is lacking. Although recently developed differentiable topology layers can (re)shape embeddings into prespecified topological models, they have two important limitations for representation learning, which we address in this paper. First, the currently suggested topological losses fail to represent simple models such as clusters and flares in a natural manner. Second, these losses neglect all original structural information in the data, such as neighborhood information, that is useful for learning. We overcome these limitations by introducing a new set of topological losses and proposing their use for topologically regularizing data embeddings so that they naturally represent a prespecified model. We include thorough experiments on synthetic and real data that highlight the usefulness and versatility of this approach, with applications ranging from modeling high-dimensional single-cell data to graph embedding.
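To make the idea of topological regularization concrete, the following is a minimal sketch, not the losses defined in the paper. It assumes a total objective of the form embedding loss plus a weight times a topological loss, and it uses one simple, illustrative topological loss for a "k clusters" prior. The sketch relies on the known fact that the finite death times of 0-dimensional persistent homology of a Vietoris-Rips filtration coincide with the edge lengths of a Euclidean minimum spanning tree (up to a factor of 2 under some conventions). The names `mst_edge_lengths`, `cluster_topological_loss`, `regularized_loss`, and the weight `lam` are hypothetical and chosen for illustration only.

```python
import math


def mst_edge_lengths(points):
    """Return the n-1 edge lengths of a Euclidean minimum spanning tree.

    Uses Prim's algorithm in O(n^2). These lengths match the finite
    0-dimensional persistence death times of the Vietoris-Rips filtration
    (assumption: an edge enters the filtration at its full length).
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    best[0] = 0.0
    lengths = []
    for step in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            lengths.append(best[u])
        # Relax the connection costs of the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lengths


def cluster_topological_loss(points, n_clusters):
    """Illustrative loss for a 'k clusters' prior (hypothetical, not the paper's).

    Sums all finite death times except the (n_clusters - 1) largest ones.
    Minimizing it shrinks every merge except those separating the k most
    persistent connected components.
    """
    deaths = sorted(mst_edge_lengths(points), reverse=True)
    return sum(deaths[n_clusters - 1:])


def regularized_loss(embedding_loss, points, n_clusters, lam=1.0):
    """Total objective: embedding loss plus lam times the topological loss."""
    return embedding_loss + lam * cluster_topological_loss(points, n_clusters)


if __name__ == "__main__":
    # Two well-separated pairs of points on a line.
    embedding = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(cluster_topological_loss(embedding, n_clusters=2))
    print(regularized_loss(0.5, embedding, n_clusters=2, lam=0.1))
```

This sketch only evaluates the loss. In a learning setting the embedding coordinates would be optimized with an automatic differentiation framework, and the embedding loss term is what retains the original structural information of the data while the topological term pulls the embedding toward the prespecified model.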
