Local Topological Data Analysis to Uncover the Global Structure of Data Approaching Graph-Structured Topologies

Gene expression data of differentiating cells, galaxies distributed in space, and earthquake locations all share a common property: they lie close to a graph-structured topology in their respective spaces [1, 4, 9, 10, 20], referred to in mathematics as a one-dimensional stratified space. Uncovering such topologies often offers great insight into these data sets. However, dimensionality reduction methods are inappropriate for this purpose, and methods from the relatively new field of Topological Data Analysis (TDA) are also inappropriate, owing to noise sensitivity, computational complexity, or other limitations. In this paper we introduce a new method, termed Local TDA (LTDA), which resolves the issues of pre-existing methods by unveiling (global) graph-structured topologies in data through robust and computationally cheap local analyses. Our method rests on a simple graph-theoretic result that enables one to identify isolated, end-, edge-, and multifurcation points in the topology underlying the data. It then uses this information to piece together a graph that is homeomorphic to the unknown one-dimensional stratified space underlying the point cloud data. We evaluate our method on a number of artificial and real-life data sets, demonstrating its superior effectiveness, robustness against noise, and scalability. Code related to this paper is available at: https://bitbucket.org/ghentdatascience/gltda-public.
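
The local classification idea can be illustrated with a minimal sketch. This is not the paper's implementation, and it assumes one plausible reading of "local analysis": for each point, collect the points in a shell around it and count how many connected groups they form (0 suggests an isolated point, 1 an end point, 2 an edge point, 3 or more a multifurcation point). The function names and the parameters `eps`, `r_in`, and `r_out` are hypothetical and do not come from the paper or its code repository.

```python
import math
from itertools import combinations


def count_components(pts, eps):
    """Count connected groups among pts, linking pairs at distance <= eps."""
    parent = list(range(len(pts)))  # union-find forest, one tree per group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(pts)), 2):
        if math.dist(pts[i], pts[j]) <= eps:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(pts))})


def classify_points(points, eps, r_in, r_out):
    """Label each point by the number of groups in the shell around it.

    Assumption (not from the paper): the shell holds all points whose
    distance to the centre lies in [r_in, r_out].
    """
    names = {0: "isolated", 1: "end", 2: "edge"}
    labels = []
    for p in points:
        shell = [q for q in points if r_in <= math.dist(p, q) <= r_out]
        labels.append(names.get(count_components(shell, eps), "multifurcation"))
    return labels


# Toy data: a Y-shaped cloud (three arms meeting at the origin) plus one
# far-away point. Index 0 is the branch point, indices 1-10 are the first arm.
cloud = [(0.0, 0.0)]
for angle in (90, 210, 330):
    t = math.radians(angle)
    cloud += [(k * 0.1 * math.cos(t), k * 0.1 * math.sin(t)) for k in range(1, 11)]
cloud.append((5.0, 5.0))

labels = classify_points(cloud, eps=0.15, r_in=0.25, r_out=0.45)
```

On this toy cloud the branch point, the tip of an arm, the middle of an arm, and the far-away point fall into four different classes. The sketch is brute force (quadratic per point) and ignores noise, so it says nothing about the robustness or scalability claimed for LTDA itself.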

[1]  R. Ho.  Algebraic Topology, 2022.

[2]  Brittany Terese Fasy, et al.  Exploring persistent local homology in topological data analysis, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  Afra Zomorodian, et al.  Computing Persistent Homology, 2005, Discret. Comput. Geom.

[4]  Y. Saeys, et al.  Computational methods for trajectory inference from single‐cell transcriptomics, 2016, European journal of immunology.

[5]  Vin de Silva, et al.  On the Local Behavior of Spaces of Natural Images, 2007, International Journal of Computer Vision.

[6]  Danielle S. Bassett, et al.  Two's company, three (or more) is a simplex - Algebraic-topological tools for understanding higher-order structure in neural data, 2016, J. Comput. Neurosci.

[7]  Jens Vygen, et al.  The Book Review Column, 2020, SIGACT News.

[8]  Radmila Sazdanovic, et al.  Simplicial Models and Topological Inference in Biological Systems, 2014, Discrete and Topological Models in Molecular Biology.

[9]  Kevin J. Emmett, et al.  Topological Data Analysis Generates High-Resolution, Genome-wide Maps of Human Recombination, 2016, Cell systems.

[10]  Patrick S. Medina, et al.  Statistical Methods in Topological Data Analysis for Complex, High-Dimensional Data, 2015.

[11]  Danielle S. Bassett, et al.  Two’s company, three (or more) is a simplex, 2016, Journal of Computational Neuroscience.

[12]  Jeffrey D. Ullman, et al.  Set Merging Algorithms, 1973, SIAM J. Comput.

[13]  Yvan Saeys, et al.  Unsupervised Trajectory Inference Using Graph Mining, 2015, CIBB.

[14]  G. Carlsson, et al.  Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival, 2011, Proceedings of the National Academy of Sciences.

[15]  Elena K. Kandror, et al.  Single-cell topological RNA-Seq analysis reveals insights into cellular differentiation and development, 2017, Nature Biotechnology.

[16]  Gunnar E. Carlsson, et al.  Topological pattern recognition for point cloud data, 2014, Acta Numerica.

[17]  Ronald L. Rivest, et al.  The Subgraph Homeomorphism Problem, 1980, J. Comput. Syst. Sci.

[18]  Frédéric Chazal, et al.  Geometric Inference for Probability Measures, 2011, Found. Comput. Math.

[19]  Valerio Pascucci, et al.  Branching and Circular Features in High Dimensional Data, 2011, IEEE Transactions on Visualization and Computer Graphics.

[20]  M. Strauss, et al.  Tracing the filamentary structure of the galaxy distribution at z∼0.8, 2010, 1003.3239.

[21]  R. Ghrist.  Barcodes: The persistent topology of data, 2007.

[22]  Gunnar E. Carlsson, et al.  Topology and data, 2009.

[23]  L. Wasserman.  Topological Data Analysis, 2016, 1609.08227.