Cluster-based Data Reduction for Persistent Homology

Persistent homology computes the topological features of a space at different spatial resolutions. It is one of the main tools from computational topology applied to problems of data analysis. Despite several attempts to reduce its complexity, persistent homology remains expensive in both time and space. In practice, the largest data sets to which the method can be applied contain on the order of thousands of points in ℝ³. This paper explores a technique intended to reduce the number of data points while preserving the salient topological features of the data. The proposed technique enables the computation of persistent homology on a reduced version of the original input data without affecting significant components of the output. Since the run time of persistent homology is exponential in the number of data points, the proposed data reduction method allows the computation to finish in a fraction of the time required for the original data. Moreover, the data reduction method can be combined with any existing technique that simplifies the computation of persistent homology. The data reduction is performed by creating small groups of similar data points, called nano-clusters, and then replacing the points within each nano-cluster with its cluster center. The persistent homology of the reduced data differs from that of the original data by an amount bounded by the radius of the nano-clusters. The theoretical analysis is backed by experimental results showing that persistent homology is preserved by the proposed data reduction technique.
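
The reduction step described above can be illustrated with a minimal sketch. The abstract does not specify which clustering algorithm produces the nano-clusters, so plain Lloyd-style k-means is assumed here purely for illustration; the function name `nano_cluster_reduce` and its parameters `k`, `n_iter`, and `seed` are hypothetical and not taken from the paper. The sketch returns the cluster centers together with the largest distance from any original point to its assigned center, which plays the role of the nano-cluster radius.

```python
import math
import random


def nearest_center(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def nano_cluster_reduce(points, k, n_iter=20, seed=0):
    """Replace `points` with the centers of at most k small clusters.

    Assumption: Lloyd-style k-means stands in for the clustering step;
    the paper's actual procedure may differ.
    Returns (reduced_points, radius), where radius is the largest distance
    from an original point to the center that replaces it.
    """
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centers = [tuple(p) for p in rng.sample(points, k)]

    for _ in range(n_iter):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its group.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]

    # Final assignment against the final centers.
    labels = [nearest_center(p, centers) for p in points]
    radius = max(math.dist(p, centers[j]) for p, j in zip(points, labels))
    # Keep only centers that actually represent at least one point.
    reduced = [centers[j] for j in sorted(set(labels))]
    return reduced, radius


if __name__ == "__main__":
    # Noisy samples from a unit circle, a shape with one prominent loop.
    rng = random.Random(1)
    cloud = []
    for _ in range(400):
        t = rng.uniform(0.0, 2.0 * math.pi)
        noise = rng.gauss(0.0, 0.02)
        cloud.append(((1 + noise) * math.cos(t), (1 + noise) * math.sin(t)))

    reduced, radius = nano_cluster_reduce(cloud, k=40)
    print(len(cloud), "points reduced to", len(reduced))
    print("largest point-to-center distance:", radius)
```

By construction, every original point lies within `radius` of some retained center, which is the quantity the stated bound depends on. The reduced point set can then be passed to any persistent homology implementation in place of the original data.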
