Persistence-Based Clustering in Riemannian Manifolds

We present a clustering scheme that combines a mode-seeking phase with a cluster-merging phase in the corresponding density map. Mode detection is done by a standard graph-based hill-climbing scheme; the novelty of our approach lies in its use of topological persistence to guide the merging of clusters. Our algorithm provides additional feedback in the form of a set of points in the plane, called a persistence diagram (PD), which provably reflects the prominences of the modes of the density. In practice, this feedback enables the user to choose relevant parameter values, so that under mild sampling conditions the algorithm outputs the correct number of clusters, a notion that can be made formally sound within persistence theory. In addition, the spatial locations of the output clusters are tied to those of the basins of attraction of the peaks of the density. The algorithm requires only rough estimates of the density at the data points and knowledge of (approximate) pairwise distances between them; it is therefore applicable in any metric space. Meanwhile, its complexity remains practical: although the size of the input distance matrix may be up to quadratic in the number of data points, a careful implementation uses only a linear amount of memory and takes barely more time to run than to read through the input.
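To make the two phases concrete, the following is a minimal sketch, not the authors' implementation. It assumes the density estimates and a neighborhood graph (built from the pairwise distances, for instance a k-nearest-neighbor graph) are already given. The names `persistence_cluster`, `density`, `neighbors`, and `tau` are illustrative assumptions. Vertices are visited by decreasing density; each one either starts a new cluster (a peak) or joins the cluster of its highest neighbor, and two clusters meeting at a vertex are merged when the lower peak's prominence above that vertex is below `tau`.

```python
def persistence_cluster(density, neighbors, tau):
    """Hill-climbing plus persistence-guided merging (illustrative sketch).

    density   : list of density estimates, one per data point (assumed given)
    neighbors : adjacency lists of a neighborhood graph (assumed given)
    tau       : prominence threshold below which a peak's cluster is merged
    Returns (labels, diagram): labels[i] is the index of the peak that
    represents point i's cluster; diagram lists (birth, death) pairs for
    the merges that were performed.
    """
    n = len(density)
    # Visit points from highest to lowest density; rank breaks ties consistently.
    order = sorted(range(n), key=lambda i: -density[i])
    rank = {v: r for r, v in enumerate(order)}
    parent = [None] * n  # union-find; each root is the peak of its cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    diagram = []
    for i in order:
        higher = [j for j in neighbors[i] if rank[j] < rank[i]]
        if not higher:
            parent[i] = i  # local maximum: a new cluster is born
            continue
        # Mode-seeking step: attach i to the cluster of its highest neighbor.
        g = min(higher, key=lambda j: rank[j])
        parent[i] = find(g)
        # Merging step: i connects clusters; merge those with low prominence.
        for j in higher:
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            lo, hi = (ri, rj) if rank[ri] > rank[rj] else (rj, ri)
            if density[lo] - density[i] < tau:
                parent[lo] = hi
                diagram.append((density[lo], density[i]))
    return [find(i) for i in range(n)], diagram


# Toy example: a path graph with two density peaks (indices 1 and 4)
# separated by a saddle at index 2.
density = [1.0, 5.0, 2.0, 3.0, 4.0, 1.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
labels_small_tau, _ = persistence_cluster(density, neighbors, tau=1.0)
labels_large_tau, diagram = persistence_cluster(density, neighbors, tau=3.0)
```

In this toy example the lower peak has prominence 4.0 - 2.0 = 2.0, so it survives as its own cluster with `tau=1.0` and is absorbed with `tau=3.0`. Running with a very large `tau` and inspecting the recorded pairs is one way to pick a threshold from the gaps between prominences, in the spirit of the persistence-diagram feedback described above.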
