Robust Topological Inference: Distance to a Measure and Kernel Distance

Let P be a distribution with support S. The salient features of S can be quantified with persistent homology, which summarizes the topological features of the sublevel sets of the distance function, that is, the function mapping each point x to its distance from S. Given a sample from P, we can infer the persistent homology using an empirical version of the distance function. However, the empirical distance function is highly non-robust to noise and outliers: even a single outlier can drastically change the inferred topology. The distance-to-a-measure (DTM), introduced by Chazal et al. (2011), and the kernel distance, introduced by Phillips et al. (2014), are smooth functions that provide useful topological information while remaining robust to noise and outliers. Chazal et al. (2014) derived concentration bounds for the DTM. Building on these results, we derive limiting distributions and confidence sets, and we propose a method for choosing tuning parameters.
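The contrast between the empirical distance function and the DTM can be illustrated with a small sketch. It assumes the standard empirical form of the DTM, in which the squared distances from x to its k = ceil(m n) nearest sample points are averaged for a mass parameter 0 < m <= 1. The function names, the toy data (20 points on a circle plus one outlier at the center), and the choice m = 0.25 are illustrative assumptions, not taken from the paper.

```python
import math

def distance_to_sample(x, sample):
    """Empirical distance function: distance from x to its nearest sample point."""
    return min(math.dist(x, p) for p in sample)

def dtm(x, sample, m):
    """Empirical distance to a measure (assumed standard form).

    Averages the squared distances from x to its k = ceil(m * n) nearest
    sample points and returns the square root of that average.
    """
    n = len(sample)
    k = max(1, math.ceil(m * n))
    squared = sorted(math.dist(x, p) ** 2 for p in sample)
    return math.sqrt(sum(squared[:k]) / k)

# Hypothetical toy data: 20 points on the unit circle, then one outlier at the center.
circle = [(math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
          for i in range(20)]
noisy = circle + [(0.0, 0.0)]
center = (0.0, 0.0)

# The empirical distance at the center drops from about 1 to exactly 0:
# the single outlier fills the hole of the circle.
d_clean = distance_to_sample(center, circle)
d_noisy = distance_to_sample(center, noisy)

# The DTM at the center moves only from about 1 to sqrt(5/6), roughly 0.91.
dtm_clean = dtm(center, circle, 0.25)
dtm_noisy = dtm(center, noisy, 0.25)

print(d_clean, d_noisy, dtm_clean, dtm_noisy)
```

In this toy case the outlier erases the loop as seen by the empirical distance function, while the DTM at the center stays well away from zero, so the sublevel sets still show the hole.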

[1]  Bernhard Schölkopf, et al.  Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions, 2009, NIPS.

[2]  Gunnar E. Carlsson, et al.  Topology and data, 2009.

[3]  Jon A. Wellner, et al.  Empirical Processes with Applications to Statistics, 1988.

[4]  V. Icke, et al.  The Galaxy Distribution as a Voronoi Foam, 1994.

[5]  Jesse Freeman, et al.  in Morse theory, , 1999.

[6]  J. Yukich  Laws of large numbers for classes of functions, 1985.

[7]  Eugene F. Schuster, et al.  Incorporating support constraints into nonparametric estimators of densities, 1985.

[8]  Frédéric Chazal, et al.  Rates of convergence for robust geometric inference, 2015, ArXiv.

[9]  Frédéric Chazal, et al.  Deconvolution for the Wasserstein Metric and Geometric Inference, 2011, GSI.

[10]  Steve Oudot, et al.  The Structure and Stability of Persistence Modules, 2012, Springer Briefs in Mathematics.

[11]  Deniz Erdogmus, et al.  Locally Defined Principal Curves and Surfaces, 2011, J. Mach. Learn. Res.

[12]  Leonidas J. Guibas, et al.  Proximity of persistence modules and their diagrams, 2009, SCG '09.

[13]  A. Cuevas, et al.  On boundary estimation, 2004, Advances in Applied Probability.

[14]  Carsten Griwodz, et al.  Soccer video and player position dataset, 2014, MMSys '14.

[15]  Herbert Edelsbrunner, et al.  Computational Topology - an Introduction, 2009.

[16]  Sivaraman Balakrishnan, et al.  Confidence sets for persistence diagrams, 2013, The Annals of Statistics.

[17]  S. Mukherjee, et al.  Topological Consistency via Kernel Estimation, 2014, 1407.5272.

[18]  Leonidas J. Guibas, et al.  Witnessed k-Distance, 2013, Discret. Comput. Geom.

[19]  F. Chazal, et al.  Deconvolution for the Wasserstein Metric and Geometric Inference, 2011.

[20]  J. Wellner, et al.  Empirical Processes with Applications to Statistics, 2009.

[21]  V. Bentkus  On the dependence of the Berry–Esseen bound on dimension, 2003.

[22]  Brittany Terese Fasy, et al.  Introduction to the R package TDA, 2014, ArXiv.

[23]  Peter Bubenik, et al.  Statistical topological data analysis using persistence landscapes, 2012, J. Mach. Learn. Res.

[24]  Leonidas J. Guibas, et al.  Witnessed k-Distance, 2011, Discrete & Computational Geometry.

[25]  Bei Wang, et al.  Geometric Inference on Kernel Density Estimates, 2013, SoCG.

[26]  David Cohen-Steiner, et al.  Stability of Persistence Diagrams, 2005, Discret. Comput. Geom.

[27]  S. Bobkov, et al.  One-dimensional empirical measures, order statistics, and Kantorovich transport distances, 2019, Memoirs of the American Mathematical Society.

[28]  Herbert Edelsbrunner, et al.  Alpha, Betti and the Megaparsec Universe: On the Topology of the Cosmic Web, 2013, Trans. Comput. Sci.

[29]  Frédéric Chazal, et al.  Geometric Inference for Probability Measures, 2011, Found. Comput. Math.

[30]  A. Banyaga, et al.  Lectures on Morse Homology, 2005.

[31]  Prakasa Rao  Nonparametric functional estimation, 1983.

[32]  L. Devroye, et al.  A weighted k-nearest neighbor density estimate for geometric inference, 2011.

[33]  E. Giné, et al.  Rates of strong uniform consistency for multivariate kernel density estimators, 2002.

[34]  Michel Demazure, et al.  Bifurcations and Catastrophes: Geometry Of Solutions To Nonlinear Problems, 2000.

[35]  J. Cima, et al.  On weak* convergence in H¹, 1996.

[36]  M. Golubitsky, et al.  Stable mappings and their singularities, 1973.