Statistical Topological Data Analysis - A Kernel Perspective

We consider the problem of statistical computations with persistence diagrams, a summary representation of topological features in data. These diagrams encode persistent homology, a widely used invariant in topological data analysis. While several avenues towards a statistical treatment of the diagrams have been explored recently, we follow an alternative route that is motivated by the success of methods based on the embedding of probability measures into reproducing kernel Hilbert spaces. In fact, a positive definite kernel on persistence diagrams has recently been proposed, connecting persistent homology to popular kernel-based learning techniques such as support vector machines. However, important properties of that kernel enabling a principled use in the context of probability measure embeddings remain to be explored. Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.
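As a rough illustration of the two-sample testing setup described above, the following sketch computes a kernel on persistence diagrams and runs a permutation test on the maximum mean discrepancy (MMD). Several parts are assumptions rather than details stated in this abstract: the base kernel follows the commonly cited scale-space form with bandwidth `sigma`, the "variant" is taken to be the exponential of that base kernel, and all function names (`pss_kernel`, `gram_matrix`, `mmd2_biased`, `mmd_permutation_test`) are hypothetical. Diagrams are assumed to be arrays of (birth, death) pairs.

```python
import numpy as np

def pss_kernel(F, G, sigma=1.0):
    """Scale-space style kernel between two diagrams (assumed form).

    F, G: array-likes of shape (n, 2) holding (birth, death) pairs.
    """
    F = np.asarray(F, dtype=float).reshape(-1, 2)
    G = np.asarray(G, dtype=float).reshape(-1, 2)
    G_mirror = G[:, ::-1]  # reflect each point of G across the diagonal
    d = ((F[:, None, :] - G[None, :, :]) ** 2).sum(-1)
    d_m = ((F[:, None, :] - G_mirror[None, :, :]) ** 2).sum(-1)
    total = (np.exp(-d / (8 * sigma)) - np.exp(-d_m / (8 * sigma))).sum()
    return total / (8 * np.pi * sigma)

def gram_matrix(diagrams, sigma=1.0, exponentiate=True):
    """Pairwise kernel matrix; exponentiate=True applies exp() entrywise
    (an assumed stand-in for the universal variant)."""
    n = len(diagrams)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = pss_kernel(diagrams[i], diagrams[j], sigma)
    return np.exp(K) if exponentiate else K

def mmd2_biased(K, idx_x, idx_y):
    """Biased estimate of squared MMD from a precomputed Gram matrix."""
    k_xx = K[np.ix_(idx_x, idx_x)].mean()
    k_yy = K[np.ix_(idx_y, idx_y)].mean()
    k_xy = K[np.ix_(idx_x, idx_y)].mean()
    return k_xx + k_yy - 2 * k_xy

def mmd_permutation_test(K, n_x, n_perm=1000, seed=0):
    """Permutation test: the first n_x rows of K form sample X, the rest Y."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    idx = np.arange(n)
    observed = mmd2_biased(K, idx[:n_x], idx[n_x:])
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        if mmd2_biased(K, p[:n_x], p[n_x:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Given two groups of diagrams `X` and `Y`, one would build `K = gram_matrix(X + Y)` and call `mmd_permutation_test(K, len(X))`; a small p-value suggests the two groups of diagrams were drawn from different distributions. The permutation null is used here for simplicity and is not necessarily the null approximation used by the authors.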
