Topological Machine Learning with Persistence Indicator Functions

Techniques from computational topology, in particular persistent homology, are becoming increasingly relevant for data analysis. Their stable metrics permit the use of many distance-based data analysis methods, such as multidimensional scaling, while providing a firm theoretical foundation. Many modern machine learning algorithms, however, are based on kernels. This paper presents persistence indicator functions (PIFs), which summarize persistence diagrams, i.e., feature descriptors in topological data analysis. PIFs can be calculated and compared in linear time and have many beneficial properties, such as the availability of a kernel-based similarity measure. We demonstrate their usage in common data analysis scenarios, such as confidence set estimation and classification of complex structured data.
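To make the abstract's notion concrete, the following is a minimal sketch of a persistence indicator function: a step function that, at threshold t, counts the persistence pairs (birth, death) that are active, i.e. birth ≤ t ≤ death. The function names (`pif`, `pif_l1_distance`) are illustrative assumptions, not the paper's reference implementation; the exact integration over critical points is one straightforward way to compare two such step functions.

```python
def pif(diagram, t):
    """Persistence indicator function: number of persistence pairs
    (birth, death) in the diagram with birth <= t <= death."""
    return sum(1 for birth, death in diagram if birth <= t <= death)

def pif_l1_distance(diag_a, diag_b):
    """L1 distance between the PIFs of two diagrams, integrated exactly
    by summing over the intervals between critical points (births and
    deaths), where both step functions are constant."""
    points = sorted({v for b, d in diag_a + diag_b for v in (b, d)})
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        mid = 0.5 * (lo + hi)  # both PIFs are constant on (lo, hi)
        total += abs(pif(diag_a, mid) - pif(diag_b, mid)) * (hi - lo)
    return total

# Two toy persistence diagrams (birth, death pairs).
diag_a = [(0.0, 2.0), (1.0, 3.0)]
diag_b = [(0.0, 1.0)]
print(pif(diag_a, 1.5))             # both intervals of diag_a are active: 2
print(pif_l1_distance(diag_a, diag_b))  # exact L1 distance: 3.0
```

A distance of this kind can be turned into a kernel-style similarity measure, e.g. via exp(-λ · distance); the scaling parameter λ here is a hypothetical choice, not prescribed by the abstract.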
