Modeling and replicating statistical topology and evidence for CMB nonhomogeneity

Significance

Under the general heading of “topological data analysis” (TDA), the recent adoption of topological methods for the analysis of large, complex, and high-dimensional data sets has established that the abstract concepts of algebraic topology provide powerful tools for data analysis. However, despite the successes of TDA, most applications have lacked formal statistical veracity, primarily due to difficulties in deriving distributional information about topological descriptors. We present an approach, Replicating Statistical Topology (RST), which takes the most basic descriptor of TDA, the persistence diagram, and, using models based on Gibbs distributions and Markov chain Monte Carlo, provides replications of it. These allow for formal statistical hypothesis testing, without requiring costly, or perhaps intrinsically unavailable, replications of the original data set.

Abstract

Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is topological data analysis (TDA), one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis (the typical case for big data applications), the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence-diagram-based TDA. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity.
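
To make the idea of replicating a single persistence diagram concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy Gibbs energy (a Gaussian kernel-density fit to the observed diagram plus a nearest-neighbour interaction term weighted by a hypothetical parameter `theta`) and draws one replicate with a single-point random-walk Metropolis sampler. The function names `energy` and `replicate`, the form of the energy, and all default parameter values are illustrative assumptions.

```python
import numpy as np


def energy(points, reference, theta=1.0, bandwidth=0.1):
    """Toy Gibbs energy (an assumption): KDE fit to `reference` plus a nearest-neighbour term."""
    # Gaussian kernel density of each point with respect to the reference diagram
    d2 = ((points[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    kde = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
    # Distance from each point to its nearest neighbour in the same configuration
    pd2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(pd2, np.inf)
    nn = np.sqrt(pd2.min(axis=1))
    return -np.log(kde + 1e-300).sum() + theta * nn.sum()


def replicate(reference, n_sweeps=200, step=0.05, theta=1.0, bandwidth=0.1, rng=None):
    """Draw one replicate diagram by Metropolis sampling from exp(-energy)."""
    rng = np.random.default_rng() if rng is None else rng
    reference = np.asarray(reference, dtype=float)  # rows are (birth, death); needs >= 2 rows
    x = reference.copy()
    h = energy(x, reference, theta, bandwidth)
    for _ in range(n_sweeps):
        for i in range(len(x)):
            proposal = x.copy()
            proposal[i] += rng.normal(scale=step, size=2)  # symmetric random-walk move
            if proposal[i, 1] < proposal[i, 0]:
                continue  # outside the support: death must not precede birth
            h_new = energy(proposal, reference, theta, bandwidth)
            # Metropolis acceptance with probability min(1, exp(h - h_new))
            if rng.uniform() < np.exp(min(0.0, h - h_new)):
                x, h = proposal, h_new
    return x


if __name__ == "__main__":
    observed = np.array([[0.1, 0.5], [0.2, 0.9], [0.4, 0.6], [0.3, 0.35]])
    replicates = [replicate(observed, rng=np.random.default_rng(s)) for s in range(5)]
    # Example statistic: total persistence (sum of death - birth) of each replicate
    print([float((r[:, 1] - r[:, 0]).sum()) for r in replicates])
```

Under these assumptions, the spread of a test statistic over many such replicates could stand in for its sampling distribution when only one observed diagram is available, which is the role the replications play in the hypothesis testing described above.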
