Topological data analysis: Concepts, computation, and applications in chemical engineering

A primary hypothesis that drives scientific and engineering studies is that data has structure. The dominant paradigms for describing such structure are statistics (e.g., moments, correlation functions) and signal processing (e.g., convolutional neural networks, Fourier series). Topological Data Analysis (TDA) is a field of mathematics that analyzes data from a fundamentally different perspective. TDA represents datasets as geometric objects and provides dimensionality reduction techniques that project such objects onto low-dimensional spaces composed of elementary geometric objects. A key property of these elementary objects (also known as topological features) is that they persist across different scales and are stable under perturbations (e.g., noise, stretching, twisting, and bending). In this work, we review key mathematical concepts and methods of TDA and present different applications in chemical engineering.
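The two properties highlighted above, persistence across scales and stability under perturbations, can be made concrete with a minimal sketch of 0-dimensional persistent homology (connected components) for a Vietoris-Rips filtration. This is an illustration only, not a method taken from this work. The function names `h0_persistence`, `count_long_lived`, and `perturb` and the two-cluster point cloud are hypothetical. The sketch assumes the convention that an edge enters the filtration at its length (some texts use half the length) and uses only the Python standard library. Dedicated packages such as GUDHI or Ripser would normally be used in practice, and they also compute higher-dimensional features such as loops and voids.

```python
import math
import random
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs of 0-dimensional persistent homology for the
    Vietoris-Rips filtration of a finite 2-D point cloud, using edge length as
    the filtration scale. Every point is born at scale 0; a connected component
    dies at the edge length that merges it into another. One component never
    dies, so its death is math.inf."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges in the order the filtration adds them (Kruskal's algorithm)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # two components merge; one of them dies here
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))  # the component that survives all scales
    return pairs


def count_long_lived(pairs, threshold):
    """Count features whose lifetime (death - birth) exceeds threshold."""
    return sum(1 for birth, death in pairs if death - birth > threshold)


def perturb(points, eps, seed=0):
    """Move every point by at most eps (Euclidean norm) in a random direction."""
    rng = random.Random(seed)
    moved = []
    for x, y in points:
        r = rng.uniform(0.0, eps)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        moved.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return moved


# Hypothetical data: two clusters of 8 points on unit circles centred at x=0 and x=10
ring = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
points = ring + [(x + 10.0, y) for x, y in ring]

eps = 0.05
noisy = perturb(points, eps)

clean_pairs = h0_persistence(points)
noisy_pairs = h0_persistence(noisy)

# Two components outlive every short within-cluster merge, with or without noise
print(count_long_lived(clean_pairs, 2.0))  # 2
print(count_long_lived(noisy_pairs, 2.0))  # 2

# Matching sorted finite death times gives an upper bound on the bottleneck distance
clean_deaths = sorted(d for _, d in clean_pairs if d != math.inf)
noisy_deaths = sorted(d for _, d in noisy_pairs if d != math.inf)
max_shift = max(abs(a - b) for a, b in zip(clean_deaths, noisy_deaths))
print(max_shift <= 2 * eps)
```

The two clusters appear as two long-lived components, while the within-cluster merges die almost immediately. Moving every point by at most `eps` changes no pairwise distance by more than `2 * eps`. As a result, each sorted death time also shifts by at most `2 * eps`, which is a special case of the stability of persistence diagrams.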
