Geometrical and topological approaches to Big Data

Abstract: Modern data science uses topological methods to find the structural features of data sets before further supervised or unsupervised analysis. Geometry and topology are natural tools for analysing massive amounts of data, since geometry can be regarded as the study of distance functions. The mathematical formalism developed for incorporating geometric and topological techniques deals with point cloud data sets, i.e. finite sets of points, and adapts tools from various branches of geometry and topology to study them. The point clouds are regarded as finite samples taken from a geometric object, perhaps with noise. Topology provides a formal language for qualitative mathematics, whereas geometry is mainly quantitative. Thus, in topology, we study relationships of proximity or nearness without using distances, and a map between topological spaces is called continuous if it preserves these nearness structures. Geometrical and topological methods allow us to analyse highly complex data: they create a summary, or compressed representation, of all of the data features, which helps to rapidly uncover particular patterns and relationships in the data. Constructing summaries of entire domains of attributes involves understanding the relationship between the topological and geometric objects constructed from data using various features. A common thread in various approaches to noise removal, model reduction, feasibility reconstruction, and blind source separation is to replace the original data with a lower-dimensional approximate representation obtained via a matrix or multi-directional array factorization or decomposition. Besides those transformations, the significant challenge of feature summarization or subset selection for Big Data is considered, with a focus on scalable feature selection. Lower-dimensional approximate representations are also used for Big Data visualization.
The intersection of topology and Big Data will bring huge opportunities, as well as challenges, to Big Data communities. This survey aims to bring together state-of-the-art research results on geometrical and topological methods for Big Data.
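
The idea of replacing the original data with a lower-dimensional approximate representation obtained via a matrix factorization can be illustrated with a small sketch. The example below is not taken from the survey: it assumes a truncated singular value decomposition as the factorization, and a synthetic point cloud (a noisy circle embedded in a 10-dimensional space) as the data. The function name `low_rank_approximation` and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def low_rank_approximation(X, k):
    """Return k-dimensional coordinates and the rank-k approximation of X.

    X is an (n_points, n_features) point cloud. The data are centered,
    factorized with an SVD, and only the k largest singular values are kept.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]          # low-dimensional coordinates
    X_k = scores @ Vt[:k] + mean       # approximation in the original space
    return scores, X_k


# Synthetic point cloud (assumption): a circle sampled with noise,
# embedded in R^10 through a random orthonormal 10x2 basis.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))
X = circle @ basis.T + 0.01 * rng.normal(size=(200, 10))

scores, X_2 = low_rank_approximation(X, 2)
rel_err = np.linalg.norm(X - X_2) / np.linalg.norm(X)
print(scores.shape, X_2.shape, rel_err)
```

In this sketch the two retained coordinates in `scores` could be plotted directly, which is one way a lower-dimensional representation serves visualization; the small relative error reflects that the sampled object is intrinsically two-dimensional up to noise.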
