(Hyper)Graph Embedding and Classification via Simplicial Complexes

This paper investigates a novel graph embedding procedure based on simplicial complexes. Inherited from algebraic topology, simplicial complexes are collections of simplices of increasing order (e.g., points, lines, triangles, tetrahedra). These simplices can be interpreted as potentially meaningful substructures (i.e., information granules), on top of which an embedding space can be built by means of symbolic histograms. In the embedding space, any Euclidean pattern recognition system can be used, possibly equipped with feature selection capabilities in order to select the most informative symbols. The selected symbols can then be analysed by field experts to extract further knowledge about the process being modelled by the learning system; hence, the proposed modelling strategy can be considered a grey-box approach. The proposed embedding has been tested on thirty benchmark datasets for graph classification. Further, we propose two real-world applications, namely predicting proteins’ enzymatic function and solubility propensity starting from their 3D structure, to give an example of the knowledge discovery phase that can be carried out starting from the proposed embedding strategy.
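
The symbolic-histogram idea can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the simplicial complex is taken to be the clique complex of the graph, a symbol is taken to be the sorted tuple of node labels of a simplex, and the alphabet is simply every symbol seen in the input graphs. The function names (`clique_simplices`, `symbols`, `build_alphabet`, `embed`) and the toy graphs are hypothetical and serve only as illustration; this is not the authors' implementation.

```python
from collections import Counter


def clique_simplices(labels, edges, max_dim=2):
    """Enumerate simplices of the clique complex up to dimension max_dim.

    Assumption: a k-simplex is a set of k+1 pairwise adjacent nodes.
    """
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    frontier = [(v,) for v in sorted(adj)]  # 0-simplices (nodes)
    found = list(frontier)
    for _ in range(max_dim):
        grown = []
        for s in frontier:
            # Nodes adjacent to every vertex of s extend it by one dimension.
            common = set.intersection(*(adj[v] for v in s))
            # Only append larger node ids so each simplex is listed once.
            grown.extend(s + (w,) for w in sorted(common) if w > s[-1])
        found.extend(grown)
        frontier = grown
    return found


def symbols(labels, edges, max_dim=2):
    """Count simplices by symbol (assumed: sorted tuple of node labels)."""
    return Counter(
        tuple(sorted(labels[v] for v in s))
        for s in clique_simplices(labels, edges, max_dim)
    )


def build_alphabet(graphs, max_dim=2):
    """Collect every symbol seen in the graphs, in a fixed order."""
    seen = set()
    for labels, edges in graphs:
        seen.update(symbols(labels, edges, max_dim))
    return sorted(seen, key=lambda sym: (len(sym), sym))


def embed(graph, alphabet, max_dim=2):
    """Symbolic histogram: occurrences of each alphabet symbol in the graph."""
    labels, edges = graph
    counts = symbols(labels, edges, max_dim)
    return [counts[sym] for sym in alphabet]


# Toy labelled graphs: a triangle with a pendant node, and a single edge.
g1 = ({0: "C", 1: "C", 2: "O", 3: "N"}, [(0, 1), (0, 2), (1, 2), (2, 3)])
g2 = ({0: "C", 1: "O"}, [(0, 1)])
alphabet = build_alphabet([g1, g2])
vectors = [embed(g, alphabet) for g in (g1, g2)]
```

Each graph becomes an integer vector with one component per symbol, so any Euclidean classifier can be trained on `vectors`, and a feature selection step amounts to keeping a subset of the alphabet's columns.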
