Classification in biological networks with hypergraphlet kernels

Biological and cellular systems are often modeled as graphs in which vertices represent objects of interest (genes, proteins, drugs) and edges represent relational ties between these objects (binds-to, interacts-with, regulates). This approach has been highly successful owing to the theory, methodology and software that support analysis and learning on graphs. Graphs, however, suffer from information loss when modeling physical systems due to their inability to accurately represent multi-object relationships. Hypergraphs, a generalization of graphs, provide a framework to mitigate information loss and unify disparate graph-based methodologies. We present a hypergraph-based approach for modeling biological systems and formulate vertex classification, edge classification and link prediction problems on (hyper)graphs as instances of vertex classification on (extended, dual) hypergraphs. We then introduce a novel kernel method on vertex- and edge-labeled (colored) hypergraphs for analysis and learning. The method is based on exact and inexact (via hypergraph edit distances) enumeration of hypergraphlets; i.e., small hypergraphs rooted at a vertex of interest. We empirically evaluate this method on fifteen biological networks and show its potential use in a positive-unlabeled setting to estimate the interactome sizes in various species.

Availability: https://github.com/jlugomar/hypergraphlet-kernels/

Supplementary information: Supplementary data are available at Bioinformatics online.
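The reduction of edge classification to vertex classification mentioned in the abstract relies on the dual hypergraph, in which every hyperedge of the original hypergraph becomes a vertex and every original vertex becomes a hyperedge containing the hyperedges incident to it. The following is a minimal sketch of that standard construction only, not the authors' implementation; the function name `dual_hypergraph`, the dictionary representation, and the toy protein-complex data are all assumptions made for illustration.

```python
from collections import defaultdict


def dual_hypergraph(hyperedges):
    """Build the dual of a hypergraph.

    hyperedges: dict mapping a hyperedge id to the set of vertex ids it contains.
    Returns a dict mapping each original vertex id to the set of hyperedge ids
    that contain it. In the dual, these keys act as hyperedges and the
    original hyperedge ids act as vertices.
    """
    dual = defaultdict(set)
    for edge_id, members in hyperedges.items():
        for vertex in members:
            dual[vertex].add(edge_id)
    return dict(dual)


# Hypothetical toy data: three protein complexes modeled as hyperedges.
complexes = {
    "c1": {"A", "B", "C"},
    "c2": {"B", "C"},
    "c3": {"C", "D"},
}

dual = dual_hypergraph(complexes)
for vertex in sorted(dual):
    print(vertex, sorted(dual[vertex]))
```

Under this representation, labeling a hyperedge such as `c2` in the original hypergraph amounts to labeling the vertex `c2` in the dual, so a vertex-level classifier could be applied to it. For a hypergraph with no isolated vertices and no empty hyperedges, applying the function twice recovers the original structure.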
