NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity

Abstract

Motivation: Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function, and they have shown promising results in function prediction. However, most of these methods are tied to the network of a single species, and many species lack biological networks.

Results: In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples and consequently leads to significant improvements in function prediction performance compared with two network-based methods, a deep learning sequence-based method and the BLAST annotation method used in the Critical Assessment of Functional Annotation. We demonstrate that our approach performs well even when a species has no network information available: when an organism's PPI network is left out, our multispecies method can still make predictions for the left-out organism with good performance.

Availability and implementation: The code is freely available at https://github.com/nowittynamesleft/NetQuilt. The data, including sequences, PPI networks and GO annotations, are available at https://string-db.org/.

Supplementary information: Supplementary data are available at Bioinformatics online.
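To make the IsoRank similarity scores mentioned in the Results more concrete, the following is a minimal sketch of the standard IsoRank recurrence for a pair of networks, in which a topological term (neighbours of similar proteins are similar) is mixed with a sequence-similarity prior. It is not the NetQuilt implementation: the function name `isorank_scores`, its parameters, the mixing weight `alpha=0.6`, and the toy inputs are all assumptions chosen for illustration.

```python
import numpy as np


def isorank_scores(adj_a, adj_b, seq_sim, alpha=0.6, n_iter=100, tol=1e-9):
    """Illustrative IsoRank-style similarity between two PPI networks.

    adj_a, adj_b : square adjacency matrices of the two networks.
    seq_sim      : non-negative sequence-similarity matrix, shape (n_a, n_b).
    alpha        : weight of the network term (assumed value, not from the paper).
    """
    adj_a = np.asarray(adj_a, dtype=float)
    adj_b = np.asarray(adj_b, dtype=float)

    # Sequence prior, scaled to sum to 1.
    prior = np.asarray(seq_sim, dtype=float)
    prior = prior / prior.sum()

    # Divide each column by the node's degree, so that a node's score is
    # shared equally among its neighbours (isolated nodes are left as zeros).
    deg_a = adj_a.sum(axis=0)
    deg_b = adj_b.sum(axis=0)
    p_a = adj_a / np.where(deg_a > 0, deg_a, 1.0)
    p_b = adj_b / np.where(deg_b > 0, deg_b, 1.0)

    scores = prior.copy()
    for _ in range(n_iter):
        # Network term: pair (i, j) collects score from neighbouring pairs (u, v).
        updated = alpha * (p_a @ scores @ p_b.T) + (1.0 - alpha) * prior
        updated /= updated.sum()
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores


# Toy example: two identical 3-node path graphs and a uniform sequence prior.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
toy_scores = isorank_scores(path, path, np.ones((3, 3)))
```

In this toy case the two central nodes receive the highest pairwise score, because each has two neighbours that can pass score to the pair. One way to read the abstract is that pairwise scores of this kind, computed between species, form the blocks of the multispecies meta-network; how the blocks are assembled and fed to the maxout network is not specified here, so it is left out of the sketch.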
