Fuse: multiple network alignment via data fusion

MOTIVATION Discovering patterns in networks of protein-protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding because they uncover important information, such as evolutionarily conserved pathways, protein complexes and functional orthologs. However, the complexity of the multiple network alignment problem grows exponentially with the number of networks being aligned. Designing a multiple network aligner that both scales and produces biologically relevant alignments is therefore a challenging task that has not been fully addressed. The objective of multiple network alignment is to create clusters of nodes that are evolutionarily and functionally conserved across all networks. Unfortunately, the alignment methods proposed thus far do not meet this objective, because they are guided by pairwise scores that do not use all of the functional and evolutionary information available across the networks.

RESULTS To overcome this weakness, we propose Fuse, a new multiple network alignment algorithm that works in two steps. First, it computes our novel protein functional similarity scores by fusing information from the wiring patterns of all aligned PPI networks with the sequence similarities between their proteins. In contrast, all previous tools are based on protein similarities within pairs of the networks being aligned. Our comprehensive new protein similarity scores are computed by the Non-negative Matrix Tri-Factorization (NMTF) method, which predicts associations between proteins whose homology (from sequences) and functional similarity (from wiring patterns) are supported by all networks. Using the five largest and most complete PPI networks from BioGRID, we show that NMTF predicts a large number of biologically consistent protein pairs. Second, to identify clusters of aligned proteins over all networks, Fuse uses our novel approximation algorithm for maximum weight k-partite matching.
We compare Fuse with state-of-the-art multiple network aligners and show that (i) by using only sequence alignment scores, Fuse already outperforms the other aligners and produces a larger number of biologically consistent clusters that cover all aligned PPI networks, and (ii) using both sequence alignments and topological NMTF-predicted scores leads to the best multiple network alignments thus far.

AVAILABILITY AND IMPLEMENTATION Our dataset and software are freely available from the website: http://bio-nets.doc.ic.ac.uk/Fuse/

CONTACT natasha@imperial.ac.uk

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
