Fair evaluation of global network aligners

BackgroundAnalogous to genomic sequence alignment, biological network alignment identifies conserved regions between networks of different species. Then, function can be transferred from well- to poorly-annotated species between aligned network regions. Network alignment typically encompasses two algorithmic components: node cost function (NCF), which measures similarities between nodes in different networks, and alignment strategy (AS), which uses these similarities to rapidly identify high-scoring alignments. Different methods use both different NCFs and different ASs. Thus, it is unclear whether the superiority of a method comes from its NCF, its AS, or both. We already showed on state-of-the-art methods, MI-GRAAL and IsoRankN, that combining NCF of one method and AS of another method can give a new superior method. Here, we evaluate MI-GRAAL against a newer approach, GHOST, by mixing-and-matching the methods’ NCFs and ASs to potentially further improve alignment quality. While doing so, we approach important questions that have not been asked systematically thus far. First, we ask how much of the NCF information should come from protein sequence data compared to network topology data. Existing methods determine this parameter more-less arbitrarily, which could affect alignment quality. Second, when topological information is used in NCF, we ask how large the size of the neighborhoods of the compared nodes should be. Existing methods assume that the larger the neighborhood size, the better.ResultsOur findings are as follows. MI-GRAAL’s NCF is superior to GHOST’s NCF, while the performance of the methods’ ASs is data-dependent. Thus, for data on which GHOST’s AS is superior to MI-GRAAL’s AS, the combination of MI-GRAAL’s NCF and GHOST’s AS represents a new superior method. Also, which amount of sequence information is used within NCF does not affect alignment quality, while the inclusion of topological information is crucial for producing good alignments. Finally, larger neighborhood sizes are preferred, but often, it is the second largest size that is superior. Using this size instead of the largest one would decrease computational complexity.ConclusionTaken together, our results represent general recommendations for a fair evaluation of network alignment methods and in particular of two-stage NCF-AS approaches.

[1]  Tijana Milenkovic,et al.  Dynamic networks reveal key players in aging , 2013, BCB.

[2]  Tin Wee Tan,et al.  In silico grouping of peptide/HLA class I complexes using structural interaction characteristics , 2007, Bioinform..

[3]  Jugal K. Kalita,et al.  A comparison of algorithms for the pairwise alignment of biological networks , 2014, Bioinform..

[4]  Natasa Przulj,et al.  Integrative network alignment reveals large regions of global network similarity in yeast and human , 2011, Bioinform..

[5]  Bonnie Berger,et al.  Pairwise Global Alignment of Protein Interaction Networks by Matching Neighborhood Topology , 2007, RECOMB.

[6]  M. Cannataro,et al.  AlignNemo: A Local Network Alignment Method to Integrate Homology and Topology , 2012, PloS one.

[7]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[8]  R. Karp,et al.  From the Cover : Conserved patterns of protein interaction in multiple species , 2005 .

[9]  Elaine Shi,et al.  Link prediction by de-anonymization: How We Won the Kaggle Social Network Challenge , 2011, The 2011 International Joint Conference on Neural Networks.

[10]  Serafim Batzoglou,et al.  Automatic Parameter Learning for Multiple Local Network Alignment , 2009, J. Comput. Biol..

[11]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[12]  Xin Guo,et al.  Domain-oriented edge-based alignment of protein interaction networks , 2009, Bioinform..

[13]  T. Ideker,et al.  Modeling cellular machinery through biological network comparison , 2006, Nature Biotechnology.

[14]  Yi Li,et al.  RiMOM: A Dynamic Multistrategy Ontology Alignment Framework , 2009, IEEE Transactions on Knowledge and Data Engineering.

[15]  Pietro Hiram Guzzi,et al.  Improving the Robustness of Local Network Alignment: Design and Extensive Assessmentof a Markov Clustering-Based Approach , 2014, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[16]  Sean R. Collins,et al.  Toward a Comprehensive Atlas of the Physical Interactome of Saccharomyces cerevisiae*S , 2007, Molecular & Cellular Proteomics.

[17]  Tijana Milenkovic,et al.  GraphCrunch: A tool for large network analyses , 2008, BMC Bioinformatics.

[18]  Bonnie Berger,et al.  Global Alignment of Multiple Protein Interaction Networks , 2008, Pacific Symposium on Biocomputing.

[19]  Natasa Przulj,et al.  Biological network comparison using graphlet degree distribution , 2007, Bioinform..

[20]  O. Kuchaiev,et al.  Topological network alignment uncovers biological function and phylogeny , 2008, Journal of The Royal Society Interface.

[21]  Roded Sharan,et al.  Global alignment of protein-protein interaction networks. , 2013, Methods in molecular biology.

[22]  Tijana Milenkovic,et al.  Graphlet-based edge clustering reveals pathogen-interacting proteins , 2012, Bioinform..

[23]  Kathryn Fraughnaugh,et al.  Introduction to graph theory , 1973, Mathematical Gazette.

[24]  Kara Dolinski,et al.  The BioGRID Interaction Database: 2008 update , 2008, Nucleic Acids Res..

[25]  Han Zhao,et al.  Global Network Alignment in the Context of Aging , 2013, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[26]  Tijana Milenkoviæ,et al.  Uncovering Biological Network Function via Graphlet Degree Signatures , 2008, Cancer informatics.

[27]  Bonnie Berger,et al.  IsoRankN: spectral methods for global alignment of multiple protein networks , 2009, Bioinform..

[28]  Christie S. Chang,et al.  The BioGRID interaction database: 2013 update , 2012, Nucleic Acids Res..

[29]  Robert Patro,et al.  Global network alignment using multiscale spectral signatures , 2012, Bioinform..

[30]  T. Milenković,et al.  Systems-level cancer gene identification from protein interaction network topology applied to melanogenesis-related functional genomics data , 2010, Journal of The Royal Society Interface.

[31]  Wayne Hayes,et al.  Optimal Network Alignment with Graphlet Degree Vectors , 2010, Cancer informatics.

[32]  Jie Tang,et al.  Simultaneous Optimization of both Node and Edge Conservation in Network Alignment via WAVE , 2014, WABI.

[33]  Natasa Przulj,et al.  GR-Align: fast and flexible alignment of protein 3D structures using graphlet degree similarity , 2014, Bioinform..

[34]  Serafim Batzoglou,et al.  Automatic Parameter Learning for Multiple Network Alignment , 2008, RECOMB.

[35]  Roland A. Pache,et al.  A Novel Framework for the Comparative Analysis of Biological Networks , 2012, PloS one.

[36]  Behnam Neyshabur,et al.  NETAL: a new graph-based method for global alignment of protein-protein interaction networks , 2013, Bioinform..

[37]  Stephen E. Stein,et al.  The Drosophila melanogaster PeptideAtlas facilitates the use of peptide data for improved fly proteomics and genome annotation , 2009, BMC Bioinformatics.

[38]  Ryan W. Solava,et al.  Revealing Missing Parts of the Interactome via Link Prediction , 2014, PloS one.

[39]  Antal F. Novak,et al.  networks Græmlin : General and robust alignment of multiple large interaction data , 2006 .

[40]  A. Bonato,et al.  Dominating Biological Networks , 2011, PloS one.

[41]  Roded Sharan,et al.  PathBLAST: a tool for alignment of protein interaction networks , 2004, Nucleic Acids Res..

[42]  Jaap Heringa,et al.  Lagrangian Relaxation Applied to Sparse Global Network Alignment , 2011, PRIB.

[43]  Ahmet Emre Aladag,et al.  SPINAL: scalable protein interaction network alignment , 2013, Bioinform..

[44]  Tijana Milenkovic,et al.  GREAT: GRaphlet Edge-based network AlignmenT , 2014, 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[45]  Arnaud Céol,et al.  NetAligner—a network alignment server to compare complexes, pathways and whole interactomes , 2012, Nucleic Acids Res..

[46]  Meng Xu,et al.  NetAlign: a web-based tool for comparison of protein interaction networks , 2006, Bioinform..

[47]  Gunnar W. Klau,et al.  A new graph-based method for pairwise global network alignment , 2009, BMC Bioinformatics.

[48]  Tijana Milenkovic,et al.  MAGNA: Maximizing Accuracy in Global Network Alignment , 2013, Bioinform..

[49]  Johannes Berg,et al.  Cross-species analysis of biological networks by Bayesian alignment. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[50]  Michael Lässig,et al.  Local graph alignment and motif search in biological networks. , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[51]  Wojciech Szpankowski,et al.  Pairwise Alignment of Protein Interaction Networks , 2006, J. Comput. Biol..

[52]  Tamer Kahveci,et al.  Color distribution can accelerate network alignment , 2013, BCB.