Indexing a protein-protein interaction network expedites network alignment

Background: The network query problem aligns a small query network with an arbitrarily large target network. The complexity of this problem grows exponentially with the number of nodes in the query network if confidence in the optimality of the result is desired. Scaling this problem to large query and target networks remains a challenge.

Results: In this article, we develop a novel index structure that dramatically reduces the cost of the network query problem. Our index structure maintains a small set of reference networks, where each reference network is a small, carefully chosen subnetwork of the target network. Along with each reference, we also store all of its non-overlapping and statistically significant alignments with the target network. Given a query network, we first align the query with the reference networks. If the alignment with a reference network yields a sufficiently large score, we compute an upper bound on the alignment score between the query and the target using the alignments of that reference with the target (which are stored in our index). If the upper bound is large enough, we perform a second round of alignment between the query and the target that respects the mapping found in the first alignment.

Our experiments on protein-protein interaction networks demonstrate that our index achieves a significant speed-up in running time over state-of-the-art methods such as ColT. The alignment subnetworks obtained by our method are also statistically significant. Finally, we observe that our method finds biologically and statistically significant alignments across multiple species.

Conclusions: We developed a reference-network-based indexing structure that accelerates network queries and produces functionally and statistically significant results.
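The query procedure described in the abstract (align with each reference, prune on the reference score, prune on the upper bound, then refine against the target) can be sketched as a filter-and-refine loop. The sketch below is only an illustration of that control flow under stated assumptions: the names `IndexEntry`, `query_with_index`, `toy_align`, `toy_upper_bound` and `toy_refine`, the two thresholds, and the edge-counting score are all hypothetical stand-ins, and the placeholder bound is not the bound derived in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Edge = FrozenSet[str]      # undirected interaction between two proteins
Network = Set[Edge]
Mapping = Dict[str, str]   # query node -> aligned node


def edge(u: str, v: str) -> Edge:
    return frozenset((u, v))


@dataclass
class IndexEntry:
    reference: Network          # small subnetwork taken from the target
    stored_scores: List[float]  # scores of its precomputed alignments with the target


def query_with_index(
    query: Network,
    target: Network,
    index: List[IndexEntry],
    align: Callable[[Network, Network], Tuple[float, Mapping]],
    upper_bound: Callable[[float, IndexEntry], float],
    refine: Callable[[Network, Network, Mapping], float],
    ref_threshold: float,
    bound_threshold: float,
) -> Optional[float]:
    """Return the best refined score, or None if every reference is pruned."""
    best: Optional[float] = None
    for entry in index:
        # Round 1: align the query with the (small) reference network.
        ref_score, mapping = align(query, entry.reference)
        if ref_score < ref_threshold:
            continue  # query does not resemble this reference
        # Bound the query-target score using the stored alignments.
        if upper_bound(ref_score, entry) < bound_threshold:
            continue  # the target cannot score high enough via this reference
        # Round 2: align with the target, respecting the first mapping.
        score = refine(query, target, mapping)
        if best is None or score > best:
            best = score
    return best


# Toy stand-ins (assumptions for illustration, not the paper's methods).
def toy_align(query: Network, reference: Network) -> Tuple[float, Mapping]:
    shared = query & reference  # edges with identical protein labels
    mapping = {node: node for e in shared for node in e}
    return float(len(shared)), mapping


def toy_upper_bound(ref_score: float, entry: IndexEntry) -> float:
    # Placeholder: reference score plus the best stored alignment score.
    return ref_score + max(entry.stored_scores, default=0.0)


def toy_refine(query: Network, target: Network, mapping: Mapping) -> float:
    # Count query edges whose mapped endpoints form an edge of the target.
    return float(sum(
        1 for e in query
        if frozenset(mapping.get(node, node) for node in e) in target
    ))


target = {edge("a", "b"), edge("b", "c"), edge("c", "d"), edge("d", "e")}
index = [
    IndexEntry({edge("a", "b"), edge("b", "c")}, [2.0]),
    IndexEntry({edge("d", "e")}, [1.0]),
]
query = {edge("a", "b"), edge("b", "c"), edge("x", "y")}

result = query_with_index(query, target, index, toy_align, toy_upper_bound,
                          toy_refine, ref_threshold=2.0, bound_threshold=3.0)
pruned = query_with_index({edge("x", "y")}, target, index, toy_align,
                          toy_upper_bound, toy_refine, 2.0, 3.0)
print(result, pruned)
```

In this toy run the second reference is discarded after the first round, so the expensive query-target step is executed only once; that pruning is the source of the speed-up the abstract describes, although the actual savings depend on the real alignment and bounding methods.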
