Color distribution can accelerate network alignment

Aligning a query network to an arbitrarily large target network with a provable optimality guarantee is computationally challenging. To provide confidence in the optimality of the alignment, existing methods often use an iterative randomization technique called color coding. Each iteration of color coding runs a dynamic program whose cost is exponential in the number of nodes in the query network. Here, we develop a method named ColT (Colorful Tree) that reduces the cost of this bottleneck. It focuses on query networks with tree topology, which are frequently considered in the literature. ColT exploits the topology of the query tree and uses the color distribution in the target network to filter out unpromising alignments without compromising the confidence in optimality. We experiment on a comprehensive set of synthetic and real data sets. ColT's advantage over QNet, the state-of-the-art color coding algorithm, grows with the size of the query tree. For query trees of nine nodes in directed and undirected target networks, ColT outperforms QNet by factors of eight and fifteen, respectively. Our experiments also suggest that ColT identifies functionally similar regions in protein-protein interaction networks.
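For readers unfamiliar with color coding, the following minimal sketch illustrates the generic technique on the simplest tree query, a path of k nodes in an undirected target network. It is neither ColT nor QNet: it ignores node similarity scores and ColT's color-distribution filter, and all function names and the adjacency-dictionary representation are assumptions made for illustration. It shows the two ingredients mentioned above: the number of random colorings needed for a chosen confidence, and the per-iteration dynamic program over color subsets, whose state space grows exponentially with k.

```python
import math
import random


def iterations_needed(k, epsilon):
    """Random colorings needed so that a fixed k-node match becomes
    colorful at least once with probability >= 1 - epsilon."""
    # With k colors, one random coloring makes a fixed set of k nodes
    # colorful (all colors distinct) with probability k! / k^k.
    p = math.factorial(k) / k**k
    if p >= 1.0:
        return 1
    return math.ceil(math.log(epsilon) / math.log(1.0 - p))


def has_colorful_path(adj, colors, k):
    """Dynamic program over color subsets (bitmasks): is there a path of
    k nodes whose colors are all distinct? Distinct colors imply a simple path."""
    # reachable[v] holds the color sets of colorful paths that end at v.
    reachable = {v: {1 << colors[v]} for v in adj}
    for _ in range(k - 1):
        extended = {v: set() for v in adj}
        for u in adj:
            for mask in reachable[u]:
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if not mask & bit:  # v's color is not used yet
                        extended[v].add(mask | bit)
        reachable = extended
    return any(reachable[v] for v in adj)


def find_path_color_coding(adj, k, epsilon=0.01, seed=0):
    """Repeat random coloring plus the dynamic program. True is always
    correct; False is wrong with probability at most epsilon."""
    rng = random.Random(seed)
    for _ in range(iterations_needed(k, epsilon)):
        colors = {v: rng.randrange(k) for v in adj}
        if has_colorful_path(adj, colors, k):
            return True
    return False
```

In this sketch each iteration may store up to 2^k color sets per target node, which is the exponential bottleneck the abstract refers to. Filtering alignments before this step, as ColT is described as doing, would shrink that state space, but how ColT does so is not specified here.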
