Simple and Fast Alignment of Metabolic Pathways by Exploiting Local Diversity

MOTIVATION: An important tool for analyzing biological networks is the homology search: given a pattern network, one would like to search for occurrences of similar (sub)networks within a set of host networks. In the context of metabolic pathways, Pinter et al. [Bioinformatics, 2005] proposed to solve this computationally hard problem by restricting it to the case where both the pattern and host networks are trees. This restriction, however, severely limits the applicability of their algorithm.

RESULTS: We propose a very fast and simple algorithm for the alignment of metabolic pathways that does not restrict the topology of the host or pattern network in any way; instead, our algorithm exploits a natural property of metabolic networks that we call the 'local diversity property'. Experiments on a test bed of metabolic pathways from the BioCyc database indicate that our algorithm is much faster than the restricted algorithm of Pinter et al. (the metabolic pathways of two organisms can be aligned in mere seconds), and yet it has a wider range of applicability and yields new biological insights. Our ideas can likely be extended to the alignment of biological networks other than metabolic pathways.

AVAILABILITY: Our algorithm has been implemented in C++ as a user-friendly metabolic pathway alignment tool called METAPAT. The tool runs under Linux or Windows and can be downloaded at http://theinf1.informatik.uni-jena.de/metapat/

[1] Andrzej Lingas, et al. Faster Algorithms for Subgraph Isomorphism of k-Connected Partial k-Trees, 2000, Algorithmica.

[2] Roded Sharan, et al. QPath: a method for querying pathways in a protein-protein interaction network, 2006, BMC Bioinformatics.

[3] Andrzej Lingas, et al. Faster Algorithms for Subgraph Isomorphism of k-Connected Partial k-Trees, 1996, ESA.

[4] Roded Sharan, et al. PathBLAST: a tool for alignment of protein interaction networks, 2004, Nucleic Acids Res.

[5] Sebastian Wernicke, et al. Efficient Detection of Network Motifs, 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[6] T. Ideker, et al. Modeling cellular machinery through biological network comparison, 2006, Nature Biotechnology.

[7] Ron Y. Pinter, et al. Approximate labelled subtree homeomorphism, 2004, J. Discrete Algorithms.

[8] T. Speed, et al. Biological Sequence Analysis, 1998.

[9] C. Ouzounis, et al. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes, 2005, Nucleic acids research.

[10] Julian Smart, et al. Cross-Platform GUI Programming with wxWidgets, 2005.

[11] Hideo Matsuda, et al. A Multiple Alignment Algorithm for Metabolic Pathway Analysis Using Enzyme Hierarchy, 2000, ISMB.

[12] Mohammad Taghi Hajiaghayi, et al. Subgraph Isomorphism, log-Bounded Fragmentation and Graphs of (Locally) Bounded Treewidth, 2002, MFCS.

[13] Robin Thomas, et al. On the complexity of finding iso- and other morphisms for partial k-trees, 1992, Discret. Math.

[14] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[15] Ron Y. Pinter, et al. Alignment of metabolic pathways, 2005, Bioinform.

[16] D. Mount. Bioinformatics: Sequence and Genome Analysis, 2001.