Search Space Reduction Technique for Distributed Multiple Sequence Alignment

To take advantage of the various High Performance Computing (HPC) architectures for multithreaded and distributed computing, this paper parallelizes the dynamic programming algorithm for Multiple Sequence Alignment (MSA). A novel definition of a hyper-diagonal through a tensor space is used to reduce the search space. Experiments demonstrate that scoring less than 1% of the search space produces the same optimal results as scoring the full search space. The resulting alignment scores are often better than those of other heuristic methods, and the approach is capable of aligning more divergent sequences.
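
As an illustration of the general idea only, the following minimal sketch scores a sum-of-pairs alignment by dynamic programming over the k-dimensional lattice and optionally restricts scoring to cells near the main diagonal. Several things here are assumptions and not taken from the paper: the scoring constants, the names `column_score` and `msa_score`, and above all the band test (`max(cell) - min(cell) <= band`), which is a simple stand-in for the paper's hyper-diagonal definition. The sketch is sequential and makes no attempt at the multithreaded or distributed execution the paper targets.

```python
from itertools import product

# Hypothetical scoring scheme (not taken from the paper).
MATCH, MISMATCH, GAP = 1, -1, -2


def column_score(chars):
    """Sum-of-pairs score of one alignment column; None marks a gap."""
    total = 0
    for a in range(len(chars)):
        for b in range(a + 1, len(chars)):
            x, y = chars[a], chars[b]
            if x is None and y is None:
                continue  # a gap paired with a gap scores zero
            if x is None or y is None:
                total += GAP
            else:
                total += MATCH if x == y else MISMATCH
    return total


def msa_score(seqs, band=None):
    """Return (best sum-of-pairs score, number of cells scored).

    With band=None every lattice cell is scored. Otherwise only cells
    whose indices differ by at most `band` are scored; this band test is
    an assumed stand-in for the paper's hyper-diagonal. The score is None
    when the end cell lies outside the band.
    """
    k = len(seqs)
    lengths = [len(s) for s in seqs]
    # Each move advances a non-empty subset of the sequences by one symbol.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    def in_band(cell):
        return band is None or max(cell) - min(cell) <= band

    origin = (0,) * k
    scores = {origin: 0}
    # Lexicographic order guarantees predecessors are visited first.
    for cell in product(*(range(n + 1) for n in lengths)):
        if cell == origin or not in_band(cell):
            continue
        best = None
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if min(prev) < 0 or prev not in scores:
                continue
            chars = [seqs[i][cell[i] - 1] if m[i] else None for i in range(k)]
            cand = scores[prev] + column_score(chars)
            if best is None or cand > best:
                best = cand
        if best is not None:
            scores[cell] = best
    return scores.get(tuple(lengths)), len(scores)


if __name__ == "__main__":
    toy = ["ACGT", "ACGT", "ACGT"]
    print(msa_score(toy))          # full search space
    print(msa_score(toy, band=1))  # cells near the main diagonal only
```

On this toy input the banded run reaches the same score as the full run while visiting far fewer cells. That mirrors the kind of comparison the abstract reports, but it says nothing about the paper's actual reduction ratio or its results on real sequences.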
