Improving the Practical Space and Time Efficiency of the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment

The MSA program, written and distributed in 1989, is one of the few existing programs that attempts to find optimal alignments of multiple protein or DNA sequences. The MSA program implements a branch-and-bound technique together with a variant of Dijkstra's shortest paths algorithm to prune the basic dynamic programming graph. We have made substantial improvements in the time and space usage of MSA. The improvements make feasible a variety of problem instances that were not feasible previously. On some runs we achieve an order of magnitude reduction in space usage and a significant multiplicative factor speedup in running time. To explain how these improvements work, we give a much more detailed description of MSA than has been previously available. In practice, MSA rarely produces a provably optimal alignment and we explain why.

[1]  S. Altschul,et al.  A tool for multiple sequence alignment. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[2]  William R. Taylor,et al.  Multiple sequence alignment by a pairwise algorithm , 1987, Comput. Appl. Biosci..

[3]  Tao Jiang,et al.  On the Complexity of Multiple Sequence Alignment , 1994, J. Comput. Biol..

[4]  John D. Kececioglu,et al.  The Maximum Weight Trace Problem in Multiple Sequence Alignment , 1993, CPM.

[5]  Michael R. Fellows,et al.  The Parameterized Complexity of Sequence Alignment and Consensus , 1994, CPM.

[6]  M. Sternberg,et al.  A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. , 1987, Journal of molecular biology.

[7]  M. A. McClure,et al.  Comparative analysis of multiple protein-sequence alignment methods. , 1994, Molecular biology and evolution.

[8]  Rainer Fuchs,et al.  CLUSTAL V: improved software for multiple sequence alignment , 1992, Comput. Appl. Biosci..

[9]  S. Altschul Gap costs for multiple sequence alignment. , 1989, Journal of theoretical biology.

[10]  David Maier,et al.  The Complexity of Some Problems on Subsequences and Supersequences , 1978, JACM.

[11]  S Subbiah,et al.  A method for multiple sequence alignment with gaps. , 1989, Journal of molecular biology.

[12]  S F Altschul,et al.  Weights for data related by a tree. , 1989, Journal of molecular biology.

[13]  D. Lipman,et al.  The multiple sequence alignment problem in biology , 1988 .

[14]  Edsger W. Dijkstra,et al.  A note on two problems in connexion with graphs , 1959, Numerische Mathematik.

[15]  A. K. Wong,et al.  A survey of multiple sequence comparison methods. , 1992, Bulletin of mathematical biology.

[16]  G J Barton,et al.  Evaluation and improvements in the automatic alignment of protein sequences. , 1987, Protein engineering.