Multiple sequence alignment by parallel simulated annealing

We have developed simulated annealing algorithms for the multiple sequence alignment problem. For three-sequence alignment, the annealing approach was shown to reach the optimal solution, as confirmed by rigorous dynamic programming. To overcome the long execution times of simulated annealing, we used a parallel computer. Three variants were tested on one test problem: a sequential algorithm, a simple parallel algorithm, and the temperature parallel algorithm. Their results were compared with the result of a conventional tree-based algorithm in which alignments are merged by two-way dynamic programming. Every annealing algorithm produced a better energy value than the conventional algorithm. Both parallel annealing algorithms reached the best energy value, which probably represents the optimal solution, within a reasonable time. We consider the temperature parallel algorithm the most suitable for finding the optimal multiple sequence alignment because it does not require a temperature schedule to be specified. The algorithm is also useful for refining multiple alignments obtained by other heuristic methods.
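As an illustration of the temperature parallel idea, the following Python sketch runs one annealing process per fixed temperature and periodically offers to exchange solutions between neighbouring temperatures, so no cooling schedule is needed. It is a minimal toy, not the implementation reported here: the cost values in `column_cost`, the gap-moving `propose` step, the padding width, the temperature ladder, and the exchange rule (borrowed from the standard replica-exchange acceptance test) are all assumptions, and the "processors" are simulated in a single sequential loop.

```python
import math
import random

GAP = "-"

def column_cost(a, b):
    """Assumed toy cost: 0 for match or gap-gap, 1 for mismatch, 2 for residue-gap."""
    if a == b:
        return 0
    if a == GAP or b == GAP:
        return 2
    return 1

def energy(aln):
    """Sum-of-pairs cost over all columns and all pairs of rows (lower is better)."""
    total = 0
    for col in zip(*aln):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                total += column_cost(col[i], col[j])
    return total

def propose(aln, rng):
    """Move one gap of one row to a new position; residue order is preserved."""
    new = list(aln)
    r = rng.randrange(len(new))
    row = list(new[r])
    gaps = [k for k, c in enumerate(row) if c == GAP]
    if not gaps:
        return new
    row.pop(rng.choice(gaps))
    row.insert(rng.randrange(len(row) + 1), GAP)
    new[r] = "".join(row)
    return new

def temperature_parallel_sa(seqs, temps, sweeps=200, steps=50, pad=2, seed=0):
    """Hypothetical driver: temps must be sorted from coldest to hottest."""
    rng = random.Random(seed)
    width = max(len(s) for s in seqs) + pad
    start = [s + GAP * (width - len(s)) for s in seqs]
    states = [list(start) for _ in temps]
    energies = [energy(s) for s in states]
    best, best_e = list(start), energies[0]
    for sweep in range(sweeps):
        # Each "processor" anneals at its own constant temperature (Metropolis rule).
        for k, t in enumerate(temps):
            for step in range(steps):
                cand = propose(states[k], rng)
                e = energy(cand)
                delta = e - energies[k]
                if delta <= 0 or rng.random() < math.exp(-delta / t):
                    states[k], energies[k] = cand, e
                    if e < best_e:
                        best, best_e = list(cand), e
        # Offer an exchange between neighbouring temperatures (assumed rule):
        # always swap when the colder process holds the worse solution.
        for k in range(len(temps) - 1):
            d = (1 / temps[k] - 1 / temps[k + 1]) * (energies[k] - energies[k + 1])
            if d >= 0 or rng.random() < math.exp(d):
                states[k], states[k + 1] = states[k + 1], states[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return best, best_e
```

Calling `temperature_parallel_sa(["ACGT", "AGT", "ACT"], temps=[0.5, 1.0, 2.0, 4.0])` returns the lowest-energy alignment seen together with its energy. Because every temperature stays fixed, the only tuning choices in this sketch are the temperature ladder and the exchange interval; good solutions tend to migrate toward the cold end through exchanges rather than through a cooling schedule.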
