Multiple Sequence Alignment with Genetic Algorithms

The multiple sequence alignment problem is one of the most common tasks in the analysis of sequential data, especially in bioinformatics. In this paper, we propose using a genetic algorithm to compute a multiple sequence alignment by optimizing a simple scoring function. Although the idea of applying genetic algorithms to this problem is not new, the approach presented here differs in how the multiple alignment is represented and in the simplicity of its genetic operators. The results obtained so far are reported and discussed in this paper.
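
As an illustration only, the following Python sketch shows the general shape of a genetic algorithm that searches for a multiple alignment by maximizing a scoring function. The abstract does not specify the representation, operators, or scoring function used in this paper, so every choice below is a hypothetical assumption: each individual is a list of gapped rows of equal length, the score is a sum-of-pairs score with made-up parameters (`MATCH`, `MISMATCH`, `GAP_PENALTY`), mutation moves a single gap within one row, and crossover copies whole rows from either parent. All function names (`sp_score`, `mutate`, `crossover`, `ga_align`, and so on) are illustrative.

```python
import random
from typing import List

GAP = "-"

# Hypothetical scoring parameters for a simple sum-of-pairs objective.
MATCH, MISMATCH, GAP_PENALTY = 1, 0, -1


def pair_score(a: str, b: str) -> int:
    """Score one pair of symbols in a column: gap-gap is free, residue-gap is penalized."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return GAP_PENALTY
    return MATCH if a == b else MISMATCH


def sp_score(alignment: List[str]) -> int:
    """Sum-of-pairs score over every column and every pair of rows."""
    total = 0
    n = len(alignment)
    for col in zip(*alignment):
        for i in range(n):
            for j in range(i + 1, n):
                total += pair_score(col[i], col[j])
    return total


def random_row(seq: str, length: int, rng: random.Random) -> str:
    """Insert (length - len(seq)) gaps at random positions of the sequence."""
    row = list(seq)
    for _ in range(length - len(seq)):
        row.insert(rng.randint(0, len(row)), GAP)
    return "".join(row)


def mutate(alignment: List[str], rng: random.Random) -> List[str]:
    """Hypothetical operator: move one gap to a new position in a random row."""
    child = list(alignment)
    r = rng.randrange(len(child))
    row = list(child[r])
    gaps = [k for k, c in enumerate(row) if c == GAP]
    if gaps:
        row.pop(rng.choice(gaps))
        row.insert(rng.randint(0, len(row)), GAP)  # row length is preserved
        child[r] = "".join(row)
    return child


def crossover(p1: List[str], p2: List[str], rng: random.Random) -> List[str]:
    """Hypothetical operator: each row is copied whole from one of the two parents."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


def strip_gap_columns(alignment: List[str]) -> List[str]:
    """Drop columns made only of gaps (they contribute 0 to the score)."""
    cols = [c for c in zip(*alignment) if any(x != GAP for x in c)]
    if not cols:
        return ["" for _ in alignment]
    return ["".join(row) for row in zip(*cols)]


def ga_align(seqs: List[str], pop_size: int = 40, generations: int = 200,
             extra: int = 2, seed: int = 0) -> List[str]:
    """Evolve a population of alignments and return the best one found."""
    rng = random.Random(seed)
    length = max(len(s) for s in seqs) + extra  # fixed alignment width (assumed)
    pop = [[random_row(s, length, rng) for s in seqs] for _ in range(pop_size)]

    def tournament() -> List[str]:
        # Binary tournament selection on the sum-of-pairs score.
        a, b = rng.sample(pop, 2)
        return a if sp_score(a) >= sp_score(b) else b

    for _ in range(generations):
        children = [max(pop, key=sp_score)]  # elitism: keep the current best
        while len(children) < pop_size:
            child = crossover(tournament(), tournament(), rng)
            children.append(mutate(child, rng))
        pop = children
    return strip_gap_columns(max(pop, key=sp_score))
```

For example, `ga_align(["ACGTTGA", "ACGTGA", "AGTTGA"])` returns one gapped row per input sequence. Because every row in this sketch keeps the same width and the same residues in order, both operators always produce valid alignments, which keeps them simple; whether this matches the representation proposed in the paper cannot be determined from the abstract.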
