Evolving Consensus Sequence for Multiple Sequence Alignment with a Genetic Algorithm

In this paper we present an approach that evolves the consensus sequence [25] for multiple sequence alignment (MSA) with a genetic algorithm (GA). We have developed an encoding scheme such that the number of generations needed to find the optimal solution is approximately the same regardless of the number of sequences; instead, it depends only on the length of the template and the similarity between sequences. The objective function uses the sum-of-pairs (SP) score as the fitness value. We conducted preliminary studies and compared our approach with the commonly used heuristic alignment program Clustal W. The results show that the GA can indeed scale and perform well.
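As an illustration of the fitness measure named above, the following is a minimal sketch of a sum-of-pairs score over a gapped alignment. The match, mismatch, and gap values, and the function names `pair_score` and `sum_of_pairs`, are assumptions made for this example; the abstract does not state the scoring scheme actually used.

```python
from itertools import combinations

# Hypothetical scoring values; the paper's actual scheme is not given here.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score one aligned pair of symbols ('-' denotes a gap)."""
    if a == "-" and b == "-":
        return 0  # assumption: a gap aligned with a gap scores zero
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum pair scores over every column and every pair of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


alignment = ["ACG-T", "ACGAT", "A-GAT"]
print(sum_of_pairs(alignment))  # prints 3
```

In a GA, a score of this kind could be evaluated on the alignment implied by each candidate and used directly as its fitness, with higher values indicating better alignments.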

[1] Lusheng Wang, et al. Improved Approximation Algorithms for Tree Alignment, 1996, J. Algorithms.

[2] M. Waterman, et al. A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons, 1987, Journal of molecular biology.

[3] S. B. Needleman, et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins, 1970, Journal of molecular biology.

[4] N. Gostling, et al. From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design, 2002, Heredity.

[5] J. Thompson, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, 1994, Nucleic acids research.

[6] Fred R. McMorris, et al. The computation of consensus patterns in DNA sequences, 1993.

[7] Tao Jiang, et al. A More Efficient Approximation Scheme for Tree Alignment, 2000, SIAM J. Comput.

[8] Barry G. Hall, et al. Phylogenetic Trees Made Easy: A How-To Manual for Molecular Biologists, 2001.

[9] J. Thompson, et al. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools, 1997, Nucleic acids research.

[10] Jens Stoye, et al. Improving the Divide-and-Conquer Approach to Sum-of-Pairs Multiple Sequence Alignment, 1997.

[11] M. Laubichler. Review of: Carroll, Sean B., Jennifer K. Grenier and Scott D. Weatherbee: From DNA to diversity: molecular genetics and the evolution of animal design. Malden, Mass [u.a.]: Blackwell Science 2001, 2003.

[12] J. M. Sauder, et al. Large-scale comparison of protein sequence alignment algorithms with structure alignments, 2000, Proteins.

[13] E. Myers, et al. Basic local alignment search tool, 1990, Journal of molecular biology.

[14] Christopher W. V. Hogue, et al. NBLAST: a cluster variant of BLAST for NxN comparisons, 2002, BMC Bioinformatics.

[15] Gary B. Fogel, et al. A Clustal alignment improver using evolutionary algorithms, 2002, Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600).

[16] Peter Adams, et al. A simulated annealing algorithm for finding consensus sequences, 2002, Bioinform.

[17] D. Lipman, et al. The multiple sequence alignment problem in biology, 1988.

[18] S. F. Altschul, et al. Weights for data related by a tree, 1989, Journal of molecular biology.

[19] David Corne, et al. Evolutionary Computation In Bioinformatics, 2003.

[20] Sean R. Eddy, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, 1998.

[21] D. Lipman, et al. Trees, stars, and multiple biological sequence alignment, 1989.

[22] Dan Graur, et al. Fundamentals of Molecular Evolution, 2nd Edition, 2000.

[23] Sean R. Eddy, et al. A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure, 2002, BMC Bioinformatics.

[24] Tao Jiang, et al. On the Complexity of Multiple Sequence Alignment, 1994, J. Comput. Biol.

[25] João Meidanis, et al. Introduction to computational molecular biology, 1997.

[26] Darrell Whitley, et al. A genetic algorithm tutorial, 1994, Statistics and Computing.

[27] Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.