Evolutionary computation techniques for multiple sequence alignment

Given a collection of biologically related protein or DNA sequences, the basic multiple sequence alignment problem is to determine the most biologically plausible alignment of these sequences. Under the assumption that the sequences arose from a common ancestor, an alignment can be used to infer their evolutionary history, i.e., the most likely pattern of insertions, deletions and mutations that transformed one sequence into another. The general multiple sequence alignment problem is known to be NP-hard, so finding an optimal alignment is believed to be computationally intractable in general. This does not, however, rule out algorithms that produce near-optimal multiple sequence alignments in polynomial time. We examine techniques that combine efficient algorithms for near-optimal global and local multiple sequence alignment with evolutionary computation, which searches for better near-optimal alignments. We describe our evolutionary computation approach to multiple sequence alignment and present preliminary simulation results on a set of 17 clusters of orthologous groups of proteins (COGs). We compare the fitness of the alignments produced by the proposed techniques with the fitness of the CLUSTAL W alignments given in the COG database.
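
To make the notion of alignment fitness concrete, the following is a minimal sketch of one common choice, the sum-of-pairs score, together with a simple gap-shifting mutation of the kind an evolutionary search might apply. The abstract does not state which fitness function or variation operators are used, so the scoring scheme, the parameter values (`match`, `mismatch`, `gap`), and the function names `pair_score`, `sum_of_pairs`, and `shift_gap` are all assumptions made for illustration.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols (illustrative values, not from the paper)."""
    if a == GAP and b == GAP:
        return 0  # two gaps in the same column are ignored
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **params):
    """Sum pair_score over every pair of rows in every column of the alignment."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **params)
    return total


def shift_gap(row, pos):
    """Hypothetical mutation: swap the gap at `pos` with the residue to its right.

    The underlying (ungapped) sequence is never changed; if the move is not
    possible, the row is returned unchanged.
    """
    if pos + 1 < len(row) and row[pos] == GAP and row[pos + 1] != GAP:
        return row[:pos] + row[pos + 1] + GAP + row[pos + 2:]
    return row


if __name__ == "__main__":
    alignment = ["AC-T", "A-CT", "ACGT"]
    print(sum_of_pairs(alignment))
    mutated = [shift_gap(alignment[0], 2)] + alignment[1:]
    print(mutated, sum_of_pairs(mutated))
```

In an evolutionary search, a mutation such as `shift_gap` would be applied to candidate alignments and the offspring kept or discarded according to their fitness; a substitution matrix could replace the flat match and mismatch values for protein sequences.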
