An evolutionary progressive multiple sequence alignment

This paper proposes an evolutionary tree-based method (a progressive multiple sequence alignment method) that uses a genetic algorithm (GA) to solve multiple sequence alignment problems. In this method, each chromosome is represented as a guide tree. Two crossover operators are proposed for tree-structured chromosomes: subtree selection crossover and tree uniform order crossover. Both generate new chromosomes that inherit the tree structure of their parents. The indirect representation of multiple alignments, namely the guide tree representation of chromosomes, combined with suitable genetic operators, makes the search far more efficient. Experimental results on benchmark problems from BAliBASE and the NCBI database show that the proposed method produces higher-quality solutions than SAGA (a well-known GA-based approach, 1996), T-Coffee (a sensitive progressive method, 2000), MUSCLE (a progressive/iterative method, 2004), MAFFT (a progressive/iterative method, 2005), and ProbCons (a probabilistic/consistency method, 2005).
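To make the guide tree representation concrete, the following is a minimal sketch of a tree-structured chromosome and a generic subtree-preserving crossover. It is an illustration under stated assumptions, not the paper's operators: the abstract does not define subtree selection crossover or tree uniform order crossover, so the nested-tuple encoding and the function names `leaves`, `subtrees`, `prune`, and `subtree_crossover` are all hypothetical.

```python
import random

# Assumed encoding: a guide tree is either an int (a leaf, i.e. the index of
# one input sequence) or a pair (left, right) of guide trees.

def leaves(tree):
    """Return the leaf indices of a guide tree, left to right."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def subtrees(tree):
    """Return every internal node of the tree, the root first."""
    if isinstance(tree, int):
        return []
    return [tree] + subtrees(tree[0]) + subtrees(tree[1])

def prune(tree, drop):
    """Remove the leaves in `drop`; return None if no leaf remains."""
    if isinstance(tree, int):
        return None if tree in drop else tree
    left = prune(tree[0], drop)
    right = prune(tree[1], drop)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def subtree_crossover(parent_a, parent_b, rng=random):
    """Illustrative crossover (hypothetical, not the paper's operator).

    Copies one proper internal subtree from parent_a unchanged, then joins it
    with what is left of parent_b after that subtree's leaves are removed.
    The child is a valid guide tree over the same set of sequences.
    """
    candidates = [s for s in subtrees(parent_a) if s is not parent_a]
    if not candidates:
        return parent_a  # too small to have a proper internal subtree
    chosen = rng.choice(candidates)
    rest = prune(parent_b, set(leaves(chosen)))
    return (chosen, rest)

if __name__ == "__main__":
    a = (((0, 1), 2), (3, 4))
    b = ((0, 3), ((1, 4), 2))
    print(subtree_crossover(a, b, random.Random(0)))
```

Because the child is again a binary tree over all input sequences, it can be handed directly to a progressive aligner, which is what makes the indirect representation convenient: every chromosome decodes to a feasible alignment.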

[1]  Chuong B. Do,et al.  ProbCons: Probabilistic consistency-based multiple sequence alignment. , 2005, Genome research.

[2]  W. Taylor.  A flexible method to align large numbers of biological sequences , 2005, Journal of Molecular Evolution.

[3]  N. Saitou,et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees. , 1987, Molecular biology and evolution.

[4]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[5]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[6]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[7]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[8]  Jun S. Liu,et al.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.

[9]  R. Doolittle,et al.  Progressive sequence alignment as a prerequisite to correct phylogenetic trees , 2007, Journal of Molecular Evolution.

[10]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[11]  M. Sternberg,et al.  A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. , 1987, Journal of molecular biology.

[12]  S Brunak,et al.  Multiple alignment using simulated annealing: branch point definition in human mRNA splicing. , 1992, Nucleic acids research.

[13]  Osamu Gotoh,et al.  Optimal alignment between groups of sequences and its application to multiple sequence alignment , 1993, Comput. Appl. Biosci..

[14]  D. Higgins,et al.  SAGA: sequence alignment by genetic algorithm. , 1996, Nucleic acids research.

[15]  D. Higgins,et al.  T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.

[16]  K. Katoh,et al.  MAFFT version 5: improvement in accuracy of multiple sequence alignment , 2005, Nucleic acids research.