A Graph-Based Genetic Algorithm for the Multiple Sequence Alignment Problem

We developed a new approach to the multiple sequence alignment problem based on Genetic Algorithms (GA). We propose representing an alignment as a multidimensional oriented graph, which dramatically decreases storage complexity. The proposed GA is described in detail, including new structure-preserving genetic operators. A sensitivity analysis was carried out to tune the running parameters of the GA. The performance of the proposed system was evaluated on a benchmark of hand-aligned sequences (BAliBASE). Overall, the results are comparable to or better than those obtained with a well-known software package (Clustal). These results are very promising and warrant further development.
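The abstract does not spell out the graph encoding, so the following is only a minimal sketch of one common way to view an alignment as an oriented path through a multidimensional lattice. It assumes each alignment column is stored as a vector of 0/1 flags (1 = the sequence advances by one residue, 0 = a gap), so only the path is kept and the gapped rows are rebuilt on demand. The names `Step`, `encode`, and `decode` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# One flag per sequence: 1 = consume the next residue, 0 = emit a gap.
# (Assumed encoding for illustration; the paper's own structure may differ.)
Step = Tuple[int, ...]


def encode(rows: List[str], gap: str = "-") -> List[Step]:
    """Convert gapped alignment rows into a path of step vectors."""
    return [tuple(0 if ch == gap else 1 for ch in col) for col in zip(*rows)]


def decode(seqs: List[str], path: List[Step], gap: str = "-") -> List[str]:
    """Rebuild gapped rows by walking the path over the ungapped sequences."""
    pos = [0] * len(seqs)
    rows: List[List[str]] = [[] for _ in seqs]
    for step in path:
        for i, advance in enumerate(step):
            if advance:
                if pos[i] >= len(seqs[i]):
                    raise ValueError("path consumes more residues than available")
                rows[i].append(seqs[i][pos[i]])
                pos[i] += 1
            else:
                rows[i].append(gap)
    if pos != [len(s) for s in seqs]:
        raise ValueError("path does not consume every residue")
    return ["".join(r) for r in rows]


if __name__ == "__main__":
    aligned = ["AC-GT", "A-TGT", "ACTG-"]
    raw = ["ACGT", "ATGT", "ACTG"]
    path = encode(aligned)
    print(path)
    print(decode(raw, path))
```

Under this assumed encoding, an operator that edits the path while keeping each sequence's count of 1-flags equal to that sequence's length always yields a valid alignment, which is one way to read "structure-preserving".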
