A Novel Approach to Multiple Sequence Alignment Using Multiobjective Evolutionary Algorithm Based on Decomposition

Multiple sequence alignment (MSA) is a fundamental step for many other tasks in bioinformatics, such as phylogenetic analysis, identification of conserved motifs and domains, and structure prediction. Although many MSA methods exist, none has yet been found that produces biologically perfect alignments. This paper proposes a novel approach in which MSA is treated as a multiobjective optimization problem. We apply the well-known multiobjective evolutionary algorithm framework based on decomposition (MOEA/D) to MSA and name the resulting algorithm MOMSA. Within MOMSA, we develop a new population initialization method and a novel mutation operator. We compare the performance of MOMSA with several alignment methods based on evolutionary algorithms, including VDGA, GAPAM, and IMSA, and with state-of-the-art progressive alignment approaches such as MSAProbs, Probalign, MAFFT, ProbCons, Clustal Omega, T-Coffee, Kalign2, MUSCLE, FSA, DIALIGN, PRANK, and CLUSTAL W. All algorithms are tested on the BAliBASE 2.0 and BAliBASE 3.0 benchmark datasets. Statistical analysis of the experimental results shows that MOMSA obtains significantly better alignments than VDGA and GAPAM on most test cases and better alignments than IMSA in terms of TC scores. The results also indicate that MOMSA is comparable with the leading progressive alignment approaches in alignment quality.
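
To illustrate what "based on decomposition" means in practice, the following minimal Python sketch shows how a two-objective problem can be split into scalar subproblems, one per weight vector. It is not the MOMSA implementation: the abstract does not state which objectives or which scalarization MOMSA uses, so the weighted Tchebycheff function (one commonly used choice in decomposition-based algorithms), the two minimized objectives, and the names `uniform_weights`, `tchebycheff`, and `best_per_subproblem` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def uniform_weights(n_subproblems: int) -> List[Vector]:
    """Evenly spread weight vectors for two objectives (hypothetical helper)."""
    step = n_subproblems - 1
    return [(i / step, 1 - i / step) for i in range(n_subproblems)]


def tchebycheff(objectives: Sequence[float],
                weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """Weighted Tchebycheff value of one candidate; smaller is better.

    Assumes every objective is minimized and `ideal` holds the best
    value seen so far for each objective.
    """
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


def best_per_subproblem(candidates: Sequence[Vector],
                        weights: Sequence[Vector],
                        ideal: Vector) -> List[Vector]:
    """For each weight vector, pick the candidate with the lowest scalar value."""
    return [min(candidates, key=lambda c: tchebycheff(c, w, ideal))
            for w in weights]


# Hypothetical objective vectors of three candidate alignments
# (for example, two penalty-style scores that should both be small).
candidates = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
ideal = (1.0, 1.0)
chosen = best_per_subproblem(candidates, uniform_weights(3), ideal)
```

Each weight vector favours a different trade-off, so the three subproblems select three different candidates. In a decomposition-based evolutionary algorithm, this is what keeps the population spread along the trade-off front rather than collapsing onto a single compromise.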
