Multiple sequence alignment with affine gap by using multi-objective genetic algorithm

Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing multiple sequence alignments have been designed, the efficient computation of highly accurate and statistically significant multiple alignments is still a challenge. In this paper, we propose an efficient method based on a multi-objective genetic algorithm (MSAGMOGA) to discover optimal alignments with affine gaps in multiple sequence data. The main advantage of our approach is that a large number of tradeoff (i.e., non-dominated) alignments can be obtained in a single run with respect to three conflicting objectives: minimization of the affine gap penalty, and maximization of similarity and of support. To the best of our knowledge, this is the first effort in this direction to use three objectives. The proposed method can be applied to any data set with a sequential character, and it allows any choice of similarity measure for finding alignments. By analyzing the obtained optimal alignments, the decision maker can understand the tradeoffs between the objectives. We compared our method with three well-known multiple sequence alignment methods: MUSCLE, a progressive method, and SAGA and MSA-GA, both based on evolutionary algorithms. Experiments were conducted on the BAliBASE 2.0 database, and the results confirm that MSAGMOGA obtains alignments with better accuracy and statistical significance than these three methods when aligning multiple sequences with affine gaps. The proposed method also finds solutions faster than the two other evolutionary approaches, SAGA and MSA-GA.
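To make the objectives and the notion of a non-dominated alignment concrete, here is a minimal Python sketch. It is not the MSAGMOGA implementation, and all helper names are hypothetical. The gap-cost convention (open + extend * (L - 1) per maximal gap run, with terminal gaps charged like internal ones), the parameter values, and the toy match/mismatch similarity are assumptions chosen for illustration. The paper's support objective is not reproduced; an extra objective can be appended to the vector returned by `objectives`, and `non_dominated` handles any number of components.

```python
from itertools import combinations

GAP = "-"

# Hypothetical helpers for illustration only; not the paper's code.


def affine_gap_penalty(alignment, gap_open=3.0, gap_extend=1.0):
    """Charge gap_open + gap_extend * (L - 1) for every maximal run of L gaps.

    Assumptions: each row is scored separately, and terminal gaps are
    penalized exactly like internal ones.
    """
    total = 0.0
    for row in alignment:
        run = 0
        for ch in row:
            if ch == GAP:
                run += 1
            else:
                if run:
                    total += gap_open + gap_extend * (run - 1)
                run = 0
        if run:  # close a gap run that reaches the end of the row
            total += gap_open + gap_extend * (run - 1)
    return total


def sum_of_pairs_similarity(alignment, match=1.0, mismatch=0.0):
    """Toy sum-of-pairs score; positions with a gap in either row are skipped."""
    score = 0.0
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            if x != GAP and y != GAP:
                score += match if x == y else mismatch
    return score


def dominates(u, v):
    """True if u is >= v on every objective and > v on at least one (all maximized)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def non_dominated(named_points):
    """Keep the names whose objective vectors are dominated by no other vector."""
    return [
        name
        for name, p in named_points.items()
        if not any(dominates(q, p) for q in named_points.values())
    ]


def objectives(alignment):
    # The gap penalty is negated so that every component is maximized.
    return (-affine_gap_penalty(alignment), sum_of_pairs_similarity(alignment))


if __name__ == "__main__":
    candidates = {
        "A": ["ACGT", "A-GT", "ACGT"],
        "B": ["ACGT-", "--AGT", "ACGT-"],
    }
    scores = {name: objectives(aln) for name, aln in candidates.items()}
    print(scores)                 # {'A': (-3.0, 10.0), 'B': (-10.0, 4.0)}
    print(non_dominated(scores))  # ['A']
    # Three-component vectors (e.g. -penalty, similarity, support) work the same way.
    print(non_dominated({"x": (-3, 10, 2), "y": (-5, 12, 1), "z": (-6, 9, 1)}))  # ['x', 'y']
```

Alignment A has a single one-residue gap and keeps every residue column matched, so it beats B on both objectives and is the only survivor. In the three-component example, `x` and `y` trade off against each other and are both kept, while `z` is dominated. In a multi-objective GA, a filter of this kind is what turns a population of candidate alignments into the set of tradeoff alignments offered to the decision maker. Since the abstract notes that any similarity measure may be used, the toy score could be swapped for a substitution-matrix score without changing the dominance logic.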
