A Multi-Objective Evolutionary Algorithm for Improving Multiple Sequence Alignments

Multiple sequence alignments (MSAs) are essential tools for many tasks in molecular biology. This paper proposes an efficient, scalable and effective multi-objective evolutionary algorithm that improves sequences which have already been aligned. The algorithm benefits from the wide diversity of alignments produced by state-of-the-art methods, and the alignments it produces do not depend on specific sequence features. The proposed method is validated on a database of refined multiple sequence alignments, and the quality of the results is compared using four standard metrics.
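The abstract does not state which objectives the algorithm optimizes. As a minimal sketch, assuming two illustrative objectives that are both maximized (the percentage of non-gap cells and the number of totally conserved columns), the Python code below shows how Pareto dominance compares candidate alignments of the same sequences and keeps only the non-dominated ones, which is the core selection idea of a multi-objective evolutionary algorithm. All function names are hypothetical, and this is not the paper's implementation.

```python
from typing import List, Sequence, Tuple

# Illustrative objectives only (assumption): the paper's actual objectives are not given here.
def non_gap_percentage(alignment: Sequence[str]) -> float:
    """Percentage of alignment cells holding a residue rather than a gap ('-')."""
    total = sum(len(row) for row in alignment)
    residues = sum(1 for row in alignment for ch in row if ch != "-")
    return 100.0 * residues / total if total else 0.0


def conserved_columns(alignment: Sequence[str]) -> int:
    """Number of columns where every sequence has the same residue and no gap."""
    count = 0
    for column in zip(*alignment):
        first = column[0]
        if first != "-" and all(ch == first for ch in column):
            count += 1
    return count


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population: List[List[str]]) -> List[List[str]]:
    """Return the candidate alignments not dominated by any other candidate."""
    scores = [(non_gap_percentage(aln), float(conserved_columns(aln))) for aln in population]
    front = []
    for i, aln in enumerate(population):
        dominated = any(
            dominates(scores[j], scores[i]) for j in range(len(population)) if j != i
        )
        if not dominated:
            front.append(aln)
    return front


if __name__ == "__main__":
    # Three candidate alignments of the same two sequences, "AC" and "CA".
    p = ["AC", "CA"]        # no gaps, no conserved column
    q = ["AC-", "-CA"]      # some gaps, one conserved column (C)
    r = ["AC--", "--CA"]    # more gaps, no conserved column
    print(pareto_front([p, q, r]))  # prints [['AC', 'CA'], ['AC-', '-CA']]
```

Neither `p` nor `q` dominates the other: `p` avoids gaps but conserves no column, while `q` pays with gaps to conserve one. This trade-off is why a multi-objective method returns a set of alignments rather than a single one. A complete algorithm (for example, one based on NSGA-II) would also rank the later fronts and apply a diversity measure such as crowding distance, which this sketch omits.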