A Parallel Multiobjective Metaheuristic for Multiple Sequence Alignment

The simultaneous alignment of three or more nucleotide or amino acid sequences is known as multiple sequence alignment (MSA), a nondeterministic polynomial time (NP)-hard optimization problem. The time needed to find an optimal alignment grows exponentially with the number of sequences to align. In this work, we deal with a multiobjective version of the MSA problem in which the goal is to simultaneously optimize the accuracy and the conservation of the alignment. A parallel version of the hybrid multiobjective memetic metaheuristic for MSA is proposed. To evaluate our proposal, we selected a pool of data sets with different numbers of sequences (up to 1000 sequences) and studied its parallel performance against other well-known parallel approaches published in the literature: MSAProbs, tree-based consistency objective function for alignment evaluation (T-Coffee), Clustal Ω, and multiple alignment using fast Fourier transform (MAFFT). The comparative study reveals that our parallel aligner obtains better results than MSAProbs, T-Coffee, Clustal Ω, and MAFFT. In addition, with 32 cores the parallel version is around 25 times faster than the sequential version, which corresponds to an efficiency of around 80%.
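The speedup and efficiency figures quoted above are related by the usual definition of parallel efficiency, speedup divided by the number of cores. The short sketch below only checks that arithmetic against the rounded values in the abstract; the function name `parallel_efficiency` is a hypothetical helper, not part of the authors' software.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return parallel efficiency: speedup divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Rounded figures taken from the abstract: about 25x speedup on 32 cores.
efficiency = parallel_efficiency(25.0, 32)
print(f"efficiency = {efficiency:.1%}")  # prints "efficiency = 78.1%"
```

With these rounded inputs the result is about 78%, consistent with the "around 80%" reported.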
