An Optimization Approach for Multiple Sequence Alignment using Divide-Conquer and Genetic Algorithm

Multiple sequence alignment (MSA) is a very effective tool in bioinformatics. It is used to predict protein structure and function, to locate DNA regulatory elements such as binding sites, and for evolutionary analysis. This paper proposes an optimization method that improves multiple sequence alignments generated through progressive alignment. The method fuses two problem-solving techniques, divide-and-conquer and genetic algorithms: an initial population of MSAs is generated through progressive alignment, each alignment is divided vertically into four parts, three genetic operators are applied to each part, and the parts are then recombined to reconstruct the full MSA. To assess its performance, the method's results are compared with those of four well-known MSA methods: Clustal Ω, MUSCLE, PRANK, and Clustal W. Experimental results show an 11-26% increase in sum-of-pairs score (SP score) for the proposed method relative to these methods. The SP score is the cumulative score of all possible pairs of aligned sequences within the MSA.

Keywords: multiple sequence alignment; divide and conquer; genetic algorithm; optimization method
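As a rough illustration of the SP score and of the vertical split into four parts, the following Python sketch may help. It is not the authors' implementation: the scoring values (match +1, mismatch -1, residue against gap -2, gap against gap 0), the function names, and the toy alignment are all assumptions chosen for brevity, and the genetic operators themselves are not shown.

```python
from itertools import combinations

# Assumed scoring scheme, NOT taken from the paper:
# match +1, mismatch -1, residue vs. gap -2, gap vs. gap 0.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one aligned pair of symbols under the assumed scheme."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sp_score(msa):
    """Sum the column-wise pair scores over every pair of rows."""
    return sum(
        pair_score(x, y)
        for row_a, row_b in combinations(msa, 2)
        for x, y in zip(row_a, row_b)
    )


def split_vertically(msa, parts=4):
    """Cut the alignment into `parts` column blocks of near-equal width."""
    width = len(msa[0])
    bounds = [width * i // parts for i in range(parts + 1)]
    return [[row[lo:hi] for row in msa] for lo, hi in zip(bounds, bounds[1:])]


def recombine(blocks):
    """Concatenate the column blocks row by row to rebuild the full MSA."""
    return ["".join(rows) for rows in zip(*blocks)]


# Hypothetical toy alignment: three rows of equal length.
msa = ["ACG-TACG", "ACGTTA-G", "A-GTTACG"]
blocks = split_vertically(msa)
print(sp_score(msa), [sp_score(block) for block in blocks])
```

Under a column-independent scheme like this one, the SP score of the full alignment equals the sum of the block scores, which is one reason each part can be scored separately before recombination. Note that a genetic operator that shifts gaps inside a block can change that block's width across rows, so a fuller implementation would need to re-pad the rows before recombining.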
