Higher accuracy protein multiple sequence alignments by genetic algorithm

Abstract: A multiple sequence alignment (MSA) gives insight into the evolutionary, structural and functional relationships among protein sequences. Here, the initial MSAs are taken from the output of two important protein sequence alignment programs: ProbCons and MCoffee. We apply the evolutionary operators of a genetic algorithm to these initial alignments and obtain an optimized protein alignment after several iterations of the algorithm. On this basis, we have developed a new MSA computational tool called Protein Alignment by Stochastic Algorithm (PASA). The accuracy of the protein alignments is evaluated in terms of the total column (TC) score, which is the number of columns aligned identically in the test alignment and the reference alignment, divided by the total number of columns. In our analysis, PASA is found to be a statistically more accurate protein alignment method than other popular bioinformatics tools.
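The TC score described above can be illustrated with a minimal sketch. It is not the PASA implementation: the function names `columns` and `tc_score` are hypothetical, gaps are assumed to be written as `-`, and the "total number of columns" is assumed to mean the columns of the reference alignment. Each column is represented by the index of the residue each sequence contributes to it, so two columns count as identical only when they align exactly the same residues.

```python
def columns(alignment):
    """Convert aligned rows into columns of residue indices (None marks a gap)."""
    counters = [0] * len(alignment)  # next residue index for each sequence
    cols = []
    for j in range(len(alignment[0])):
        col = []
        for i, row in enumerate(alignment):
            if row[j] == "-":
                col.append(None)
            else:
                col.append(counters[i])
                counters[i] += 1
        cols.append(tuple(col))
    return cols


def tc_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment.

    Assumption: the denominator is the number of reference columns.
    """
    test_cols = set(columns(test))
    ref_cols = columns(reference)
    matched = sum(1 for col in ref_cols if col in test_cols)
    return matched / len(ref_cols)


# Toy example with two sequences (ACGT and ACAGT), not real benchmark data.
reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]
print(tc_score(test, reference))
```

In this toy case the test alignment reproduces three of the five reference columns; the two columns around the misplaced gap differ. Benchmark suites may restrict scoring to selected columns of the reference, which this sketch does not attempt.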
