A Reinforcement Learning Based Approach to Multiple Sequence Alignment

Multiple sequence alignment plays an important role in comparative genomic sequence analysis and is one of the most challenging problems in bioinformatics. It consists of arranging the primary sequences of DNA, RNA or protein so as to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences. In this paper we approach multiple sequence alignment from a computational perspective and introduce a novel method for it based on reinforcement learning. The experimental evaluation is performed on several DNA data sets, two of which contain human DNA sequences. The results obtained show that our technique outperforms other methods from the literature on these data sets and indicate the potential of our proposal.
