Reinforcement Learning for Improving Gene Identification Accuracy by Combination of Gene-Finding Programs

Because genome databases are growing explosively, gene discovery has become one of the most computationally intensive tasks in bioinformatics. Many gene-finding programs have been developed, but their prediction accuracy still leaves room for improvement. This paper proposes a reinforcement learning model that combines the gene predictions of existing gene-finding programs. The model learns a policy for choosing which prediction to accept at each nucleotide site. A policy is reinforced whenever the prediction it selects at a site agrees with the true annotation, and the model searches for the optimal policy, the one that maximizes the expected prediction accuracy over all nucleotide sites in the sequences. The experimental results demonstrate that the proposed model yields higher prediction accuracy than the single best program.
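The reward scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the tabular value store, the epsilon-greedy exploration, the learning rate, and the names `train_policy` and `combine` are all assumptions made for illustration. The state is taken to be the tuple of coding/non-coding calls (1/0) that the programs make at a site, the action is the choice of which program to trust, and the reward is 1 when the trusted call matches the annotation.

```python
import random
from collections import defaultdict


def train_policy(predictions, truth, n_programs,
                 epochs=20, epsilon=0.1, lr=0.1, seed=0):
    """Learn, per pattern of program calls, which program to trust.

    predictions: list of tuples, one per nucleotide site, each holding
                 the 0/1 call of every program at that site.
    truth:       list of 0/1 true annotations, one per site.
    Hypothetical sketch: tabular values with epsilon-greedy exploration.
    """
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * n_programs)
    for _ in range(epochs):
        for state, label in zip(predictions, truth):
            values = q[state]
            if rng.random() < epsilon:
                # Explore: trust a randomly chosen program.
                action = rng.randrange(n_programs)
            else:
                # Exploit: trust the program with the highest value so far.
                action = max(range(n_programs), key=values.__getitem__)
            # Reinforce only when the accepted call matches the annotation.
            reward = 1.0 if state[action] == label else 0.0
            values[action] += lr * (reward - values[action])
    return dict(q)


def combine(q, predictions):
    """Apply the learned greedy policy to produce one call per site."""
    combined = []
    for state in predictions:
        values = q.get(state, [0.0] * len(state))
        action = max(range(len(state)), key=values.__getitem__)
        combined.append(state[action])
    return combined


def accuracy(calls, truth):
    """Fraction of sites whose call matches the annotation."""
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


# Toy data (invented): two programs, each wrong on a different pattern.
toy_predictions = [(1, 0), (0, 1), (1, 1), (0, 0)] * 25
toy_truth = [1, 1, 1, 0] * 25
toy_policy = train_policy(toy_predictions, toy_truth, n_programs=2)
toy_combined = combine(toy_policy, toy_predictions)
```

On this toy data each program alone is wrong on a quarter of the sites, while the learned policy can trust a different program for each disagreement pattern. Real per-site accuracy would depend on richer state features than the raw calls used here.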
