Splice site detection with a higher-order Markov model implemented on a neural network.

The performance of ab initio gene prediction approaches depends largely on how effectively they detect splice sites. This paper addresses the problem of splice site detection using higher-order Markov models. The tenet of our approach is to capture the higher-order dependencies of a Markov model with a neural network that receives its inputs from low-order Markov chains. The method not only captures the higher-order dependencies among the bases of the consensus sequence immediately surrounding the splice site but also distinguishes the characteristics of the coding and non-coding regions on either side of the splice site. Our experiments indicate that the method achieves higher accuracy than techniques employing low-order Markov chains and other earlier approaches.
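
The idea of feeding low-order Markov chain outputs into a neural network can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the chain order (first-order, position-independent), the add-one smoothing, the network size, the random untrained weights, the toy sequences, and all function names are assumptions made for illustration only.

```python
import math
import random

BASES = "ACGT"

def fit_first_order_chain(seqs, pseudocount=1.0):
    """Estimate P(next base | previous base) with additive smoothing."""
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1.0
    chain = {}
    for a in BASES:
        total = sum(counts[a].values())
        chain[a] = {b: counts[a][b] / total for b in BASES}
    return chain

def chain_features(seq, chain):
    """Per-position log transition probabilities (the low-order Markov outputs)."""
    return [math.log(chain[p][n]) for p, n in zip(seq, seq[1:])]

def make_network(n_inputs, n_hidden, seed=0):
    """One hidden layer with small random weights (untrained, illustration only)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2

def forward(features, net):
    """Combine the per-position Markov outputs nonlinearly into a score in (0, 1)."""
    w1, b1, w2, b2 = net
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy windows around donor sites, not data from the paper.
toy_sites = ["AAGGTAAGT", "CAGGTGAGT", "AAGGTAAGA"]
chain = fit_first_order_chain(toy_sites)
feats = chain_features("CAGGTAAGT", chain)
net = make_network(n_inputs=len(feats), n_hidden=4)
score = forward(feats, net)
```

Because the hidden layer sees all per-position chain outputs at once, it can in principle learn interactions between distant positions that a single low-order chain cannot represent. In practice the weights would be trained on labelled true and false splice sites rather than left at their random initial values.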
