Evaluating the Accuracy of Splice Site Prediction Based on Integrating Jensen-Shannon Divergence and a Polynomial Equation of Order 2

Advances in DNA sequencing technology have led to the generation of vast amounts of new sequence data. It is essential to understand the functions, features, and structures of all newly sequenced data, and analyzing these sequences with different methods can provide important information about them. One of the essential tasks in genome annotation is gene prediction, which helps to characterize genes and determine their functions. A key step towards correct gene structure prediction is accurate splice site detection. Many splice site prediction methods exist; however, only a few of them can be incorporated into gene prediction modules because of their complexity. In this paper, a novel model is presented to recognize unknown splice sites in a new genome without using any prior knowledge. The model is based on integrating Jensen-Shannon divergence and a polynomial equation of order 2. Finally, the proposed model is evaluated on the yeast genome to predict splice sites. The experimental results suggest that the proposed method is an effective approach for splice site prediction.

General Terms: Bioinformatics, Gene structure
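The abstract does not specify how the Jensen-Shannon divergence is computed or how the order-2 polynomial enters the model. As a rough illustration only, the sketch below computes a length-weighted Jensen-Shannon divergence between the nucleotide compositions on either side of each candidate position, in the spirit of entropic segmentation of DNA; peaks in the resulting profile mark positions where composition changes sharply. The sliding-window scheme, the function names (`composition`, `js_divergence`, `js_profile`) and the parameter `half_window` are hypothetical, and the polynomial step is deliberately omitted because it is not described here.

```python
from collections import Counter
from math import log2

BASES = "ACGT"


def composition(seq):
    """Relative frequencies of A, C, G, T in seq (assumes a clean ACGT sequence)."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(BASES)
    return [counts[b] / total for b in BASES]


def entropy(p):
    """Shannon entropy in bits; terms with probability 0 contribute 0."""
    return -sum(x * log2(x) for x in p if x > 0)


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two segments'
    nucleotide compositions: H(mixture) - w1*H(P) - w2*H(Q)."""
    n1, n2 = len(left), len(right)
    w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)
    p, q = composition(left), composition(right)
    mixture = [w1 * a + w2 * b for a, b in zip(p, q)]
    return entropy(mixture) - w1 * entropy(p) - w2 * entropy(q)


def js_profile(seq, half_window):
    """Hypothetical scan: divergence at each position i between
    seq[i - half_window:i] and seq[i:i + half_window]."""
    profile = {}
    for i in range(half_window, len(seq) - half_window + 1):
        left = seq[i - half_window:i]
        right = seq[i:i + half_window]
        profile[i] = js_divergence(left, right)
    return profile


if __name__ == "__main__":
    prof = js_profile("AAAAGGGG", 2)
    best = max(prof, key=prof.get)
    print(best, prof[best])
```

For example, `js_profile("AAAAGGGG", 2)` peaks at position 4, where the composition switches abruptly from A to G and the divergence reaches 1 bit; identical compositions on both sides give a divergence of 0.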