A novel method for splice sites prediction using sequence component and hidden Markov model

With the rapid growth of DNA sequence data, there is an urgent need for new methods that predict genes accurately. The performance of gene detection methods depends mainly on the efficiency of the underlying splice site prediction methods. In this paper, a novel method for detecting splice sites is proposed that combines a new, effective DNA encoding method with the AdaBoost.M1 classifier. The proposed DNA encoding method is based on multi-scale component (MSC) features and a first-order Markov model (MM1). The method was applied to the HS3D dataset with repeated 10-fold cross-validation. The experimental results indicate that the new method improves classification accuracy and outperforms several current methods, namely MM1-SVM, Reduced MM1-SVM, SVM-B, LVMM, DM-SVM, DM2-AdaBoost and MSC+Pos(+APR)-SVM.
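To make the MM1 part of the encoding more concrete, the following is a minimal sketch of one common way a first-order Markov encoding can be computed; it is not taken from the paper. It assumes equal-length sequences over the alphabet A, C, G, T, position-specific transition probabilities estimated separately from true and false splice sites, and add-one smoothing. The function names `fit_mm1` and `encode_mm1` and the `pseudocount` parameter are hypothetical. The MSC features and the AdaBoost.M1 training step are not shown, since the abstract does not specify them.

```python
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these four symbols


def fit_mm1(seqs, pseudocount=1.0):
    """Estimate position-specific first-order transition probabilities
    P_i(s_i | s_{i-1}) from a list of equal-length sequences."""
    length = len(seqs[0])
    # counts[i][(prev, cur)] counts the dinucleotide ending at position i
    counts = [defaultdict(float) for _ in range(length)]
    for s in seqs:
        for i in range(1, length):
            counts[i][(s[i - 1], s[i])] += 1.0

    probs = [dict() for _ in range(length)]
    for i in range(1, length):
        for prev in ALPHABET:
            # smoothed row total for the conditioning nucleotide `prev`
            total = sum(counts[i][(prev, c)] for c in ALPHABET)
            total += pseudocount * len(ALPHABET)
            for cur in ALPHABET:
                probs[i][(prev, cur)] = (counts[i][(prev, cur)] + pseudocount) / total
    return probs


def encode_mm1(seq, pos_model, neg_model):
    """Map a sequence to a numeric vector: for each position i >= 1, the
    transition probability under the true-site and the false-site model."""
    features = []
    for i in range(1, len(seq)):
        key = (seq[i - 1], seq[i])
        features.append(pos_model[i][key])
        features.append(neg_model[i][key])
    return features


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from HS3D.
    true_sites = ["AGGT", "AGGT", "CGGT"]
    false_sites = ["ACCA", "TTGA"]
    pos_model = fit_mm1(true_sites)
    neg_model = fit_mm1(false_sites)
    print(encode_mm1("AGGT", pos_model, neg_model))
```

The resulting vectors are the kind of numeric input a boosted classifier such as AdaBoost.M1 could be trained on. In a cross-validation setting, the two Markov models would need to be re-estimated on each training fold so that no test sequence influences its own encoding.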
