SpliceCombo: A Hybrid Technique Efficiently Use for Principal Component Analysis of Splice Site Prediction

The primary step in gene prediction is identifying the coding regions of a genomic DNA sequence. In eukaryotic organisms, gene structure comprises a promoter, start codon, exons, introns, stop codon, and other elements. Splice site prediction identifies the junctions between exons and introns. Although the sequences around splice sites are highly conserved, the accuracy of existing prediction tools remains below 90%. Over the last decade, researchers have paid increasing attention to improving the accuracy of prediction models in this domain, and better identification of splice sites requires improved algorithms. The main objective of this work is to present a hybrid technique that improves on existing gene recognition techniques. Our proposed method, SpliceCombo, involves three stages. The first stage applies Principal Component Analysis for feature extraction. The second stage applies Case-Based Reasoning for feature selection. The third stage uses a support vector machine with a polynomial kernel function for final classification. In comparison with other methods, the proposed SpliceCombo model achieves higher prediction accuracy. For donor splice sites, the method achieves a sensitivity of 97.25% and a specificity of 97.46%. For acceptor splice sites, the sensitivity is 96.51% and the specificity is 94.48%.
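The sensitivity and specificity figures above follow the standard definitions for a binary classifier: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where a positive is a true splice site. The sketch below is a minimal illustration of those definitions only. The function name `sensitivity_specificity` and the toy labels are assumptions made for this example; they are not taken from the paper and do not reproduce its results.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Label 1 marks a true splice site, label 0 a decoy (non-site).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sites correctly found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sites missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # decoys correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # decoys called as sites

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")

    return tp / (tp + fn), tn / (tn + fp)


# Toy labels invented for illustration (hypothetical, not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sn, sp = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sn:.2%}, specificity={sp:.2%}")
```

In practice these two values would be computed separately for the donor and the acceptor classifier on held-out sequences, since the abstract reports them per site type.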
