Accurate Prediction of Translation Initiation Sites by Universum SVM

To extract protein sequences from nucleotide sequences, an important step is to recognize the points at which protein-coding regions begin. These points are called translation initiation sites (TIS). The task of recognizing TIS can be modeled as a classification problem. In this paper, we address this problem with the Universum support vector machine (SVM), a pattern classification algorithm recently proposed by Vapnik. Numerical experiments show that this method considerably improves on the leading existing approaches.
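As an illustration of how TIS recognition can be cast as a classification problem, the sketch below turns each candidate start codon (ATG) in a DNA sequence into a fixed-length feature vector. The function names, the window sizes, and the one-hot encoding are assumptions made for this example and are not taken from the paper; any classifier, including a Universum SVM, would then be trained on such vectors with labels marking true and false initiation sites.

```python
from typing import List, Tuple

NUCLEOTIDES = "ACGT"


def candidate_windows(seq: str, upstream: int = 3, downstream: int = 3) -> List[Tuple[int, str]]:
    """Return (position, window) for every ATG that has full flanking context.

    The window covers `upstream` bases before the ATG, the ATG itself,
    and `downstream` bases after it. Window sizes are illustrative only.
    """
    seq = seq.upper()
    windows = []
    start = seq.find("ATG")
    while start != -1:
        lo, hi = start - upstream, start + 3 + downstream
        if lo >= 0 and hi <= len(seq):
            windows.append((start, seq[lo:hi]))
        start = seq.find("ATG", start + 1)
    return windows


def one_hot(window: str) -> List[int]:
    """Encode each base as four binary features (A, C, G, T).

    Bases outside ACGT (for example N) are encoded as all zeros.
    """
    vec: List[int] = []
    for base in window:
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


if __name__ == "__main__":
    sequence = "CCGCCATGGCATGA"
    for position, window in candidate_windows(sequence):
        print(position, window, one_hot(window))
```

Each candidate ATG becomes one example; the binary label (true TIS or not) would come from an annotated data set, which this sketch does not include.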
