Predicting the protein SUMO modification sites based on Properties Sequential Forward Selection (PSFS).

Protein SUMO modification is an important post-translational modification, and optimizing methods for predicting SUMO modification sites remains a challenge. Here, we developed a novel computational method for SUMO modification site prediction that uses a Support Vector Machine (SVM) combined with Sequential Forward Selection (SFS) over hundreds of amino acid properties collected from the Amino Acid Index database (AAindex, http://www.genome.jp/aaindex). We also compared our method with the 0/1 system, in which each of the 20 amino acids is represented by a 20-dimensional binary vector (A = 00000000000000000001, C = 00000000000000000010, and so on). Our method reaches an overall leave-one-out cross-validation accuracy of 89.18%, which is higher than that of the 0/1 system. This indicates that SUMO modification site prediction is closely related to amino acid properties, and that this approach provides a helpful tool for further investigation of SUMO modification and for identification of sumoylation sites in proteins. The software is available at http://www.biosino.org/sumo.
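The two encodings and the greedy selection loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions: the alphabetical ordering of the 18 residues not fixed by the abstract, the window-based property encoding, the stopping rule (stop once no candidate improves the score), the toy property tables (invented values, not AAindex entries), and, most importantly, the use of leave-one-out accuracy of a 1-nearest-neighbour classifier as a dependency-free stand-in for the SVM. All function and variable names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed ordering: the abstract fixes only A -> ...0001 and C -> ...0010;
# the alphabetical one-letter order for the other 18 residues is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[int]:
    """0/1 system: one 20-dim binary vector per residue.

    The k-th amino acid in AMINO_ACIDS sets the k-th bit counted from the
    right, so A -> 000...0001 and C -> 000...0010 as in the abstract.
    """
    vec = [0] * 20
    vec[19 - AMINO_ACIDS.index(residue)] = 1
    return vec


def encode_window(window: str, props: Dict[str, Dict[str, float]],
                  chosen: Sequence[str]) -> List[float]:
    """Hypothetical property encoding: for each residue in the sequence
    window, emit its value under every selected property."""
    return [props[p][aa] for aa in window for p in chosen]


def loo_accuracy_1nn(X: List[List[float]], y: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Stand-in for the SVM used in the paper, chosen only to keep the
    sketch free of third-party dependencies.
    """
    correct = 0
    for i, xi in enumerate(X):
        # Hold out sample i and find its nearest neighbour among the rest.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        correct += int(y[nearest] == y[i])
    return correct / len(X)


def sequential_forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int,
) -> Tuple[List[str], float]:
    """Greedy SFS: start empty, repeatedly add the candidate whose addition
    gives the highest score; stop when nothing improves (assumed rule) or
    max_features is reached."""
    selected: List[str] = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Score every remaining candidate appended to the current subset;
        # ties go to the earlier candidate in `remaining`.
        top_score, top = max(
            ((score(selected + [c]), c) for c in remaining),
            key=lambda t: t[0],
        )
        if top_score <= best_score:
            break
        selected.append(top)
        remaining.remove(top)
        best_score = top_score
    return selected, best_score


# Toy data (invented, NOT AAindex values): P1 separates the classes, P2 does not.
toy_props = {
    "P1": {"A": 1.0, "K": 1.0, "C": -1.0, "D": -1.0},
    "P2": {"A": 0.0, "K": 5.0, "C": 0.0, "D": 5.0},
}
toy_windows = ["AK", "KA", "AA", "CD", "DC", "CC"]
toy_labels = [1, 1, 1, 0, 0, 0]


def toy_score(chosen: List[str]) -> float:
    X = [encode_window(w, toy_props, chosen) for w in toy_windows]
    return loo_accuracy_1nn(X, toy_labels)


selected, best = sequential_forward_selection(["P1", "P2"], toy_score, max_features=2)
print(selected, best)  # → ['P1'] 1.0
```

The greedy loop is independent of the classifier. Replacing `loo_accuracy_1nn` with the leave-one-out accuracy of an SVM from an SVM library would bring the sketch closer to the setup the abstract describes, at the cost of one classifier retraining per held-out sample per candidate property.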
