Paper information - PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine


Phosphorylation is one of the most essential post-translational modifications in eukaryotes. Studies of kinases and their substrates are important for understanding cellular signaling networks. Because large-scale wet-bench experiments are costly in time and labor, computational prediction of phosphorylation sites has become important, and many computational tools have been developed in recent decades. These tools fall into two categories: kinase-specific and non-kinase-specific. As new sequencing technologies uncover more kinases, accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation across a wider variety of species. In this manuscript, a support vector machine is used to combine eight sequence-level scoring functions to predict phosphorylation sites. The attributes used are Shannon entropy, relative entropy, predicted protein secondary structure, predicted protein disorder, solvent accessible area, overlapping properties, averaged cumulative hydrophobicity, and k-nearest neighbor; together they gave better results than the attributes previously used by other similar methods. In tenfold cross-validation on animal data, the method achieved AUC values of 0.8405, 0.8183, and 0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively. The model trained on the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test, where the AUC values were 0.7761, 0.6652, and 0.5958 for S, T, and Y sites, which compared favorably with those of several existing methods. A web server based on our method was constructed for public use. The server, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.
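To make two of the attributes named in the abstract concrete, the following is a minimal sketch of Shannon entropy and relative entropy computed over a set of residues, plus a helper that extracts sequence windows around candidate S/T/Y sites. The abstract does not say how the paper computes these attributes, so the function names, the window size (`flank=4`), the padding character, and the idea of scoring a plain string of residues are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(residues: str) -> float:
    """Shannon entropy in bits of the residue frequencies in `residues`."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def relative_entropy(residues: str, background: dict) -> float:
    """Kullback-Leibler divergence in bits of the observed residue
    frequencies from a background distribution (hypothetical input)."""
    counts = Counter(residues)
    total = len(residues)
    return sum(
        (n / total) * math.log2((n / total) / background[aa])
        for aa, n in counts.items()
    )


def candidate_windows(sequence: str, targets: str = "STY", flank: int = 4, pad: str = "-"):
    """Yield (index, residue, window) for every S/T/Y in `sequence`.

    The window spans `flank` residues on each side; termini are padded
    so that every window has length 2 * flank + 1.
    """
    padded = pad * flank + sequence + pad * flank
    for i, aa in enumerate(sequence):
        if aa in targets:
            yield i, aa, padded[i : i + 2 * flank + 1]


if __name__ == "__main__":
    # Illustrative toy sequence, not taken from the paper's datasets.
    for index, residue, window in candidate_windows("MKSPTAYLG", flank=2):
        print(index, residue, window, round(shannon_entropy(window), 3))
```

In a full pipeline, values like these would be concatenated with the other six attributes into one feature vector per candidate site before being passed to a support vector machine; that step is omitted here because the abstract gives no details about it.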

Bo Yao | Yongchao Dou | Chi Zhang
