Prediction of phosphorylation sites using SVMs

MOTIVATION Phosphorylation is involved in diverse signal transduction pathways. Predicting phosphorylation sites and their cognate kinases from primary protein sequences provides testable hypotheses that can guide further experimental research. Using support vector machines, we attempted to predict phosphorylation sites and the type of kinase that acts at each site.

RESULTS Our prediction system was limited to phosphorylation sites targeted by four protein kinase families and four protein kinase groups. The accuracy of the predictions ranged from 83 to 95% at the kinase family level, and from 76 to 91% at the kinase group level. The prediction system, PredPhospho, can be applied to the functional study of proteins, and can help predict the changes in phosphorylation sites caused by amino acid variations at intra- and interspecies levels.
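As a rough illustration of how primary sequence can be turned into input for a support vector machine, the sketch below extracts a fixed-length window around each serine, threonine, or tyrosine and one-hot encodes it. The abstract does not state the window size or the encoding that PredPhospho uses, so the flank length, the padding character, and the function names `candidate_windows` and `one_hot` are assumptions made for this example only.

```python
# Illustrative feature construction only; not the PredPhospho implementation.
# Window size, padding symbol, and one-hot encoding are assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
PHOSPHO_RESIDUES = {"S", "T", "Y"}     # residues that can be phosphorylated


def candidate_windows(sequence, flank=4, pad="X"):
    """Yield (position, window) for every S/T/Y in the sequence.

    Each window has length 2 * flank + 1 and is centred on the candidate
    residue; positions beyond the termini are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue in PHOSPHO_RESIDUES:
            yield i, padded[i : i + 2 * flank + 1]


def one_hot(window):
    """Encode a window as a flat 0/1 vector, 20 entries per position.

    Padding or non-standard characters become an all-zero block.
    """
    vector = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


if __name__ == "__main__":
    for position, window in candidate_windows("MKRSPTA", flank=2):
        print(position, window, len(one_hot(window)))
```

Vectors of this kind, labelled by whether a given kinase family or group is known to phosphorylate the central residue, could then be passed to any SVM implementation, with one binary classifier trained per family or group.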
