A New Machine Learning Approach for Protein Phosphorylation Site Prediction in Plants

Protein phosphorylation is a crucial regulatory mechanism in many organisms. With recent improvements in mass spectrometry, phosphorylation site data are accumulating rapidly. Despite this wealth of data, computational prediction of phosphorylation sites remains challenging. This is particularly true in plants, because little is known about the substrate specificities of plant protein kinases and because current phosphorylation prediction tools are trained on kinase-specific phosphorylation data from non-plant organisms. In this paper, we propose a new machine learning approach for phosphorylation site prediction. The approach incorporates protein sequence information and protein disordered regions, and integrates two machine learning techniques, k-nearest neighbor and support vector machine, to predict phosphorylation sites. Tests on the PhosPhAt dataset of phosphoserines in Arabidopsis and on the TAIR7 non-redundant protein database show that the proposed method performs well.
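
The abstract does not specify how the k-nearest neighbor step is computed or how it is combined with the support vector machine, so the following is only a minimal sketch of one plausible reading: a k-nearest-neighbor score computed over sequence windows centered on serine residues, which could then serve as one input feature (alongside a disorder score) to a downstream classifier. The function names, the window size, the identity-based similarity, and the toy labeled windows are all illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple


def serine_windows(seq: str, flank: int = 3, pad: str = "-") -> List[Tuple[int, str]]:
    """Return (position, window) for every serine in seq.

    The window spans `flank` residues on each side; termini are padded.
    The flank size is an assumption, not a value from the paper.
    """
    padded = pad * flank + seq + pad * flank
    return [
        (i, padded[i:i + 2 * flank + 1])
        for i, residue in enumerate(seq)
        if residue == "S"
    ]


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions with identical residues.

    A stand-in similarity; a substitution matrix could be used instead.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_score(window: str, labeled: List[Tuple[str, int]], k: int = 3) -> float:
    """Fraction of phosphorylated windows (label 1) among the k most similar."""
    nearest = sorted(
        labeled, key=lambda item: identity(window, item[0]), reverse=True
    )[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Toy labeled windows (hypothetical): 1 = phosphorylated, 0 = not.
training = [("ASK", 1), ("ASR", 1), ("GSG", 0), ("PSP", 0)]

for position, window in serine_windows("MASK", flank=1):
    print(position, window, knn_score(window, training, k=2))
```

In a sketch like this, each candidate site would be described by a small feature vector, for example the k-nearest-neighbor score together with a predicted disorder score for the same position, and that vector would be passed to a support vector machine for the final decision.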
