Prediction of phosphorylation sites based on Krawtchouk image moments

Protein phosphorylation is one of the most pervasive post‐translational modifications and regulates diverse cellular processes in organisms. Under the catalysis of protein kinases, protein phosphorylation usually occurs at serine (S), threonine (T), or tyrosine (Y) residues. In this contribution, we propose a novel scheme (named KMPhos) for the theoretical prediction of protein phosphorylation sites. First, a numerical matrix was obtained from a protein sequence fragment by replacing the character of each residue with the chemical descriptors of the corresponding amino acid molecule, so that the matrix approximately describes the chemical environment of the fragment; this matrix was then converted to a grayscale image. Next, the Krawtchouk image moments of the image were calculated and used to build support vector machine models. The accuracies of 10‐fold cross validation for the obtained models on the training set are up to 89.7%, 88.6%, and 90.1% for the residues S, Y, and T, respectively. For the independent test set, the prediction accuracies are up to 90.7% (S), 87.8% (T), and 89.3% (Y). The results of ROC analysis and other evaluations are also satisfactory. Compared with several specialized prediction tools, KMPhos provided higher accuracy and reliability. A KMPhos package is provided and can be used directly for phosphorylation site prediction.
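The feature-extraction steps described above (fragment to descriptor matrix, matrix to grayscale image, image to Krawtchouk moments) can be illustrated with a minimal sketch. It is not the KMPhos implementation. The descriptor table is filled with random placeholder values rather than real chemical descriptors, and the 13-residue fragment, the five descriptor columns, the parameter p = 0.5, and all function names are assumptions made for illustration only. The weighted Krawtchouk polynomials follow the standard definition through the hypergeometric series 2F1(-n, -x; -N; 1/p) with binomial weights.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def placeholder_descriptors(n_desc=5, seed=0):
    """Hypothetical descriptor table: random values, NOT real chemical data."""
    rng = random.Random(seed)
    return {aa: [rng.random() for _ in range(n_desc)] for aa in AMINO_ACIDS}

def fragment_to_image(fragment, table):
    """Replace each residue by its descriptor row, then scale to 0..255 gray levels."""
    matrix = [table[aa] for aa in fragment]
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[255.0 * (v - lo) / span for v in row] for row in matrix]

def krawtchouk_basis(size, p=0.5):
    """Weighted Krawtchouk polynomials basis[n][x], n and x in 0..size-1 (N = size-1)."""
    N = size - 1
    basis = []
    for n in range(size):
        rho = ((1 - p) / p) ** n / math.comb(N, n)  # squared norm of K_n
        row = []
        for x in range(size):
            term, total = 1.0, 1.0
            for k in range(n):  # terms of 2F1(-n, -x; -N; 1/p)
                term *= (k - n) * (k - x) / ((k - N) * (k + 1) * p)
                total += term
            w = math.comb(N, x) * p ** x * (1 - p) ** (N - x)  # binomial weight
            row.append(total * math.sqrt(w / rho))
        basis.append(row)
    return basis

def matmul(a, b):
    """Plain-Python matrix product, adequate for these small matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def krawtchouk_moments(image, p_row=0.5, p_col=0.5):
    """Moment matrix Q = K_rows * image * K_cols^T."""
    k_rows = krawtchouk_basis(len(image), p_row)
    k_cols = krawtchouk_basis(len(image[0]), p_col)
    return matmul(matmul(k_rows, image), transpose(k_cols))

def reconstruct(moments, p_row=0.5, p_col=0.5):
    """Inverse transform; exact when the full moment set is kept."""
    k_rows = krawtchouk_basis(len(moments), p_row)
    k_cols = krawtchouk_basis(len(moments[0]), p_col)
    return matmul(matmul(transpose(k_rows), moments), k_cols)

fragment = "AGRKSPTSPEDLV"  # hypothetical 13-residue window centred on T
image = fragment_to_image(fragment, placeholder_descriptors())
moments = krawtchouk_moments(image)
features = [q for row in moments for q in row]  # one feature vector per fragment
```

In a pipeline of this kind, the flattened `features` vector of each fragment would serve as the input to a support vector machine classifier, with one model per residue type. Because the weighted basis is orthonormal, `reconstruct(moments)` recovers the image, which is a convenient sanity check on the moment computation.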
