DeepPhos: prediction of protein phosphorylation sites with deep learning

Abstract

Motivation: Phosphorylation is the most extensively studied post-translational modification and is crucial for multiple biological processes. Many computational predictors of phosphorylation sites have been developed in recent years, but most of them rely on feature selection followed by discriminative classification. A highly accurate predictor that can automatically learn the intricate sequence patterns around protein phosphorylation sites is therefore desirable.

Results: In this study we present DeepPhos, a novel deep learning architecture for predicting protein phosphorylation sites. Unlike conventional multi-layer convolutional neural networks, DeepPhos consists of densely connected convolutional neural network blocks that capture multiple representations of a sequence. These representations are combined through intra-block and inter-block concatenation layers to make the final phosphorylation prediction. DeepPhos can also be used for kinase-specific prediction at the group, family, subfamily and individual-kinase levels. Experimental results show that DeepPhos outperforms competing predictors in both general and kinase-specific phosphorylation site prediction.

Availability and implementation: The source code of DeepPhos is publicly available at https://github.com/USTCHIlab/DeepPhos.

Supplementary information: Supplementary data are available at Bioinformatics online.
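To make the densely connected architecture described in the abstract more concrete, here is a minimal NumPy sketch of a forward pass in the spirit of DeepPhos. It is not the authors' implementation (their code is in the GitHub repository linked in the abstract), and every function name and hyperparameter below is a hypothetical assumption: one-hot encoding over the 20 amino acids plus a '-' padding symbol, three nested windows of 15, 33 and 51 residues centred on the candidate S/T/Y site, three convolutional layers per block with 8 feature maps each, global average pooling, and random, untrained weights. Within a block, each layer receives the concatenation of the block input and all earlier layer outputs (the intra-block concatenation). The pooled outputs of the blocks are then concatenated (the inter-block concatenation) and passed to a single sigmoid output unit.

```python
import numpy as np

# Assumed encoding: 20 standard amino acids plus '-' as a padding/unknown symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(peptide):
    """One-hot encode a peptide window as a (length, 21) array."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(peptide), len(AMINO_ACIDS)))
    for pos, aa in enumerate(peptide):
        x[pos, index.get(aa, index["-"])] = 1.0  # unknown residues map to '-'
    return x


def conv1d_same(x, w, b):
    """'Same'-padded 1D convolution: x (L, C_in), w (k, C_in, C_out), b (C_out,)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))  # zero-pad along the sequence axis
    out = np.stack([np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1]))
                    for i in range(x.shape[0])])
    return out + b


def dense_block(x, n_layers, growth, kernel, rng):
    """Densely connected block: every layer sees all previous feature maps."""
    features = [x]
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=-1)  # intra-block concatenation
        w = rng.normal(0.0, 0.1, size=(kernel, inp.shape[-1], growth))  # random, untrained
        b = np.zeros(growth)
        features.append(np.maximum(conv1d_same(inp, w, b), 0.0))  # ReLU
    return np.concatenate(features, axis=-1)  # shape (L, C_in + n_layers * growth)


def deepphos_like_score(peptide, window_sizes=(15, 33, 51), n_layers=3,
                        growth=8, kernel=3, seed=0):
    """Hypothetical phosphorylation score for the central residue of `peptide`."""
    if len(peptide) % 2 == 0 or len(peptide) < max(window_sizes):
        raise ValueError("peptide must be an odd-length window at least as long "
                         "as the largest window size")
    rng = np.random.default_rng(seed)
    center = len(peptide) // 2
    pooled = []
    for ws in window_sizes:
        half = ws // 2
        sub = peptide[center - half:center + half + 1]  # window centred on the site
        feats = dense_block(one_hot(sub), n_layers, growth, kernel, rng)
        pooled.append(feats.mean(axis=0))  # global average pooling over positions
    z = np.concatenate(pooled)  # inter-block concatenation
    w_out = rng.normal(0.0, 0.1, size=z.shape[0])
    return 1.0 / (1.0 + np.exp(-(z @ w_out)))  # sigmoid output in (0, 1)


# Usage: a hypothetical 51-residue window centred on a serine.
peptide = "A" * 25 + "S" + "A" * 25
print(deepphos_like_score(peptide))
```

Because the weights are random, the printed score carries no biological meaning. In a real model they would be learned from labelled sites, for example by minimising binary cross-entropy. Concatenating rather than summing feature maps lets later layers reuse low-level motif detectors directly, which is the usual motivation for dense connectivity.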
