MusiteDeep: a deep‐learning framework for general and kinase‐specific phosphorylation site prediction

Motivation: Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods rely on feature extraction, which may yield incomplete or biased features. Deep learning, a cutting‐edge machine learning approach, can automatically discover complex representations of phosphorylation patterns from raw sequences, and hence offers a powerful tool for improving phosphorylation site prediction.

Results: We present MusiteDeep, the first deep‐learning framework for predicting general and kinase‐specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two‐dimensional attention mechanism. On the benchmark data, it achieves a relative improvement of over 50% in the area under the precision‐recall curve for general phosphorylation site prediction and obtains competitive results for kinase‐specific prediction compared with other well‐known tools.

Availability and implementation: MusiteDeep is provided as an open‐source tool at https://github.com/duolinwang/MusiteDeep.

Contact: xudong@missouri.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
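The pipeline summarized in the Results paragraph (raw sequence in, convolution plus attention over two dimensions, site score out) can be illustrated with a toy, untrained forward pass. This is a minimal sketch under stated assumptions, not MusiteDeep's published architecture: the 20-letter alphabet with all-zero rows for padding, the 33-residue window (half_width=16), the single convolution layer, the way the position and filter attention weights are combined, and the output layer are all illustrative choices. Every helper name (extract_window, one_hot, conv1d_relu, attention_2d, predict) is hypothetical. The actual model, its hyperparameters and its trained weights are in the repository linked in the abstract.

```python
import math
import random

# Hypothetical toy forward pass; NOT MusiteDeep's published model or weights.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def extract_window(sequence, site, half_width):
    """Residues within half_width of the 0-based site, padded with '-' at the termini."""
    padded = "-" * half_width + sequence + "-" * half_width
    return padded[site : site + 2 * half_width + 1]


def one_hot(window):
    """One 20-dimensional row per residue; padding or unknown letters become all-zero rows."""
    rows = []
    for aa in window:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(x, kernels, width):
    """Valid 1-D convolution along positions, then ReLU.

    x is L rows of C channels; each kernel is a width x C weight grid.
    Returns an (L - width + 1) x F feature map for F kernels.
    """
    channels = len(x[0])
    fmap = []
    for start in range(len(x) - width + 1):
        row = []
        for kernel in kernels:
            s = sum(kernel[j][c] * x[start + j][c]
                    for j in range(width) for c in range(channels))
            row.append(max(0.0, s))
        fmap.append(row)
    return fmap


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_2d(fmap, pos_query, filt_query):
    """Toy attention over both axes of an L' x F feature map (an assumed design).

    Position weights: softmax over rows of each row's dot product with pos_query (length F).
    Filter weights: softmax over columns of each column's dot product with filt_query (length L').
    Returns the feature map summed under the product of both weightings, plus the weights.
    """
    n_pos, n_filt = len(fmap), len(fmap[0])
    pos_scores = [sum(fmap[i][f] * pos_query[f] for f in range(n_filt)) for i in range(n_pos)]
    filt_scores = [sum(fmap[i][f] * filt_query[i] for i in range(n_pos)) for f in range(n_filt)]
    a_pos, a_filt = softmax(pos_scores), softmax(filt_scores)
    context = sum(a_pos[i] * a_filt[f] * fmap[i][f]
                  for i in range(n_pos) for f in range(n_filt))
    return context, a_pos, a_filt


def predict(sequence, site, half_width=16, width=3, n_filters=4, seed=0):
    """Score one candidate site with random, untrained weights (illustration only)."""
    rng = random.Random(seed)
    x = one_hot(extract_window(sequence, site, half_width))
    kernels = [[[rng.gauss(0.0, 0.1) for _ in AMINO_ACIDS] for _ in range(width)]
               for _ in range(n_filters)]
    fmap = conv1d_relu(x, kernels, width)
    pos_query = [rng.gauss(0.0, 0.1) for _ in range(n_filters)]
    filt_query = [rng.gauss(0.0, 0.1) for _ in range(len(fmap))]
    context, _, _ = attention_2d(fmap, pos_query, filt_query)
    out_w, out_b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.1)  # assumed linear output layer
    return 1.0 / (1.0 + math.exp(-(out_w * context + out_b)))  # sigmoid -> score in (0, 1)
```

For example, predict("MKRSPSPTEYAL", 3) scores the serine at 0-based position 3. Because the weights are random, the number carries no biological meaning; the point is the data flow. Keeping separate softmaxes along the position axis and the filter axis lets a trained model emphasize both which residues around the site matter and which learned motif detectors matter, which is the intuition behind attending in two dimensions. Multiplying the two weightings into one summary is an assumption made here for brevity.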
