PhosTransfer: A Deep Transfer Learning Framework for Kinase-Specific Phosphorylation Site Prediction in Hierarchy

Machine learning algorithms have been widely used for predicting kinase-specific phosphorylation sites. However, the scarcity of training data for specific kinases makes it difficult to train effective models for predicting their phosphorylation sites. In this paper, we propose a deep transfer learning framework, PhosTransfer, for improving kinase-specific phosphorylation site prediction. It exploits the hierarchical information encoded in the kinase classification tree (KCT), which has four levels: kinase groups, families, subfamilies and protein kinases (PKs). With PhosTransfer, predictive models associated with tree nodes at higher levels, which are trained with more data, can be transferred and reused as feature extractors for predictive models of tree nodes at lower levels. Our results indicate that models with deep transfer learning outperformed those without transfer learning for 73 out of 79 tested PKs. The positive effect of deep transfer learning is most evident in the prediction of phosphosites for kinase nodes with less training data. These improvements are further validated and explained by visualising the vector representations generated from hidden layers pre-trained at different KCT levels. Electronic supplementary material: The online version of this chapter (10.1007/978-3-030-47436-2_29) contains supplementary material, which is available to authorized users.
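The hierarchical transfer described above can be illustrated with a minimal sketch. It is not the PhosTransfer implementation: the abstract does not specify the network architecture, so the sketch assumes that each KCT node adds one trainable hidden layer on top of the frozen hidden layers of its ancestors. The `Node` class, the `toy_data` helper, the 20-dimensional inputs, the layer width and the sample counts are all hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Node:
    """Hypothetical model for one KCT node: the ancestors' hidden layers are
    reused as a frozen feature extractor; one new hidden layer and one
    output unit are trained on this node's own data."""

    def __init__(self, n_in, n_hidden, parent=None, seed=0):
        rng = np.random.default_rng(seed)
        self.parent = parent
        self.n_hidden = n_hidden
        d = n_in if parent is None else parent.n_hidden
        self.W = rng.normal(0.0, 0.1, (d, n_hidden))  # new hidden layer
        self.b = np.zeros(n_hidden)
        self.v = rng.normal(0.0, 0.1, n_hidden)       # output unit
        self.c = 0.0

    def features(self, X):
        # Pass X through every inherited layer, then through this node's layer.
        H = X if self.parent is None else self.parent.features(X)
        return np.tanh(H @ self.W + self.b)

    def predict(self, X):
        return sigmoid(self.features(X) @ self.v + self.c)

    def fit(self, X, y, lr=0.5, epochs=200):
        # Ancestor layers are frozen, so their output is computed only once.
        H_in = X if self.parent is None else self.parent.features(X)
        for _ in range(epochs):
            H = np.tanh(H_in @ self.W + self.b)
            p = sigmoid(H @ self.v + self.c)
            g = (p - y) / len(y)                      # d(cross-entropy)/d(logit)
            gH = np.outer(g, self.v) * (1.0 - H**2)   # backprop through tanh
            self.v -= lr * (H.T @ g)
            self.c -= lr * g.sum()
            self.W -= lr * (H_in.T @ gH)
            self.b -= lr * gH.sum(axis=0)
        return self

rng = np.random.default_rng(42)

def toy_data(n):
    # Synthetic stand-in for encoded sequence windows (assumption, not real data).
    X = rng.normal(size=(n, 20))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    return X, y

# Train top-down along one KCT path; lower levels get fewer samples.
levels = [("group", 2000), ("family", 500), ("subfamily", 100), ("kinase", 30)]
chain, node = {}, None
for i, (name, n) in enumerate(levels):
    X, y = toy_data(n)
    node = Node(20, 8, parent=node, seed=i).fit(X, y)
    chain[name] = node
```

Calling `chain["kinase"].predict(X)` passes `X` through the group, family and subfamily layers before the kinase-specific layer. Freezing the inherited layers is one design choice among several; fine-tuning them on the lower-level data would be an alternative that the abstract neither confirms nor rules out.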
