Protein Ubiquitylation and Sumoylation Site Prediction Based on Ensemble and Transfer Learning

Ubiquitylation, a typical post-translational modification (PTM), plays an important role in signal transduction, apoptosis and cell proliferation. Sumoylation, a ubiquitylation-like PTM, may also affect gene mapping, expression and genomic replication. Over the past two decades, machine learning has been widely used in tools that predict protein ubiquitylation and sumoylation sites. These tools rely on feature engineering, yet they fail to provide generally interpretable features and probably underutilize the growing amount of available data. This prompted us to propose a deep learning-based model that integrates the convolutional and fully connected layers of seven supervised learning sub-models to extract deep representations from protein sequences and physico-chemical properties (PCPs). In particular, we divided the PCPs into six clusters and customized a deep network for each cluster to handle the high correlations among properties within the same cluster. A stacking ensemble strategy was then applied to combine these deep representations and make predictions. Furthermore, thanks to transfer learning, our model also performs well on protein sumoylation site prediction after fine-tuning. On the high-quality annotated Swiss-Prot database, our model outperformed several well-known ubiquitylation and sumoylation site prediction tools. Our code is freely available at https://github.com/ruiwcoding/DeepUbiSumoPre.
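The stacking step can be illustrated with a small, self-contained sketch. It assumes each sub-model has already produced a probability that a candidate lysine is modified, and it combines those scores with a logistic-regression meta-learner trained by plain gradient descent. The abstract does not say which combiner the authors use, so this choice, the class name `StackingMetaLearner`, the use of three sub-models instead of seven, and all scores below are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class StackingMetaLearner:
    """Hypothetical logistic-regression meta-learner for a stacking ensemble.

    Each input row holds the probabilities that K base sub-models assign to one
    candidate lysine; the meta-learner learns how much weight each sub-model gets.
    """

    def __init__(self, n_models: int, lr: float = 0.5, epochs: int = 2000) -> None:
        self.weights = [0.0] * n_models
        self.bias = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, scores: Sequence[float]) -> float:
        # Weighted combination of sub-model scores, squashed to a probability.
        z = self.bias + sum(w * s for w, s in zip(self.weights, scores))
        return sigmoid(z)

    def predict(self, scores: Sequence[float], threshold: float = 0.5) -> int:
        return int(self.predict_proba(scores) >= threshold)

    def fit(self, rows: List[Sequence[float]], labels: List[int]) -> "StackingMetaLearner":
        n = len(rows)
        for _ in range(self.epochs):
            # Full-batch gradient of the mean binary cross-entropy loss.
            grad_w = [0.0] * len(self.weights)
            grad_b = 0.0
            for scores, y in zip(rows, labels):
                err = self.predict_proba(scores) - y
                for k, s in enumerate(scores):
                    grad_w[k] += err * s
                grad_b += err
            self.weights = [w - self.lr * g / n for w, g in zip(self.weights, grad_w)]
            self.bias -= self.lr * grad_b / n
        return self


# Toy scores from three hypothetical sub-models (one column each) for six sites.
train_scores = [
    [0.90, 0.60, 0.40],
    [0.80, 0.30, 0.70],
    [0.85, 0.55, 0.20],
    [0.10, 0.50, 0.60],
    [0.20, 0.70, 0.30],
    [0.15, 0.40, 0.50],
]
train_labels = [1, 1, 1, 0, 0, 0]

meta = StackingMetaLearner(n_models=3).fit(train_scores, train_labels)
print(meta.predict([0.88, 0.45, 0.50]))  # 1
print(meta.predict([0.12, 0.60, 0.40]))  # 0
```

Here the first sub-model separates the toy sites cleanly, so the meta-learner assigns it the largest weight. In a real stacking setup the meta-learner should be trained on out-of-fold predictions, so that it never learns from scores the sub-models produced on their own training data.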
