A New Scheme to Characterize and Identify Protein Ubiquitination Sites

Protein ubiquitination, the conjugation of ubiquitin to lysine residues, is an important modulator of many cellular functions in eukaryotes. Recent advances in proteomic technology have stimulated growing interest in identifying ubiquitination sites. However, most computational tools for predicting ubiquitination sites were developed on small-scale data. With the increasing number of experimentally verified ubiquitination sites, we were motivated to design a predictive model for identifying lysine ubiquitination sites in large-scale proteome datasets. This work assessed not only single features, such as amino acid composition (AAC), amino acid pair composition (AAPC) and evolutionary information, but also the effectiveness of combining two or more features into a hybrid approach to model construction. A support vector machine (SVM) was used to generate the prediction models for ubiquitination site identification. Evaluation by five-fold cross-validation showed that the SVM models trained on hybrid feature combinations delivered better prediction performance than those trained on single features. Additionally, a motif discovery tool, MDDLogo, was adopted to characterize the potential substrate motifs of ubiquitination sites. The SVM models integrating the MDDLogo-identified substrate motifs yielded an average accuracy of 68.70 percent. Furthermore, independent testing showed that the MDDLogo-clustered SVM models achieved a promising accuracy (78.50 percent) and performed better than other prediction tools. Two case studies demonstrate the effective prediction of ubiquitination sites with their corresponding substrate motifs.
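As a rough illustration of the two sequence-derived features named above, the following minimal sketch encodes a window centred on a candidate lysine as AAC (20 values) and AAPC (400 values) and concatenates them into a hybrid vector. The function names, the flank size, and the `-` padding character are assumptions made for this example and are not taken from the paper; the evolutionary-information feature and the MDDLogo clustering are not covered.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def extract_window(sequence, position, flank=6, pad="-"):
    """Return the (2*flank + 1)-residue window centred on a lysine.

    `position` is 0-based. The flank size and padding character are
    illustrative assumptions, not values reported in the paper.
    """
    if sequence[position] != "K":
        raise ValueError("centre residue must be lysine (K)")
    left = sequence[max(0, position - flank):position]
    right = sequence[position + 1:position + 1 + flank]
    # Pad windows that run past either terminus of the protein.
    return pad * (flank - len(left)) + left + "K" + right + pad * (flank - len(right))


def aac(window):
    """Amino acid composition: frequency of each of the 20 residues."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1
    return [residues.count(a) / total for a in AMINO_ACIDS]


def aapc(window):
    """Amino acid pair composition: frequency of each adjacent residue pair."""
    pairs = [window[i:i + 2] for i in range(len(window) - 1)]
    # Skip pairs that contain a padding or non-standard character.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(valid) or 1
    return [valid.count(p) / total for p in PAIRS]


def hybrid_features(window):
    """Concatenate AAC and AAPC into one 420-dimensional vector."""
    return aac(window) + aapc(window)


if __name__ == "__main__":
    # Toy sequence for demonstration only; not a real substrate.
    window = extract_window("MKTAYIAKQR", 7, flank=3)
    features = hybrid_features(window)
    print(window, len(features))
```

Vectors of this kind, computed for ubiquitinated and non-ubiquitinated lysines, could then be passed to any SVM implementation and evaluated with five-fold cross-validation.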
