Characterization and identification of ubiquitin conjugation sites with E3 ligase recognition specificities

Background: In eukaryotes, ubiquitin conjugation is an important mechanism underlying proteasome-mediated degradation of proteins and, as such, plays an essential role in the regulation of many cellular processes. In the ubiquitin-proteasome pathway, E3 ligases play important roles by recognizing a specific protein substrate and catalyzing the attachment of ubiquitin to a lysine (K) residue. As more experimental data on ubiquitin conjugation sites become available, it becomes possible to develop prediction models that can be scaled to big data. However, no existing work has focused on investigating the substrate site specificities of ubiquitinated proteins. Herein, we present an approach that exploits an iterative statistical method to identify ubiquitin conjugation sites with substrate site specificities.

Results: In this investigation, a total of 6259 experimentally validated ubiquitinated proteins were obtained from dbPTM. After filtering out homologous fragments at 40% sequence identity, the training data set contained 2658 ubiquitination sites (positive data) and 5532 non-ubiquitinated sites (negative data). Because the substrate site specificities of E3 ligases are difficult to characterize by conventional sequence logo analysis, a recursive statistical method was applied to obtain significantly conserved motifs. A profile hidden Markov model (profile HMM) was adopted to construct predictive models learned from the identified substrate motifs. Five-fold cross-validation was then used to evaluate the predictive model, which achieved a sensitivity, specificity, and accuracy of 73.07%, 65.46%, and 67.93%, respectively. Additionally, an independent testing set, completely blind to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (76.13%) and outperform other ubiquitination site prediction tools.

Conclusion: A case study demonstrated the effectiveness of the characterized substrate motifs for identifying ubiquitination sites. The proposed method presents a practical means of preliminary analysis and greatly reduces the number of potential targets that require further experimental confirmation. This method may help unravel the mechanisms and roles of ubiquitination sites in E3 recognition and ubiquitin-mediated protein degradation.
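The reported cross-validation figures can be related to one another with a short calculation. The sketch below is illustrative only and is not the authors' evaluation code: the function names are hypothetical, and it assumes the reported accuracy is pooled over all 2658 positive and 5532 negative training sites. Under that assumption, accuracy is the class-size-weighted average of sensitivity and specificity.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of true sites recovered
    specificity = tn / (tn + fp)          # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Pooled accuracy implied by per-class rates and class sizes (assumed pooling)."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


# Figures taken from the abstract: 2658 positive and 5532 negative sites,
# sensitivity 73.07% and specificity 65.46%.
acc = accuracy_from_rates(0.7307, 0.6546, n_pos=2658, n_neg=5532)
print(f"{acc:.2%}")  # prints 67.93%
```

The result agrees with the stated accuracy of 67.93%, so the three cross-validation figures are mutually consistent with the stated class sizes. Because negatives outnumber positives roughly two to one, the pooled accuracy sits closer to the specificity than to the sensitivity.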
