MDD-carb: a combinatorial model for the identification of protein carbonylation sites with substrate motifs

Background: Carbonylation is an irreversible oxidative modification of proteins that occurs when reactive oxygen species (ROS) oxidize specific residues. It has been reported to be associated with a number of metabolic and aging-related diseases, including diabetes, chronic lung disease, Parkinson's disease, and Alzheimer's disease. Because no computational method had been dedicated to exploring the motif signatures of protein carbonylation sites, we were motivated to use an iterative statistical method to characterize and identify carbonylated sites by their motif signatures.

Results: By manually curating experimental data from research articles, we obtained 332, 144, 135, and 140 experimentally verified substrate sites for K (lysine), R (arginine), T (threonine), and P (proline) residues, respectively, from 241 carbonylated proteins. To determine which attributes are informative for distinguishing carbonylated from non-carbonylated sites, we investigated several features: the composition of the twenty amino acids (AAC), the composition of amino acid pairs (AAPC), the position-specific scoring matrix (PSSM), and the position weight matrix (PWM). In addition, to explore the motif signatures of carbonylation sites, an iterative statistical method was adopted to detect statistically significant dependencies of amino acid composition between specific positions around the substrate sites. A profile hidden Markov model (HMM) was then trained as a predictive model for each motif signature. Finally, a support vector machine (SVM) was used to construct an integrative model that combines the bit scores obtained from the profile HMMs. In cross-validation and independent testing, the combinatorial model provided enhanced performance with balanced sensitivity and specificity.

Conclusion: This study provides a new scheme for exploring potential motif signatures at the substrate sites of protein carbonylation. The usefulness of the revealed motifs for identifying carbonylated sites is demonstrated by their effective performance in cross-validation and independent testing. These substrate motifs were used to build a publicly available online resource (MDD-Carb, http://csb.cse.yzu.edu.tw/MDDCarb/) and are anticipated to facilitate the study of large-scale carbonylated proteomes.
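To make the two composition features named in the abstract concrete, the following is a minimal Python sketch of how AAC (20 values) and AAPC (400 values) could be computed for a sequence window around a candidate site. It is an illustration under stated assumptions, not the authors' implementation: the function names `aac` and `aapc`, the 11-residue example window, and the choice to count only adjacent pairs are all hypothetical, since the abstract does not specify the window length or the exact pair definition.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# All 400 ordered amino acid pairs, in a fixed order for the feature vector
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(window):
    """Fraction of each of the 20 amino acids in a sequence window."""
    # Ignore non-standard symbols such as gap or padding characters
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on empty input
    return [residues.count(a) / total for a in AMINO_ACIDS]


def aapc(window):
    """Fraction of each of the 400 adjacent amino acid pairs in a window."""
    # Assumption: a "pair" means two neighbouring residues in the window
    pairs = [window[i:i + 2] for i in range(len(window) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(pairs) or 1
    counts = {}
    for p in pairs:
        counts[p] = counts.get(p, 0) + 1
    return [counts.get(p, 0) / total for p in PAIRS]


# Hypothetical 11-residue window centred on a lysine (K); not from the paper
example = "AGLDEKTPRSV"
features = aac(example) + aapc(example)  # 20 + 400 = 420 values
```

A vector built this way could be passed to any SVM implementation as one training example, with carbonylated windows labelled positive and non-carbonylated windows labelled negative.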
