MDD-SOH: exploiting maximal dependence decomposition to identify S-sulfenylation sites with substrate motifs

S-sulfenylation (S-sulphenylation, or sulfenic acid formation) is the covalent attachment of a hydroxyl group to a cysteine thiol, which forms a sulfenic acid (-SOH). It plays a significant role in the redox regulation of protein function. Although sulfenic acid is transient and labile, most of its physiological activities occur under the control of S-hydroxylation. Identifying the substrate sites of S-sulfenylated proteins is therefore an essential task in computational biology, because it advances the understanding of protein structure and function. Research on S-sulfenylated proteins is currently very limited, and no dedicated tools are available for the computational identification of SOH sites. This study uses a total of 1096 experimentally verified S-sulfenylated human proteins to carry out a bioinformatics investigation of SOH sites based on amino acid composition and solvent-accessible surface area. A TwoSampleLogo analysis indicates that positively and negatively charged amino acids flanking SOH sites may influence the formation of S-sulfenylation in the surrounding three-dimensional environment. In addition, the substrate motifs of SOH sites are characterized using maximal dependence decomposition (MDD). The discrimination between SOH and non-SOH sites is treated as a binary classification problem, and a support vector machine (SVM) is applied to learn predictive models from the MDD-identified substrate motifs. In 5-fold cross-validation, the integrated SVM model learned from the substrate motifs yields an average accuracy of 0.87, which significantly improves the prediction of SOH sites. The integrated SVM model also improves predictive performance on an independent test set. Finally, the integrated SVM model is used to implement a web resource, MDD-SOH, that identifies SOH sites together with their corresponding substrate motifs.

AVAILABILITY AND IMPLEMENTATION: MDD-SOH is freely available to all interested users at http://csb.cse.yzu.edu.tw/MDDSOH/.
All data sets used in this work are also available for download from the website.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

CONTACT: francis@saturn.yzu.edu.tw
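The TwoSampleLogo comparison described in the abstract contrasts the residue frequencies at each position flanking SOH cysteines with those flanking non-SOH cysteines. The following is a minimal Python sketch of the underlying counting. It assumes 0-based cysteine positions and a hypothetical ±10-residue window padded with '-' beyond the termini. The abstract does not state the window size MDD-SOH actually uses. TwoSampleLogo also tests each difference for statistical significance, and this sketch omits that step.

```python
from collections import Counter


def extract_window(seq: str, pos: int, flank: int = 10, pad: str = "-") -> str:
    """Return the (2*flank + 1)-residue window centred on 0-based index `pos`.

    Positions beyond either terminus are filled with `pad`.
    flank=10 is an illustrative assumption, not a documented MDD-SOH setting.
    """
    left = seq[max(0, pos - flank):pos]
    right = seq[pos + 1:pos + 1 + flank]
    return pad * (flank - len(left)) + left + seq[pos] + right + pad * (flank - len(right))


def frequency_difference(pos_windows: list[str], neg_windows: list[str]) -> list[dict[str, float]]:
    """Per-position residue frequency in positives minus frequency in negatives.

    Positive values mark residues enriched around SOH sites; negative values
    mark residues depleted there. No significance test is applied.
    """
    diffs = []
    for i in range(len(pos_windows[0])):
        pos_counts = Counter(w[i] for w in pos_windows)
        neg_counts = Counter(w[i] for w in neg_windows)
        residues = sorted(set(pos_counts) | set(neg_counts))
        diffs.append({
            aa: pos_counts[aa] / len(pos_windows) - neg_counts[aa] / len(neg_windows)
            for aa in residues
        })
    return diffs


if __name__ == "__main__":
    # Toy sequences (hypothetical); the cysteine of interest is at index 3 in each.
    soh = [extract_window(s, 3, flank=2) for s in ["MKKCEDL", "ARKCDEV"]]
    non_soh = [extract_window(s, 3, flank=2) for s in ["MAACGLV", "PLGCAAS"]]
    for offset, diff in zip(range(-2, 3), frequency_difference(soh, non_soh)):
        print(offset, diff)
```

With real data, you could aggregate the per-position differences by residue class to look for the charged-residue enrichment that the abstract reports.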
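MDD splits a set of aligned SOH windows recursively, based on the strongest dependence between positions. The following is a minimal Python sketch of one common formulation, not the MDD-SOH code. For each candidate position i, it checks whether each window matches the consensus residue at i. It then compares that match indicator with the residue observed at every other position j and sums the Pearson chi-square statistics. The sketch splits on the position with the largest sum and recurses into both subsets. It stops when a subset is smaller than `min_size` or when no dependence remains. `min_size` and `min_score` are hypothetical parameters. MDD-based tools often group amino acids into physicochemical classes before the chi-square test to reduce sparsity, and this sketch skips that step.

```python
from collections import Counter


def chi_square(pairs: list[tuple]) -> float:
    """Pearson chi-square statistic of the contingency table built from (row, column) observations."""
    n = len(pairs)
    cells = Counter(pairs)
    rows = Counter(r for r, _ in pairs)
    cols = Counter(c for _, c in pairs)
    stat = 0.0
    for r, r_total in rows.items():
        for c, c_total in cols.items():
            expected = r_total * c_total / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def consensus(windows: list[str], i: int) -> str:
    """Most frequent residue at position i (ties go to the residue seen first)."""
    return Counter(w[i] for w in windows).most_common(1)[0][0]


def dependence_score(windows: list[str], i: int) -> float:
    """Sum over j != i of chi-square(consensus match at i, residue at j)."""
    cons = consensus(windows, i)
    matches = [w[i] == cons for w in windows]
    return sum(
        chi_square(list(zip(matches, (w[j] for w in windows))))
        for j in range(len(windows[0]))
        if j != i
    )


def mdd(windows: list[str], min_size: int = 10, min_score: float = 0.0, path: tuple = ()):
    """Recursively split windows; return a list of (path, subset) leaves.

    Each path entry is (position, consensus residue, matched?).
    """
    used = {pos for pos, _, _ in path}
    candidates = [i for i in range(len(windows[0])) if i not in used]
    if len(windows) < min_size or not candidates:
        return [(path, windows)]
    scores = {i: dependence_score(windows, i) for i in candidates}
    best = max(scores, key=scores.get)  # first position wins on ties
    if scores[best] <= min_score:
        return [(path, windows)]  # no remaining dependence worth splitting on
    cons = consensus(windows, best)
    matched = [w for w in windows if w[best] == cons]
    unmatched = [w for w in windows if w[best] != cons]
    return (mdd(matched, min_size, min_score, path + ((best, cons, True),))
            + mdd(unmatched, min_size, min_score, path + ((best, cons, False),)))


if __name__ == "__main__":
    # Positions 0 and 2 are perfectly coupled; position 1 (the cysteine) is invariant.
    toy = ["KCE"] * 6 + ["ACA"] * 4
    for path, subset in mdd(toy, min_size=5):
        print(path, len(subset))
    # ((0, 'K', True),) 6
    # ((0, 'K', False),) 4
```

The abstract describes learning SVM models from the resulting motif groups and integrating them. This sketch stops at the grouping step.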
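The following sketch shows how 5-fold cross-validation and an "average accuracy" can be computed. It is an illustrative harness, not the MDD-SOH pipeline, and makes these assumptions:

- Labels are 1 for SOH and 0 for non-SOH.
- Folds are stratified so that each fold keeps the SOH/non-SOH ratio.
- A hypothetical nearest-centroid classifier stands in for the SVM, so the example runs with the standard library alone.

Alongside accuracy, the harness reports sensitivity, specificity, and the Matthews correlation coefficient.

```python
import math
import random


def stratified_folds(labels: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds, dealing each class out round-robin after shuffling."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for label in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        for n, i in enumerate(idx):
            folds[n % k].append(i)
    return folds


def metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC with 1 = SOH (positive), 0 = non-SOH."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def train_nearest_centroid(X, y):
    """Hypothetical stand-in for an SVM: predict the class whose mean feature vector is closer."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) for lab, c in centroids.items()}
        return min(dist, key=dist.get)

    return predict


def cross_validate(X, y, train_fn, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average each metric over the k folds."""
    totals: dict[str, float] = {}
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold = metrics([y[i] for i in test_idx], [model(X[i]) for i in test_idx])
        for name, value in fold.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / k for name, total in totals.items()}


if __name__ == "__main__":
    # Toy one-feature data: the two classes are cleanly separated.
    X = [[0.1 * i] for i in range(10)] + [[2.0 + 0.1 * i] for i in range(10)]
    y = [0] * 10 + [1] * 10
    print(cross_validate(X, y, train_nearest_centroid))
```

Averaging the per-fold metrics matches the "average accuracy" wording in the abstract. Pooling all held-out predictions before computing the metrics is an equally common alternative.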