Using deep neural networks and biological subwords to detect protein S-sulfenylation sites

Protein S-sulfenylation is a crucial post-translational modification (PTM) in which a hydroxyl group covalently binds to the thiol of cysteine. Recent studies have shown that this modification plays an important role in signal transduction, transcriptional regulation and apoptosis. To date, the dynamics of sulfenic acids in proteins remain unclear because of their fleeting nature. Identifying S-sulfenylation sites could therefore be the key to deciphering their structures and functions, which are important in cell biology and disease. However, owing to the lack of effective methods, researchers in this field are largely limited to a handful of wet-lab techniques that are time-consuming and costly. This motivated us to develop an in silico model that detects S-sulfenylation sites from protein sequence information alone. In this study, protein sequences were treated as natural language sentences composed of biological subwords, and a deep neural network was then employed to perform classification. On the independent dataset, the model achieved a sensitivity of 85.71%, a specificity of 69.47%, an accuracy of 77.09%, a Matthews correlation coefficient of 0.5554 and an area under the curve of 0.833. Our results suggest that the proposed method (fastSulf-DNN) achieved strong performance in predicting S-sulfenylation sites compared with other well-known tools on a benchmark dataset.
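To make the two technical ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that "biological subwords" can be approximated by fastText-style character n-grams over a peptide string, with `<` and `>` as boundary markers, and it shows how sensitivity, specificity, accuracy and the Matthews correlation coefficient follow from a binary confusion matrix. The function names `char_ngrams` and `binary_metrics`, the n-gram range, and all example inputs are hypothetical; they are not taken from the paper.

```python
import math


def char_ngrams(sequence, min_n=3, max_n=5):
    """Split a peptide into overlapping character n-grams ("subwords").

    Assumption: fastText-style subwords with '<' and '>' marking the
    sequence boundaries. The n-gram range is illustrative only.
    """
    padded = f"<{sequence}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the padded sequence.
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from raw counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy three-residue peptide; real inputs would be windows around a cysteine.
    print(char_ngrams("ACD", min_n=3, max_n=3))  # ['<AC', 'ACD', 'CD>']

    # Hypothetical counts, chosen only to exercise the formulas.
    print(binary_metrics(tp=8, fn=2, tn=7, fp=3))
```

In a pipeline of this kind, the n-grams would typically be mapped to learned embedding vectors and pooled into a fixed-length feature vector before being passed to the classifier. The area under the curve is omitted here because it requires ranked prediction scores rather than confusion-matrix counts.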
