Characterization and Identification of Lysine Succinylation Sites based on Deep Learning Method

Succinylation is a protein post-translational modification (PTM) that plays important roles in a variety of cellular processes. As high-throughput mass spectrometry (MS) has yielded an increasing number of site-specific succinylated peptides, various tools have been developed to identify succinylation sites on proteins computationally. However, most of these tools rely on traditional machine learning methods. This work therefore aimed to predict succinylation sites with a deep learning model. The abundance of MS-verified succinylated peptides enabled an investigation of the substrate site specificity of succinylation through sequence-based attributes, such as position-specific amino acid composition, the composition of k-spaced amino acid pairs (CKSAAP), and the position-specific scoring matrix (PSSM). Additionally, maximal dependence decomposition (MDD) was adopted to detect the substrate signatures of lysine succinylation sites by dividing all succinylated sequences into several groups with conserved substrate motifs. In ten-fold cross-validation, the deep learning model trained on PSSM and informative CKSAAP attributes achieved the best predictive performance and outperformed traditional machine learning methods. Moreover, an independent testing dataset with no overlap with the training dataset was used to compare the proposed method with six existing prediction tools. The testing dataset comprised 218 positive and 2621 negative instances, on which the proposed model achieved 84.40% sensitivity, 86.99% specificity, 86.79% accuracy, and a Matthews correlation coefficient (MCC) of 0.489. Finally, the proposed method has been implemented as a web-based prediction tool (CNN-SuccSite), which is freely accessible at http://csb.cse.yzu.edu.tw/CNN-SuccSite/.
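
The independent-test figures quoted above can be checked for internal consistency with a short calculation. The sketch below is illustrative only: the confusion-matrix counts (184 true positives, 34 false negatives, 2280 true negatives, 341 false positives) are not reported in the abstract. They are an assumption, back-calculated from the 218 positive and 2621 negative instances and the stated sensitivity and specificity. The function name `confusion_metrics` is likewise hypothetical.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy, and MCC from raw counts."""
    sensitivity = tp / (tp + fn)          # fraction of positives recovered
    specificity = tn / (tn + fp)          # fraction of negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# ASSUMED counts (not given in the abstract), inferred from 218 positives,
# 2621 negatives, 84.40% sensitivity, and 86.99% specificity.
metrics = confusion_metrics(tp=184, fn=34, tn=2280, fp=341)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the sensitivity, specificity, and accuracy match the reported percentages to two decimal places, and the MCC comes out at about 0.49, in line with the reported 0.489. The comparatively modest MCC despite high accuracy reflects the roughly 1:12 class imbalance of the testing dataset.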


[33]  Dmitry Yarotsky,et al.  Error bounds for approximations with deep ReLU networks , 2016, Neural Networks.

[34]  P. Bénit,et al.  Unsuspected task for an old team: succinate, fumarate and other Krebs cycle acids in metabolic remodeling. , 2014, Biochimica et biophysica acta.

[35]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[36]  S. Karlin,et al.  Prediction of complete gene structures in human genomic DNA. , 1997, Journal of molecular biology.

[37]  Yu-Ju Chen,et al.  dbSNO 2.0: a resource for exploring structural environment, functional and disease association and regulatory network of protein S-nitrosylation , 2014, Nucleic Acids Res..

[38]  T. Tsunoda,et al.  Success: evolutionary and structural properties of amino acids prove effective for succinylation site prediction , 2018, BMC Genomics.

[39]  Ganapati Panda,et al.  A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction , 2010, Comput. Biol. Chem..

[40]  Jorng-Tzong Horng,et al.  KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites , 2005, Nucleic Acids Res..

[41]  Tzong-Yi Lee,et al.  Identifying Protein Phosphorylation Sites with Kinase Substrate Specificity on Human Viruses , 2012, PloS one.

[42]  Hiroto Saigo,et al.  CNN-BLPred: a Convolutional neural network based predictor for β-Lactamases (BL) and their classes , 2017, BMC Bioinformatics.

[43]  Abdollah Dehzangi,et al.  Improving succinylation prediction accuracy by incorporating the secondary structure via helix, strand and coil, and evolutionary information from profile bigrams , 2018, PloS one.

[44]  Yu Xue,et al.  DeepNitro: Prediction of Protein Nitration and Nitrosylation Sites by Deep Learning , 2018, Genom. Proteom. Bioinform..

[45]  Ao Li,et al.  LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST , 2005, Nucleic Acids Res..

[46]  Yingming Zhao,et al.  SIRT5-mediated lysine desuccinylation impacts diverse metabolic pathways. , 2013, Molecular cell.

[47]  Md. Nurul Haque Mollah,et al.  SuccinSite: a computational tool for the prediction of protein succinylation sites by exploiting the amino acid patterns and properties. , 2016, Molecular bioSystems.

[48]  Tzong-Yi Lee,et al.  Incorporating Distant Sequence Features and Radial Basis Function Networks to Identify Ubiquitin Conjugation Sites , 2011, PloS one.

[49]  S. Singer,et al.  Succinylation of Gamma Globulin , 1966, Nature.

[50]  Zhiqiang Ma,et al.  Accurate in silico identification of protein succinylation sites using an iterative semi-supervised learning technique. , 2015, Journal of theoretical biology.

[51]  Y. Liu,et al.  Quantitative proteome and lysine succinylome analyses provide insights into metabolic regulation in breast cancer , 2018, Breast Cancer.

[52]  Tzong-Yi Lee,et al.  Incorporating Amino Acids Composition and Functional Domains for Identifying Bacterial Toxin Proteins , 2014, BioMed research international.

[53]  Minoru Kanehisa,et al.  Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs , 2003, Bioinform..

[54]  Dexing Zhong,et al.  CarSPred: A Computational Tool for Predicting Carbonylation Sites of Human Proteins , 2014, PloS one.

[55]  Hsien-Da Huang,et al.  SNOSite: Exploiting Maximal Dependence Decomposition to Identify Cysteine S-Nitrosylation with Substrate Site Specificity , 2011, PloS one.

[56]  Dianjing Guo,et al.  A systematic identification of species-specific protein succinylation sites using joint element features information , 2017, International journal of nanomedicine.

[57]  Hsien-Da Huang,et al.  KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns , 2007, Nucleic Acids Res..

[58]  Hsien-Da Huang,et al.  dbPTM: an information repository of protein post-translational modification , 2005, Nucleic Acids Res..

[59]  Hiroyuki Kurata,et al.  Large-Scale Assessment of Bioinformatics Tools for Lysine Succinylation Sites , 2019, Cells.

[60]  Tzong-Yi Lee,et al.  Investigation and identification of protein γ-glutamyl carboxylation sites , 2011, BMC Bioinformatics.

[61]  G. Crooks,et al.  WebLogo: a sequence logo generator. , 2004, Genome research.

[62]  Ying Gao,et al.  Bioinformatics Applications Note Sequence Analysis Cd-hit Suite: a Web Server for Clustering and Comparing Biological Sequences , 2022 .

[63]  Pierre Baldi,et al.  The dropout learning algorithm , 2014, Artif. Intell..

[64]  M. Mann,et al.  Mass spectrometry–based proteomics turns quantitative , 2005, Nature chemical biology.

[65]  Tzong-Yi Lee,et al.  Exploiting maximal dependence decomposition to identify conserved motifs from a group of aligned signal sequences , 2011, Bioinform..

[66]  Tzong-Yi Lee,et al.  GSHSite: Exploiting an Iteratively Statistical Method to Identify S-Glutathionylation Sites with Substrate Specificity , 2015, PloS one.

[67]  Tzong-Yi Lee,et al.  MDD-Palm: Identification of protein S-palmitoylation sites with substrate motifs based on maximal dependence decomposition , 2017, PloS one.