Accurately Predicting Glutarylation Sites Using Sequential Bi-Peptide-Based Evolutionary Features

Post-translational modification (PTM) is the chemical alteration of a protein after the translation process, upon interaction with different macromolecules. Glutarylation is considered one of the most important PTMs and is associated with a wide range of cellular functions, including metabolism, translation, and localization to specific subcellular compartments. Over the past few years, a wide range of computational approaches have been proposed to predict glutarylation sites. Despite these efforts, the prediction performance for glutarylation sites has remained limited. One of the main challenges is extracting features that carry significant discriminatory information. To address this issue, we propose a new machine learning method called BiPepGlut, which uses a bi-peptide-based evolutionary method for feature extraction. For classification we use the Extra-Trees (ET) classifier, which, to the best of our knowledge, has never been used for this task. Our results show that BiPepGlut significantly outperforms previously proposed models. BiPepGlut achieves 92.0%, 84.8%, 95.6%, 0.82, and 0.88 in accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and F1-score, respectively. BiPepGlut is implemented as a publicly available online predictor.
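For readers unfamiliar with the five reported measures, the following is a minimal sketch of how they are conventionally computed from the counts of a binary confusion matrix. It uses the standard textbook definitions and is not taken from the BiPepGlut implementation; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fn: glutarylated sites predicted correctly / missed.
    tn/fp: non-glutarylated sites predicted correctly / wrongly flagged.
    Any ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    sensitivity = ratio(tp, tp + fn)   # recall on positive sites
    specificity = ratio(tn, tn + fp)   # recall on negative sites
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient, ranging from -1 to +1.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "f1": f1,
    }


# Hypothetical counts, not results from the paper.
example = binary_metrics(tp=40, fp=10, tn=90, fn=10)
```

Because MCC uses all four cells of the confusion matrix, it is often read alongside accuracy when positive and negative sites are imbalanced.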
