Enhanced Prediction of Lysine Propionylation Sites using Bi-peptide Evolutionary Features Resolving Data Imbalance

Lysine propionylation is a widespread post-translational modification in both eukaryotes and prokaryotes that plays a significant regulatory role. It is also involved in various biological processes and metabolic activities. Nevertheless, the extent and effects of lysine propionylation in photosynthetic organisms remain obscure. A computational method that identifies propionylation sites accurately would therefore be highly beneficial and would reduce the cost of experimental identification. In this work, we introduce a novel feature extraction method, based on the concept of bi-peptide evolutionary features, to predict propionylation sites. We then apply a Support Vector Machine (SVM) classifier, which achieves a Matthews correlation coefficient (MCC) of 0.72, sensitivity of 73.05%, specificity of 97.05%, accuracy of 85.05%, and F1-score of 0.82 under 10-fold cross-validation. Our proposed model outperforms the previously developed tool PropPred on all of these metrics.
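The abstract does not spell out how the bi-peptide evolutionary feature or the reported metrics are computed. The sketch below assumes that the bi-peptide feature follows the PSSM bi-gram formulation used in earlier protein fold recognition work: for a window of L residues whose PSSM rows are P_1, ..., P_L (20 columns each), B[m][n] = sum over i = 1..L-1 of P_{i,m} * P_{i+1,n}, flattened into 400 features. The function names, the input layout (a list of 20-column rows for a peptide window around the candidate lysine) and the use of plain Python lists are illustrative assumptions, not the authors' implementation. The second function computes the standard definitions of the metrics quoted above from a confusion matrix.

```python
import math
from typing import Dict, List, Sequence


def bigram_pssm_features(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: flatten the 20x20 PSSM bi-gram matrix into 400 features.

    Assumes `pssm` has one row per residue of the peptide window and one
    column per amino acid (e.g. normalised PSI-BLAST scores).
    B[m][n] accumulates pssm[i][m] * pssm[i + 1][n] over consecutive residues.
    """
    if any(len(row) != 20 for row in pssm):
        raise ValueError("each PSSM row must have 20 columns")
    b = [[0.0] * 20 for _ in range(20)]
    # Pair each residue with its successor (i, i + 1).
    for cur, nxt in zip(pssm, pssm[1:]):
        for m in range(20):
            for n in range(20):
                b[m][n] += cur[m] * nxt[n]
    # Row-major flattening: feature index = m * 20 + n.
    return [v for row in b for v in row]


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)    # accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)         # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Acc": acc, "F1": f1, "MCC": mcc}
```

With hypothetical counts on a balanced test set (10,000 sites per class, TP = 7305, TN = 9705), `classification_metrics(7305, 2695, 9705, 295)` gives Sn = 0.7305, Sp = 0.9705, Acc = 0.8505 and MCC of about 0.722. On a balanced set, accuracy is simply the mean of sensitivity and specificity.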
