SEMal: Accurate protein malonylation site predictor using structural and evolutionary information

Post-translational modification (PTM) is a vital process that plays an important role in a wide range of biological interactions. One of the most recently identified PTMs is malonylation, which has been shown to affect several biological pathways, including glucose and fatty acid metabolism. Malonylation can be detected experimentally using mass spectrometry. However, this process is both costly and time-consuming, which has motivated research into faster and more efficient computational methods. This paper proposes a novel approach, called SEMal, to identify malonylation sites in protein sequences. It uses both structural and evolutionary-based features and employs Rotation Forest (RoF) as its classifier. To the best of our knowledge, neither our extracted features nor our employed classifier has previously been used for this problem. SEMal outperforms previously proposed methods on all metrics, including sensitivity (0.94 and 0.89), accuracy (0.94 and 0.91), and Matthews correlation coefficient (0.88 and 0.82), for the Homo sapiens and Mus musculus species, respectively. SEMal is publicly available as an online predictor at: http://brl.uiu.ac.bd/SEMal/.
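
For readers unfamiliar with the reported metrics, the following minimal Python sketch shows how sensitivity, accuracy, and the Matthews correlation coefficient are conventionally computed from binary confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the SEMal experiments.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, accuracy, MCC) from confusion-matrix counts.

    tp/fn: malonylated sites predicted correctly / missed.
    tn/fp: non-malonylated sites predicted correctly / wrongly flagged.
    """
    # Sensitivity: fraction of true sites that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all predictions that were correct.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, accuracy, mcc


# Hypothetical counts, for illustration only (not SEMal results).
sens, acc, mcc = binary_metrics(tp=90, fp=10, tn=90, fn=10)
print(f"sensitivity={sens:.2f} accuracy={acc:.2f} mcc={mcc:.2f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when positive and negative sites are imbalanced.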
