Predicting lysine‐malonylation sites of proteins using sequence and predicted structural features

Malonylation is a recently discovered post‐translational modification (PTM) in which a malonyl group attaches to a lysine (K) residue of a protein. In this work, a novel machine learning model, SPRINT‐Mal, is developed to predict malonylation sites from sequence and predicted structural features. Evolutionary information and physicochemical properties are found to be the two most discriminative features, whereas a structural feature, half‐sphere exposure, provides an additional improvement in prediction performance. Trained on mouse data, SPRINT‐Mal yields robust performance in 10‐fold cross‐validation and on an independent test set, with Area Under the Curve (AUC) values of 0.74 and 0.76 and Matthews correlation coefficients (MCC) of 0.213 and 0.20, respectively. Moreover, without species‐specific training, SPRINT‐Mal achieved comparable performance on H. sapiens proteins but not on proteins from the bacterium S. erythraea. This suggests that the underlying physicochemical mechanisms are similar between mouse and human but not between mouse and bacterium. SPRINT‐Mal is freely available as an online server at: http://sparks-lab.org/server/SPRINT-Mal/. © 2018 Wiley Periodicals, Inc.
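
The abstract reports performance as AUC and MCC under 10‐fold cross‐validation. The following is a minimal sketch, not SPRINT‐Mal's code, of how these quantities are commonly computed using only the Python standard library. It computes MCC from confusion‐matrix counts and AUC as the probability that a randomly chosen positive site scores above a randomly chosen negative one (the Mann‐Whitney formulation). It also produces a plain k‐fold split. The function names and toy numbers are hypothetical, and the folds here are neither shuffled nor stratified. In practice, scikit‐learn's `matthews_corrcoef`, `roc_auc_score`, and `StratifiedKFold` cover the same ground.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal sum is zero (MCC is undefined there).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def roc_auc(labels, scores) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Labels are 1 (malonylated) or 0 (not); tied scores count as half a win.
    This O(P*N) pairwise version is for illustration, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n: int, k: int = 10):
    """Yield (train, test) index lists for plain, unshuffled k-fold CV.

    The first n % k folds get one extra sample so every index is tested once.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    # Hypothetical toy counts and scores, not results from the paper.
    print(mcc(tp=3, tn=5, fp=1, fn=1))                   # ≈ 0.583
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))   # → 0.75
    print(len(list(kfold_indices(23, k=10))))            # → 10
```

Because a threshold‐free AUC and a thresholded MCC can disagree, reporting both (as the abstract does) gives a fuller picture on imbalanced site data.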