Integration of A Deep Learning Classifier with A Random Forest Approach for Predicting Malonylation Sites

As a newly identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates is an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings and better than a DL classifier based on LSTM with a one-hot vector. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integration with a traditional machine learning (ML) classifier. Accordingly, we developed an integrated approach called LEMP, which combines LSTMWE with a random forest classifier that uses a novel encoding of enhanced amino acid content. LEMP not only outperforms the individual classifiers but is also superior to currently available malonylation predictors. Additionally, it maintains promising performance at a low false positive rate, which is highly useful in practical prediction applications. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.
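The difference between the two LSTM input representations mentioned above (one-hot vector versus word embedding) can be illustrated with a minimal sketch. This is not the LEMP implementation: the alphabet ordering, the padding symbol `X`, the embedding dimension of 5, the randomly initialized table, and the example peptide window are all assumptions made for illustration. In a real network the embedding table would be learned jointly with the LSTM weights.

```python
import random

# Assumed alphabet: 20 standard amino acids plus a hypothetical padding symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def to_indices(window):
    """Map each residue to an integer token; unknown residues map to PAD."""
    return [INDEX.get(aa, INDEX[PAD]) for aa in window]


def one_hot(window):
    """Sparse encoding: one 21-dimensional indicator vector per residue."""
    rows = []
    for i in to_indices(window):
        row = [0.0] * len(ALPHABET)
        row[i] = 1.0
        rows.append(row)
    return rows


def make_embedding_table(dim, seed=0):
    """Random stand-in for a trainable embedding matrix (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in ALPHABET]


def embed(window, table):
    """Dense encoding: look up one low-dimensional vector per residue."""
    return [table[i] for i in to_indices(window)]


# Hypothetical 9-residue window centered on a candidate lysine (K).
window = "XXAGKLKDE"
table = make_embedding_table(dim=5)
indices = to_indices(window)
sparse = one_hot(window)   # 9 x 21 matrix
dense = embed(window, table)  # 9 x 5 matrix
```

In both cases the sequence of per-residue vectors would be fed to the LSTM one position at a time; the embedding variant simply replaces the fixed sparse vector with a compact vector whose values can be adjusted during training.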
