A deep learning method to more accurately recall known lysine acetylation sites

Background
Lysine acetylation of proteins is one of the most important post-translational modifications (PTMs). It plays an important role in essential biological processes and is associated with various diseases. Identifying lysine acetylation sites is key to a comprehensive understanding of the regulatory mechanisms of lysine acetylation. Several shallow machine learning algorithms have previously been applied to predict lysine modification sites in proteins. However, shallow machine learning has drawbacks. For instance, it is less effective than deep learning at processing large datasets.

Results
In this work, we developed a novel predictor, DeepAcet, to predict lysine acetylation sites. Six encoding schemes were adopted to represent the modified residues: one-hot encoding, the BLOSUM62 matrix, the composition of k-spaced amino acid pairs (CKSAAP), information gain, physicochemical properties, and a position-specific scoring matrix (PSSM). A multilayer perceptron (MLP) was used to build models that predict lysine acetylation sites from these features. We also integrated all features and applied a feature selection method, yielding a set of 2199 features. The best model achieved 84.95% accuracy, 83.45% specificity, 86.44% sensitivity, an AUC of 0.8540, and an MCC of 0.6993 in 10-fold cross-validation. On an independent test set, it achieved 84.87% accuracy, 83.46% specificity, 86.28% sensitivity, an AUC of 0.8407, and an MCC of 0.6977.

Conclusion
DeepAcet achieves better predictive performance than other existing methods. DeepAcet can be freely downloaded from https://github.com/Sunmile/DeepAcet.
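
Two of the six encoding schemes listed above, one-hot encoding and the composition of k-spaced amino acid pairs (CKSAAP), can be illustrated on a residue window centred on a lysine. This is a minimal sketch, not DeepAcet's code: the window half-width (`flank=7`), the padding symbol `X`, the gap range `k_max=3` and the function names are all illustrative assumptions.

```python
# Hypothetical sketch: flank, padding symbol and k_max are assumptions,
# not parameters taken from DeepAcet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]  # 400 ordered pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def extract_window(sequence, position, flank=7, pad="X"):
    """Return the 2*flank+1 residue window centred on `position`, padded past either end."""
    left = sequence[max(0, position - flank):position]
    right = sequence[position + 1:position + 1 + flank]
    return (pad * (flank - len(left)) + left + sequence[position]
            + right + pad * (flank - len(right)))


def one_hot(window):
    """20 indicator values per residue; padding or non-standard residues map to all zeros."""
    return [1 if residue == aa else 0 for residue in window for aa in AMINO_ACIDS]


def cksaap(window, k_max=3):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max (400 values per k)."""
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        n_pairs = len(window) - k - 1  # positions (i, i + k + 1) that fit in the window
        for i in range(n_pairs):
            index = PAIR_INDEX.get(window[i] + window[i + k + 1])
            if index is not None:  # pairs containing padding are skipped
                counts[index] += 1
        features.extend(count / n_pairs for count in counts)
    return features


# Usage on an arbitrary example sequence; index 5 is a lysine (K).
seq = "MSGRGKGGKGLGKGGAKRHRKVLRDN"
window = extract_window(seq, 5)
vector = one_hot(window) + cksaap(window)
```

Concatenating per-scheme vectors like this is one simple way to "integrate all features" before a selection step; the other schemes (BLOSUM62, PSSM, physicochemical properties) would be appended the same way.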
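
The abstract does not name the feature selection method used to reach the 2199-feature set. As one plausible filter approach, this sketch ranks each feature by its F-score (the squared distance of the class means from the overall mean, divided by the summed within-class sample variances) and keeps the top `n_keep`. The choice of F-score, the helper names and the toy data are assumptions for illustration only.

```python
from statistics import mean


def f_score(positives, negatives, j):
    """F-score of feature j: spread of class means over summed within-class variances.

    Needs at least two samples per class (sample variances divide by n - 1)."""
    pos = [row[j] for row in positives]
    neg = [row[j] for row in negatives]
    overall = mean(pos + neg)
    pos_mean, neg_mean = mean(pos), mean(neg)
    numerator = (pos_mean - overall) ** 2 + (neg_mean - overall) ** 2
    denominator = (sum((x - pos_mean) ** 2 for x in pos) / (len(pos) - 1)
                   + sum((x - neg_mean) ** 2 for x in neg) / (len(neg) - 1))
    if denominator == 0:
        # Constant within each class: maximally discriminative only if the means differ.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def select_top_features(positives, negatives, n_keep):
    """Indices (ascending) of the n_keep features with the highest F-score."""
    n_features = len(positives[0])
    scores = [f_score(positives, negatives, j) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


# Toy data (hypothetical): feature 0 separates the classes, feature 2 does not.
positives = [[1.0, 0.2, 5.0], [0.9, 0.8, 5.1], [1.1, 0.5, 4.9]]
negatives = [[0.0, 0.3, 5.0], [0.1, 0.7, 5.2], [-0.1, 0.4, 4.8]]
selected = select_top_features(positives, negatives, n_keep=2)
```

With DeepAcet's integrated feature vectors, `n_keep` would be 2199. A filter like this scores each feature independently, so it is cheap but ignores redundancy between features.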
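
The reported figures use standard definitions: accuracy, sensitivity TP/(TP+FN), specificity TN/(TN+FP) and the Matthews correlation coefficient (MCC) come from the confusion matrix, while AUC comes from the predicted scores. The sketch below shows how each is computed; the counts and scores are hypothetical, and this is not DeepAcet's evaluation code.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # recall on acetylated (positive) sites
    specificity = tn / (tn + fp)  # recall on non-acetylated (negative) sites
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


def auc_from_scores(pos_scores, neg_scores):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
m = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The pairwise AUC loop is O(n·m) and only meant to make the definition explicit; a sort-based rank computation gives the same value faster on large test sets.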
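
A 10-fold cross-validation split can be sketched with the standard library alone. Stratifying by class (so each fold keeps roughly the same positive/negative ratio), the fixed seed and the name `stratified_k_fold` are assumptions; the abstract does not say how the folds were drawn.

```python
import random


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k folds, keeping class ratios per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a similar share.
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical labels: 1 = acetylated lysine, 0 = non-acetylated lysine.
labels = [1] * 30 + [0] * 70
splits = list(stratified_k_fold(labels, k=10))
```

Each fold serves once as the held-out set while a model is trained on the other nine; the per-fold results are then combined (for example, averaged) into a single cross-validation estimate.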
