DeepKcrot: A Deep-Learning Architecture for General and Species-Specific Lysine Crotonylation Site Prediction

Lysine crotonylation (Kcrot), a post-translational modification (PTM) originally identified on histone proteins, is involved in diverse biological processes. Several conventional machine-learning (ML) predictors have been developed based on Kcrot sites from histone proteins. Recently, thousands of Kcrot sites have been experimentally verified on non-histone proteins from multiple species. Accordingly, a few predictors have been developed for predicting Kcrot sites in specific organisms (i.e., humans and papaya). Nevertheless, comparisons of the crotonylomes of different organisms are lacking. Here, we collected around 20,000 experimentally identified Kcrot sites from four species as the benchmark data set. We present a deep-learning (DL) architecture, dubbed DeepKcrot, for predicting Kcrot sites in proteomes across species. DeepKcrot includes species-specific and general classifiers built on a convolutional neural network with the word embedding encoding approach (CNNWE). CNNWE performs better than both traditional ML-based and other DL-based classifiers in ten-fold cross-validation and independent tests, regardless of the size of the training set. Additionally, the cross-species performance of each species-specific predictor is not as good as its self-species performance, whereas cross-species performance generally increases with the size of the training data set. Moreover, the predictors developed based on non-histone Kcrot sites fail to predict histone Kcrot sites, suggesting that Kcrot-containing peptides from non-histone and histone proteins have significantly different characteristics and that data integration is required. Overall, DeepKcrot is an efficient prediction tool and is freely available at http://www.bioinfogo.org/deepkcrot.
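
To illustrate the word embedding encoding mentioned above, the following minimal sketch shows how candidate lysine sites might be turned into integer token sequences that an embedding layer could consume. It is not the DeepKcrot implementation: the function names `lysine_windows` and `encode`, the flank length of 15 residues, the `-` padding symbol, and the token numbering are all assumptions made for illustration.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed padding symbol for positions beyond the protein ends
# Index 0 is reserved for padding or unknown residues; the 20 residues map to 1..20.
TOKEN_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def lysine_windows(sequence: str, flank: int = 15) -> List[Tuple[int, str]]:
    """Return (1-based position, peptide window) for every lysine in the sequence.

    The flank length is a hypothetical default, not a value taken from the paper.
    """
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # In padded coordinates the lysine sits at index i + flank,
            # so the slice below is centred on it.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


def encode(window: str) -> List[int]:
    """Map each residue to an integer token suitable for an embedding lookup."""
    return [TOKEN_INDEX.get(residue, 0) for residue in window]


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYK", flank=2):
        print(position, window, encode(window))
```

In a CNNWE-style classifier, each integer token would index a row of a trainable embedding matrix, and the resulting matrix of embedded residues would be passed to the convolutional layers.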
