An Information Entropy-Based Approach for Computationally Identifying Histone Lysine Butyrylation

Butyrylation plays a crucial role in cellular processes. Because of the limitations of current experimental techniques, identifying histone butyrylation sites on a large scale remains challenging. To fill this gap, we propose an approach based on information entropy and machine learning for computationally identifying histone butyrylation sites. The proposed method achieves an area under the receiver operating characteristic (ROC) curve of 0.92 on the training set by 3-fold cross-validation and 0.80 on the testing set by independent test. Feature analysis implies that amino acid residues upstream and downstream of butyrylation sites exhibit specific sequence motifs to a certain extent. Functional analysis suggests that histone butyrylation is most likely associated with four pathways (systemic lupus erythematosus, alcoholism, viral carcinogenesis, and transcriptional misregulation in cancer), is involved in binding with other molecules and in processes of biosynthesis, assembly, arrangement, or disassembly, and is located in complexes consisting of DNA, RNA, protein, etc. The proposed method is useful for predicting histone butyrylation sites. The feature and functional analyses improve understanding of histone butyrylation and increase knowledge of the functions of butyrylated histones.
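The abstract does not spell out how information entropy is turned into features, so the following is only a minimal sketch of one common reading: the Shannon entropy of the residue distribution at each position of equal-length peptide windows centered on a lysine. The function name `positional_entropy` and the toy three-residue windows are hypothetical and are not taken from the paper; the authors' actual encoding may differ.

```python
import math
from collections import Counter


def positional_entropy(peptides):
    """Return the Shannon entropy (in bits) of each column of aligned peptides.

    Assumption: every peptide is a window of the same length around a
    candidate lysine. This is an illustrative encoding, not the paper's.
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")

    entropies = []
    for column in zip(*peptides):  # iterate over window positions
        total = len(column)
        counts = Counter(column)   # residue -> number of occurrences
        # H = sum_i p_i * log2(1 / p_i); a fully conserved column gives 0.0
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


# Toy, made-up windows with the central lysine (K) at index 1.
toy_windows = ["AKG", "AKR", "GKR", "GKG"]
toy_entropy = positional_entropy(toy_windows)
```

Under this reading, low-entropy positions flank the modified lysine when a sequence motif is present, which is the kind of signal the feature analysis in the abstract refers to. In the toy data the central column is fully conserved and therefore has zero entropy.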
