A method to distinguish between lysine acetylation and lysine ubiquitination with feature selection and analysis

Lysine acetylation and ubiquitination are two primary post-translational modifications (PTMs) in most eukaryotic proteins. Lysine residues are targets for both types of PTM, and the two modifications play different cellular roles. Although protein sequences and PTM data are increasingly available, distinguishing the two types of PTMs on lysine residues remains challenging, and experimental approaches are often laborious and time-consuming. Computational tools that can distinguish lysine acetylation from ubiquitination are therefore urgently needed. In this study, we developed a novel method, called DAUFSA (distinguish between lysine acetylation and lysine ubiquitination with feature selection and analysis), to discriminate ubiquitinated from acetylated lysine residues. The method incorporates several types of features: PSSM (position-specific scoring matrix) conservation scores, amino acid factors, secondary structures, solvent accessibilities, and disorder scores. Using the mRMR (maximum relevance minimum redundancy) method followed by the IFS (incremental feature selection) method, we selected an optimal set of 290 features from all incorporated features. A dagging-based classifier built on the optimal features achieved a classification accuracy of 69.53% and a Matthews correlation coefficient (MCC) of 0.3853. Analysis of the optimal feature set showed that the PSSM conservation score features and the amino acid factor features were the most important attributes, suggesting that acetylated and ubiquitinated sites differ in these respects. Our results also support previous findings that acetylation and ubiquitination sites are associated with different sequence motifs. The feature differences between the two modifications revealed in this study merit experimental validation and further investigation.
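The selection pipeline summarized above (mRMR ranking, IFS over the ranked list, a dagging classifier, MCC as the score) can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the DAUFSA implementation: features are assumed to be pre-discretized integers, mRMR uses the difference form (relevance minus mean redundancy), the dagging base learner is a hypothetical nearest-centroid classifier, chunks and folds are built deterministically without shuffling, and IFS scores each prefix of the mRMR ranking with a simple k-fold split. All function names and the toy data are illustrative.

```python
import math
from collections import Counter

# Hypothetical sketch of an mRMR + IFS + dagging pipeline (not the authors' code).
# Assumptions: features are pre-discretized integers; labels are 0/1.

def mutual_info(a, b):
    """Mutual information I(A;B) in nats between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())

def mrmr_rank(columns, labels):
    """Greedy mRMR ranking (difference form): relevance minus mean redundancy."""
    relevance = [mutual_info(col, labels) for col in columns]
    remaining, ranked = list(range(len(columns))), []
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(mutual_info(columns[j], columns[s]) for s in ranked)
            return relevance[j] - redundancy / len(ranked)
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def centroid_fit(rows, labels):
    """Hypothetical base learner: one mean feature vector per class."""
    cents = {}
    for c in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == c]
        cents[c] = [sum(v) / len(members) for v in zip(*members)]
    return cents

def centroid_predict(cents, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, cents[c])))

def dagging_fit(rows, labels, k=3):
    """Dagging: split into k disjoint, stratified chunks; fit one model per non-empty chunk."""
    chunks = [[] for _ in range(k)]
    for c in sorted(set(labels)):
        members = [i for i, l in enumerate(labels) if l == c]
        for pos, i in enumerate(members):
            chunks[pos % k].append(i)  # round-robin within each class keeps chunks stratified
    return [centroid_fit([rows[i] for i in ch], [labels[i] for i in ch])
            for ch in chunks if ch]

def dagging_predict(models, row):
    """Majority vote over the chunk models."""
    return Counter(centroid_predict(m, row) for m in models).most_common(1)[0][0]

def ifs(columns, labels, ranked, folds=5, k=3):
    """Incremental feature selection: score each top-m prefix of the ranking by CV MCC."""
    n = len(labels)
    best_feats, best_score = [], -math.inf
    for m in range(1, len(ranked) + 1):
        feats = ranked[:m]
        rows = [[columns[j][i] for j in feats] for i in range(n)]
        preds = [0] * n
        for f in range(folds):
            train = [i for i in range(n) if i % folds != f]
            models = dagging_fit([rows[i] for i in train], [labels[i] for i in train], k)
            for i in range(f, n, folds):  # held-out samples of fold f
                preds[i] = dagging_predict(models, rows[i])
        score = mcc(labels, preds)
        if score > best_score:
            best_feats, best_score = feats, score
    return best_feats, best_score

# Toy usage: column 1 duplicates column 0; column 2 is unrelated to the labels.
y = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]
cols = [f0, list(f0), [0, 1, 0, 1, 0, 1, 0, 1]]
ranked = mrmr_rank(cols, y)
print(ranked)  # [0, 2, 1]
print(ifs(cols, y, ranked))
```

In the toy run, the exact copy of column 0 is ranked last even though it is as relevant as column 0, because its redundancy with the already-selected feature outweighs its relevance; this is the behavior that lets mRMR hand IFS a compact, non-redundant ordering. With real data, the prefix length returned by `ifs` plays the role of the optimal feature set size.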
