Prediction of protein N-formylation and comparison with N-acetylation based on a feature selection method

Post-translational modifications play important roles in cell activities ranging from gene regulation to cytoplasmic mechanisms. Unfortunately, experimental methods for investigating protein post-translational modifications, such as high-resolution mass spectrometry, are time-consuming, labor-intensive and expensive. Computational methods are therefore needed for fast and efficient identification of modification sites. In this study, we developed a method to predict N-formylated methionines based on the Dagging method. Various features were incorporated, including PSSM conservation scores, amino acid factors, secondary structures, solvent accessibilities and disorder scores. An optimal feature set containing 28 features was selected using the mRMR (Maximum Relevance Minimum Redundancy) method and the IFS (Incremental Feature Selection) method. The prediction model constructed from these features achieved an accuracy of 0.9074 and an MCC (Matthews correlation coefficient) value of 0.7478. Analysis of the optimal features revealed several factors and sites that play important roles in N-formylation. We also compared N-formylation with N-acetylation, another important type of N-terminal modification of methionines. The top 34 MaxRel (most relevant) features were selected to discriminate between the two types of modifications; these features may be candidates for studying the different mechanisms underlying N-formylation and N-acetylation. The results of our study further the understanding of these two types of modifications and provide guidance for related validation experiments.
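As an illustration of the evaluation and selection steps named above, the following is a minimal Python sketch, not the authors' implementation. It computes accuracy and MCC from confusion-matrix counts and runs a generic IFS loop over a ranked feature list. The `evaluate` callback, the function names, and the tie-breaking rule (keep the smallest best-scoring subset) are assumptions made for illustration; in practice `evaluate` would train and validate a classifier such as Dagging on the given feature subset, with the ranking supplied by mRMR.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def incremental_feature_selection(ranked_features, evaluate):
    """Score nested prefixes of a ranked feature list and keep the best one.

    ranked_features: feature names ordered best-first (e.g. by mRMR rank).
    evaluate: hypothetical callback mapping a feature list to a score,
              e.g. a cross-validated MCC of a classifier on those features.
    Ties are resolved in favor of the smaller subset (an assumption).
    """
    best_score, best_subset = float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:  # strict '>' keeps the smallest best subset
            best_score, best_subset = score, subset
    return best_subset, best_score


if __name__ == "__main__":
    # Toy scores keyed by subset size; a real run would train a model instead.
    toy_scores = {1: 0.5, 2: 0.8, 3: 0.7, 4: 0.8}
    subset, score = incremental_feature_selection(
        ["f1", "f2", "f3", "f4"], lambda feats: toy_scores[len(feats)]
    )
    print(subset, score)
    print(accuracy(50, 40, 5, 5), mcc(50, 40, 5, 5))
```

Because MCC uses all four confusion-matrix cells, it is often reported alongside accuracy when the positive and negative classes differ in size.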
