Identification of the sequence determinants of protein N-terminal acetylation through a decision tree approach

Background: N-terminal acetylation is one of the most common protein modifications in eukaryotes and occurs co-translationally, while the N-terminus of the nascent polypeptide is still attached to the ribosome. This modification has been shown to be involved in a wide range of biological phenomena, including regulation of protein half-life, protein-protein and protein-membrane interactions, and protein subcellular localization. Accurately predicting from sequence which proteins receive an acetyl group is therefore expected to facilitate the functional study of this modification. Because the occurrence of N-terminal acetylation depends strongly on sequence context, its sequence determinants were first studied by simply examining the N-terminal sequences of many acetylated and unacetylated proteins, and more recently by machine learning approaches. However, a complete understanding of the sequence determinants of this modification has not yet been achieved.

Results: We obtained curated N-terminally acetylated and unacetylated sequences from the UniProt database and employed a decision tree algorithm to identify the sequence determinants of N-terminal acetylation for proteins whose initiator methionine (iMet) residues have been removed. The results suggested that the main determinants of N-terminal acetylation are contained within the first five residues following iMet and that the first and second positions are the most important discriminators for the occurrence of this modification. The results also indicated the existence of position-specific preferred and inhibitory residues that determine whether N-terminal acetylation occurs. The developed predictor software, termed NT-AcPredictor, accurately predicted N-terminal acetylation, with an overall performance comparable or superior to that of earlier predictors based on machine learning algorithms.

Conclusion: Our machine learning approach based on a decision tree algorithm identified several sequence determinants of N-terminal acetylation for proteins lacking iMet, some of which have not previously been described. These sequence determinants remain insufficient to comprehensively predict the occurrence of this modification, indicating that further work on this topic is still required. Nevertheless, the developed predictor, NT-AcPredictor, can be used to predict N-terminal acetylation with an accuracy of more than 80%.
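To make the general idea concrete, the following is a minimal sketch of how a decision tree can be grown on position-specific residues from the first five positions after iMet. It is not the NT-AcPredictor implementation: the Gini-based splitting, the function names (`features`, `best_split`, `build`, `predict`), and above all the ten toy sequences and their labels are assumptions invented for illustration, not data or rules from the study.

```python
from collections import Counter

def features(seq, n=5):
    """First n residues after iMet (iMet assumed already removed), padded with '-'."""
    return tuple(seq[:n].ljust(n, "-"))

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (position, residue) test that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for pos in range(len(rows[0])):
        for res in {r[pos] for r in rows}:
            yes = [l for r, l in zip(rows, labels) if r[pos] == res]
            no = [l for r, l in zip(rows, labels) if r[pos] != res]
            if not yes or not no:
                continue
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (pos, res), score
    return best

def build(rows, labels, depth=0, max_depth=3):
    """Grow a tree; internal nodes are (pos, residue, yes_subtree, no_subtree) tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    pos, res = split
    yes = [(r, l) for r, l in zip(rows, labels) if r[pos] == res]
    no = [(r, l) for r, l in zip(rows, labels) if r[pos] != res]
    return (pos, res,
            build([r for r, _ in yes], [l for _, l in yes], depth + 1, max_depth),
            build([r for r, _ in no], [l for _, l in no], depth + 1, max_depth))

def predict(tree, row):
    """Walk the tree until a leaf (a bool) is reached."""
    while isinstance(tree, tuple):
        pos, res, yes, no = tree
        tree = yes if row[pos] == res else no
    return tree

# Toy data, invented for illustration only (True = acetylated). Not from UniProt.
toy = [("SEKTA", True), ("SDLKV", True), ("ADEKL", True), ("AEKLV", True),
       ("SPKLV", False), ("APELK", False), ("GKLVE", False), ("PKLVE", False),
       ("VKLEA", False), ("TKKEL", False)]
rows = [features(s) for s, _ in toy]
labels = [l for _, l in toy]
tree = build(rows, labels)
print(tree)
print(predict(tree, features("SEQLA")))
```

Each internal node tests a single residue at a single position, so a fitted tree can be read directly as a set of position-specific rules. That readability is the usual reason to prefer a decision tree when the goal is to identify determinants rather than only to predict.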
