Proteome-wide post-translational modification statistics: frequency analysis and curation of the Swiss-Prot database

Post-translational modifications (PTMs) broadly contribute to the recent explosion of proteomic data and possess a complexity surpassing that of protein design. PTMs are chemical modifications of a protein after its translation, and their wide-ranging effects broaden its range of functionality. Based on previous estimates, it is widely believed that more than half of proteins are glycoproteins. Whereas mutations can only occur once per position, different forms of post-translational modifications may occur in tandem. Although new modifications are constantly being discovered and their abundances measured, there is no method to readily assess their relative levels. Here we report the relative abundance of each PTM, found experimentally and putatively, from high-quality, manually curated, proteome-wide data, and show that, at best, less than one-fifth of proteins are glycosylated. We make available to the academic community a continuously updated resource (http://selene.princeton.edu/PTMCuration) containing the statistics so that scientists can assess “how many” of each PTM exist.
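As an illustration of the kind of frequency analysis described above, the following minimal sketch tallies PTM-related feature lines in a Swiss-Prot style flat file. It is not the authors' curation pipeline. It assumes the older flat-file layout, in which each feature sits on one `FT` line with a key, start and end positions, and a description, and in which non-experimental annotations carry the qualifiers "Potential", "Probable", or "By similarity". The function name `tally_ptms`, the choice of feature keys, and the sample entries are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Swiss-Prot feature keys that annotate post-translational modifications.
PTM_KEYS = {"MOD_RES", "CARBOHYD", "LIPID", "CROSSLNK", "DISULFID"}
# Qualifiers marking non-experimental (putative) annotations in the older format.
PUTATIVE_TAGS = ("(Potential)", "(Probable)", "(By similarity)")


def tally_ptms(lines):
    """Count PTM features by (key, status) and return the glycosylated fraction.

    Assumption: each feature fits on one FT line; continuation lines are ignored.
    """
    counts = Counter()
    total_entries = 0
    glyco_entries = 0
    has_carbohyd = False
    for line in lines:
        if line.startswith("//"):  # "//" terminates an entry
            total_entries += 1
            if has_carbohyd:
                glyco_entries += 1
            has_carbohyd = False
            continue
        # A new feature starts with the key directly after the "FT   " prefix.
        if not line.startswith("FT   ") or len(line) < 6 or line[5] == " ":
            continue
        fields = line[5:].split(None, 3)  # key, from, to, description
        key = fields[0]
        if key not in PTM_KEYS:
            continue
        description = fields[3] if len(fields) > 3 else ""
        putative = any(tag in description for tag in PUTATIVE_TAGS)
        counts[(key, "putative" if putative else "experimental")] += 1
        if key == "CARBOHYD":
            has_carbohyd = True
    fraction = glyco_entries / total_entries if total_entries else 0.0
    return counts, fraction


# Invented sample entries, for illustration only.
SAMPLE = """\
ID   EXAMPLE1
FT   MOD_RES       2      2       N-acetylalanine.
FT   CARBOHYD     53     53       N-linked (GlcNAc...) (Potential).
//
ID   EXAMPLE2
FT   MOD_RES      10     10       Phosphoserine (By similarity).
//
ID   EXAMPLE3
FT   CHAIN         1    120       Hypothetical protein.
//
""".splitlines()

counts, glyco_fraction = tally_ptms(SAMPLE)
for (key, status), n in sorted(counts.items()):
    print(key, status, n)
print("fraction of entries with a CARBOHYD feature:", round(glyco_fraction, 3))
```

Keeping experimental and putative counts separate mirrors the distinction drawn in the abstract, and counting entries with at least one `CARBOHYD` feature gives an estimate of the glycosylated fraction of the kind the abstract reports.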
