Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes

Abstract

Background: Phosphorylation is the most frequent post-translational modification made to proteins and may regulate protein activity as either a molecular digital switch or a rheostat. Despite the cornucopia of high-throughput (HTP) phosphoproteomic data in the last decade, it remains unclear how many proteins are phosphorylated and how many phosphorylation sites (p-sites) can exist in total within a eukaryotic proteome. We present the first reliable estimates of the total number of phosphoproteins and p-sites for four eukaryotes (human, mouse, Arabidopsis, and yeast).

Results: In all, 187 HTP phosphoproteomic datasets were filtered, compiled, and studied along with two low-throughput (LTP) compendia. Estimates of the number of phosphoproteins and p-sites were inferred by two methods: Capture-Recapture, and fitting the saturation curve of cumulative redundant vs. cumulative non-redundant phosphoproteins/p-sites. Estimates were also adjusted for different levels of noise within the individual datasets and other confounding factors. We estimate that in total, 13 000, 11 000, and 3000 phosphoproteins and 230 000, 156 000, and 40 000 p-sites exist in human, mouse, and yeast, respectively, whereas estimates for Arabidopsis were not as reliable.

Conclusions: Most of the phosphoproteins have been discovered for human, mouse, and yeast, while the dataset for Arabidopsis is still far from complete. The datasets for p-sites are not as close to saturation as those for phosphoproteins. Integration of the LTP data suggests that current HTP phosphoproteomics appears to be capable of capturing 70 % to 95 % of total phosphoproteins, but only 40 % to 60 % of total p-sites.
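As a rough illustration of the Capture-Recapture idea named in the abstract, the sketch below applies the two-sample Chapman estimator to two hypothetical sets of phosphoprotein identifiers. It is a simplification and not the procedure used in the study, which combines many datasets and adjusts for noise. The function names and the protein identifiers are assumptions made for this example.

```python
def chapman_estimate(first: set, second: set) -> float:
    """Two-sample Chapman estimate of total population size.

    first, second: identifiers observed in two independent experiments.
    Assumes every item is equally likely to be detected in each experiment,
    which real phosphoproteomic datasets are unlikely to satisfy exactly.
    """
    n1 = len(first)            # items seen in the first experiment
    n2 = len(second)           # items seen in the second experiment
    m = len(first & second)    # items seen in both ("recaptured")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def coverage(first: set, second: set) -> float:
    """Fraction of the estimated total that has been observed so far."""
    observed = len(first | second)
    return observed / chapman_estimate(first, second)


# Hypothetical identifiers, for illustration only.
dataset_a = {"A", "B", "C", "D", "E"}
dataset_b = {"E", "F", "G"}

print(chapman_estimate(dataset_a, dataset_b))  # 11.0
print(round(coverage(dataset_a, dataset_b), 3))
```

The smaller the overlap between two experiments relative to their sizes, the larger the estimated total. The same reasoning explains why poorly overlapping p-site datasets imply a population that is further from saturation than the phosphoprotein datasets.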
