The state of the human proteome in 2012 as viewed through PeptideAtlas.

The Human Proteome Project was launched in September 2010 with the goal of characterizing at least one protein product from each protein-coding gene. Here we assess how much of the proteome has been detected to date via tandem mass spectrometry by analyzing PeptideAtlas, a compendium of human-derived LC-MS/MS proteomics data from many laboratories around the world. All data sets are processed with a consistent set of parameters using the Trans-Proteomic Pipeline and subjected to a 1% protein FDR filter before inclusion in PeptideAtlas. Therefore, PeptideAtlas contains only high-confidence protein identifications. To increase proteome coverage, we explored new comprehensive public data sources for data likely to add new proteins to the Human PeptideAtlas. We then folded these data into a Human PeptideAtlas 2012 build and mapped it to Swiss-Prot, a protein sequence database curated to contain one entry per human protein-coding gene. We find that this latest PeptideAtlas build includes at least one peptide for each of ~12500 Swiss-Prot entries, leaving ~7500 gene products yet to be confidently cataloged. We characterize these "PA-unseen" proteins in terms of tissue localization, transcript abundance, and Gene Ontology enrichment, and propose reasons for their absence from PeptideAtlas and strategies for detecting them in the future.
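
The split into seen and "PA-unseen" entries can be illustrated with a minimal sketch. It assumes the build has already been reduced to (peptide, accession) pairs that passed the FDR filter; the function name, the input layout, and the toy accessions are hypothetical and are not taken from the PeptideAtlas software. With the approximate counts reported above, the same calculation gives about 12500 / 20000, or roughly 62% of Swiss-Prot entries seen.

```python
def split_by_detection(reference_entries, peptide_hits):
    """Partition reference entries into seen and unseen sets.

    reference_entries: iterable of accessions, one per protein-coding gene
                       (hypothetical stand-in for the Swiss-Prot entry list).
    peptide_hits:      iterable of (peptide_sequence, accession) pairs that
                       are assumed to have already passed the FDR filter.
    Returns (seen, unseen, coverage_fraction).
    """
    reference = set(reference_entries)
    # An entry counts as seen if at least one peptide maps to it.
    # Hits to accessions outside the reference set are ignored.
    seen = {acc for _peptide, acc in peptide_hits if acc in reference}
    unseen = reference - seen
    coverage = len(seen) / len(reference) if reference else 0.0
    return seen, unseen, coverage


# Toy data only; these accessions and peptides are made up for illustration.
toy_reference = ["P1", "P2", "P3", "P4"]
toy_hits = [
    ("PEPTIDEK", "P1"),
    ("ANOTHERK", "P1"),  # a second peptide for the same entry adds nothing
    ("SAMPLER", "P3"),
    ("ELSEWHEREK", "Q9"),  # maps outside the reference set, so it is dropped
]

seen, unseen, coverage = split_by_detection(toy_reference, toy_hits)
print(sorted(seen), sorted(unseen), coverage)
```

A set difference is used here because the abstract's criterion is simply "at least one peptide" per entry; a stricter rule (for example, requiring a peptide unique to the entry) would need a per-entry count instead of a set.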
