ProteinExplorer: A Repository-Scale Resource for Exploration of Protein Detection in Public Mass Spectrometry Data Sets.

High-throughput tandem mass spectrometry has enabled the detection and identification of over 75% of the proteins predicted to be translated gene products of the human genome. The rapid pace of acquisition and sharing of mass spectrometry data has made many tens of terabytes of public data available in thousands of human data sets. The systematic reanalysis of these public data sets has been used to build a community-scale spectral library of 2.1 million precursors for over 1 million unique sequences from over 19,000 proteins (including spectra of synthetic peptides). However, it has remained challenging to find and inspect spectra of peptides that cover functional protein regions or match novel proteins. ProteinExplorer addresses these challenges with an intuitive interface that maps tens of millions of identifications to functional sites on nearly all human proteins while maintaining provenance for every identification back to the original data set and data file. Additionally, ProteinExplorer facilitates the selection and inspection of HPP-compliant peptides whose spectra can be matched to spectra of synthetic peptides, and it already includes HPP-compliant evidence for 107 missing (PE2, PE3, and PE4) and 23 dubious (PE5) proteins. Finally, ProteinExplorer allows users to rate spectra and to contribute to PrEdict (Protein Existence dictionary), a community library of peptides that map to novel proteins but whose preliminary identities have not yet been fully established with community-scale false discovery rates and synthetic peptide spectra. ProteinExplorer can now be accessed at https://massive.ucsd.edu/ProteoSAFe/protein_explorer_splash.jsp .
