A Golden Age for Working with Public Proteomics Data

Data sharing in mass spectrometry (MS)-based proteomics is becoming common scientific practice, as it already is in other, more mature ‘omics’ disciplines such as genomics and transcriptomics. We want to highlight that this situation, unprecedented in the field, opens up many opportunities for data scientists. First, we describe in some detail the work already accomplished, such as systematic reanalysis efforts. We then describe existing applications of public proteomics data, such as proteogenomics and the creation of spectral libraries and spectral archives. Finally, we discuss the main remaining challenges and outline the first attempts to combine public proteomics data with other types of omics data sets.
