The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics

Abstract The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. Since then, in addition to the four PX existing members at the time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have joined PX: iProX (China) and Panorama Public (USA). We first describe the updated submission guidelines, now expanded to include six members. Next, with current data submission statistics, we demonstrate that the proteomics field is now actively embracing public open data policies. At the end of June 2019, more than 14 100 datasets had been submitted to PX resources since 2012, and from those, more than 9 500 in just the last three years. In parallel, an unprecedented increase of data re-use activities in the field, including ‘big data’ approaches, is enabling novel research and new data resources. At last, we also outline some of our future plans for the coming years.

[1]  Jun Fan,et al.  The mzTab Data Exchange Format: Communicating Mass-spectrometry-based Proteomics and Metabolomics Experimental Results to a Wider Audience* , 2014, Molecular & Cellular Proteomics.

[2]  J. Vizcaíno,et al.  Exploring the potential of public proteomics data , 2015, Proteomics.

[3]  Jian Wang,et al.  Assembling the Community-Scale Discoverable Human Proteome , 2018, Cell systems.

[4]  Helmut Krcmar,et al.  ProteomicsDB , 2017, Nucleic Acids Res..

[5]  The UniProt Consortium,et al.  UniProt: a worldwide hub of protein knowledge , 2018, Nucleic Acids Res..

[6]  F. Arnaud,et al.  From core referencing to data re-use: two French national initiatives to reinforce paleodata stewardship (National Cyber Core Repository and LTER France Retro-Observatory) , 2017 .

[7]  Aïda Ouangraoua,et al.  OpenProt: a more comprehensive guide to explore eukaryotic coding potential and proteomes , 2018, Nucleic Acids Res..

[8]  Eystein Oveland,et al.  PeptideShaker enables reanalysis of MS-derived proteomics data sets , 2015, Nature Biotechnology.

[9]  Nuno Bandeira,et al.  ProteinExplorer: A Repository-Scale Resource for Exploration of Protein Detection in Public Mass Spectrometry Data Sets. , 2018, Journal of proteome research.

[10]  Lindsay K. Pino,et al.  The Skyline ecosystem: Informatics for quantitative mass spectrometry proteomics. , 2020, Mass spectrometry reviews.

[11]  Lennart Martens,et al.  Updated MS²PIP web server delivers fast and accurate MS² peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques , 2019, Nucleic Acids Res..

[12]  Johannes Griss,et al.  Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets , 2016, Nature Methods.

[13]  Mathias Wilhelm,et al.  Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning , 2019, Nature Methods.

[14]  Robertson Craig,et al.  Open source system for analyzing, validating, and storing protein identification data. , 2004, Journal of proteome research.

[15]  Martin Eisenacher,et al.  Development of data representation standards by the human proteome organization proteomics standards initiative , 2015, J. Am. Medical Informatics Assoc..

[16]  Juan Antonio Vizcaíno,et al.  Introducing the PRIDE Archive RESTful web services , 2015, Nucleic Acids Res..

[17]  Lennart Martens,et al.  A Golden Age for Working with Public Proteomics Data , 2017, Trends in biochemical sciences.

[18]  Lennart Martens,et al.  Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. , 2016, Journal of proteome research.

[19]  Nuno A. Fonseca,et al.  Expression Atlas: gene and protein expression across multiple studies and organisms , 2017, Nucleic Acids Res..

[20]  Kristian Fog Nielsen,et al.  Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking , 2016, Nature Biotechnology.

[21]  Roman A. Zubarev,et al.  The SysteMHC Atlas project , 2017, Nucleic Acids Res..

[22]  Jürgen Cox,et al.  High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis , 2019, Nature Methods.

[23]  Martin Eisenacher,et al.  The PRIDE database and related tools and resources in 2019: improving support for quantification data , 2018, Nucleic Acids Res..

[24]  Luis Mendoza,et al.  PASSEL: The PeptideAtlas SRMexperiment library , 2012, Proteomics.

[25]  Andrew R. Jones,et al.  ProteomeXchange provides globally co-ordinated proteomics data submission and dissemination , 2014, Nature Biotechnology.

[26]  Henry H. N. Lam,et al.  PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows , 2008, EMBO reports.

[27]  Juan Antonio Vizcaíno,et al.  The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition , 2016, Nucleic Acids Res..

[28]  Kenli Li,et al.  iProX: an integrated proteome resource , 2018, Nucleic Acids Res..

[29]  Juan Antonio Vizcaíno,et al.  The proBAM and proBed standard formats: enabling a seamless integration of genomics and proteomics data , 2017, Genome Biology.

[30]  Lennart Martens,et al.  LNCipedia 5: towards a reference set of human long non-coding RNAs , 2018, Nucleic Acids Res..

[31]  Masaki Matsumoto,et al.  The jPOST environment: an integrated proteomics data repository and database , 2018, Nucleic Acids Res..

[32]  Juan Antonio Vizcaíno,et al.  ms-data-core-api: an open-source, metadata-oriented library for computational proteomics , 2015, Bioinform..

[33]  Robert Petryszak,et al.  Quantifying the impact of public omics data , 2018, Nature Communications.

[34]  Martin Eisenacher,et al.  PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets , 2015, Molecular & Cellular Proteomics.

[35]  Gerben Menschaert,et al.  An update on sORFs.org: a repository of small ORFs identified by ribosome profiling , 2017, Nucleic Acids Res..

[36]  David Haussler,et al.  The UCSC Genome Browser database: 2019 update , 2018, Nucleic Acids Res..

[37]  Astrid Gall,et al.  Ensembl 2019 , 2018, Nucleic Acids Res..

[38]  R. Aebersold,et al.  The Human Plasma Proteome Draft of 2017: Building on the Human Plasma PeptideAtlas from Mass Spectrometry and Complementary Assays. , 2017, Journal of proteome research.

[39]  Martin Eisenacher,et al.  Proteomics Standards Initiative: Fifteen Years of Progress and Future Work , 2017, Journal of proteome research.

[40]  Robert Petryszak,et al.  Discovering and linking public omics data sets using the Omics Discovery Index , 2017, Nature Biotechnology.

[41]  Michael J MacCoss,et al.  Panorama Public: A Public Repository for Quantitative Data Sets Processed in Skyline* , 2018, Molecular & Cellular Proteomics.

[42]  Amos Bairoch,et al.  The neXtProt knowledgebase on human proteins: 2017 update , 2016, Nucleic Acids Res..

[43]  Martin Eisenacher,et al.  The mzIdentML Data Standard for Mass Spectrometry-Based Proteomics Results , 2012, Molecular & Cellular Proteomics.