The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition

The ProteomeXchange (PX) Consortium of proteomics resources (http://www.proteomexchange.org) was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide. We give an overview of the current consortium activities and describe the advances of the past few years. Augmenting the PX founding members (PRIDE and PeptideAtlas, including the PASSEL resource), two new members have joined the consortium: MassIVE and jPOST. ProteomeCentral remains the common data access portal, providing the ability to search for data sets in all participating PX resources, now with enhanced data visualization components. We describe the updated submission guidelines, now expanded to include four members instead of two. As demonstrated by data submission statistics, PX is supporting a change in culture of the proteomics field: public data sharing is now an accepted standard, and journal submission requirements have made public data release the norm. More than 4500 data sets have been submitted to the various PX resources since 2012. Human is the most represented species with approximately half of the data sets, followed by some of the main model organisms and a growing list of more than 900 diverse species. Data reprocessing activities are becoming more prominent, with both MassIVE and PeptideAtlas releasing the results of reprocessed data sets. Finally, we outline the upcoming advances for ProteomeXchange.
