The ProteomeXchange consortium at 10 years: 2023 update

Abstract

Mass spectrometry (MS) is by far the most widely used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, of which 20 062 (58.6%) were submitted in the last three years alone. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that re-use of public datasets continues to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and new data resources. Finally, we summarise the current state of the art in data management practices for sensitive human (clinical) proteomics data.
