The PRIDE database and related tools and resources in 2019: improving support for quantification data

Abstract The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world’s largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has become the norm in the field. In parallel, re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented, as part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.
