PRIDE Cluster: building a consensus of proteomics data

To the editor: The amount of mass spectrometry (MS) proteomics data in public repositories is growing rapidly¹, but its (re-)use to increase the reliability of newly performed experiments is still limited. Two of the major obstacles are the high heterogeneity of the data in repositories and the inflation of false positive identifications when datasets are combined. Here we present ‘PRIDE Cluster’, a novel method for detecting reliable identifications in heterogeneous MS proteomics experiments. We use it to highlight reliable peptide identifications in the PRIDE database² (http://www.ebi.ac.uk/pride) and to generate continuously updated, reliable spectral libraries based on these identifications.
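The idea of building a consensus can be illustrated with a minimal sketch. It assumes that spectra have already been grouped into clusters of similar spectra (the clustering step itself is not shown) and that each clustered spectrum carries the peptide sequence it was originally identified as, or nothing if it was unidentified. A cluster is treated as reliable when enough identified spectra agree on a single peptide. The `Cluster` type, the `consensus` and `is_reliable` functions, the choice to compute the ratio over identified spectra only, and the thresholds `min_size=3` and `min_ratio=0.7` are hypothetical illustrations. They are not the actual parameters or API of PRIDE Cluster.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cluster:
    """Hypothetical container: a group of similar spectra.

    Each entry in peptide_ids is the peptide sequence that one spectrum
    was identified as in its original dataset, or None if unidentified.
    """
    cluster_id: str
    peptide_ids: List[Optional[str]]


def consensus(cluster: Cluster) -> Tuple[Optional[str], float]:
    """Return the most common peptide and the fraction of identified spectra supporting it."""
    identified = [p for p in cluster.peptide_ids if p is not None]
    if not identified:
        return None, 0.0
    peptide, count = Counter(identified).most_common(1)[0]
    return peptide, count / len(identified)


def is_reliable(cluster: Cluster, min_size: int = 3, min_ratio: float = 0.7) -> bool:
    """Assumed rule: enough identified spectra, and most of them agree on one peptide."""
    n_identified = sum(p is not None for p in cluster.peptide_ids)
    peptide, ratio = consensus(cluster)
    return peptide is not None and n_identified >= min_size and ratio >= min_ratio


if __name__ == "__main__":
    clusters = [
        # 8 of 9 identified spectra agree: a consistent cluster.
        Cluster("c1", ["PEPTIDEK"] * 8 + ["PEPTLDEK", None]),
        # Three spectra, three different peptides: no consensus.
        Cluster("c2", ["AAAK", "CCCK", "DDDK"]),
    ]
    for c in clusters:
        peptide, ratio = consensus(c)
        print(c.cluster_id, peptide, round(ratio, 2), is_reliable(c))
```

Because incorrect identifications tend to be scattered across many different peptides, requiring agreement within a cluster is one plausible way to suppress the false positives that accumulate when datasets are combined. Under this sketch, the consensus peptides of reliable clusters would be the natural candidates for spectral library entries.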

[1] D. N. Perkins et al. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 1999.

[2] Robert Burke et al. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics, 2008.

[3] Henry H. N. Lam et al. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Reports, 2008.

[4] Steven P. Gygi et al. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods, 2007.

[5] Richard D. Smith et al. Clustering millions of tandem mass spectra. Journal of Proteome Research, 2008.

[6] William Stafford Noble et al. Rapid and accurate peptide identification from tandem mass spectra. Journal of Proteome Research, 2008.

[7] Lennart Martens et al. PRIDE Inspector: a tool to visualize and validate MS proteomics data. Nature Biotechnology, 2011.

[8] Nichole L. King et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics, 2007.

[9] Eugene A. Kapp et al. Overview of the HUPO Plasma Proteome Project: Results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics, 2005.

[10] David L. Tabb et al. Performance Metrics for Liquid Chromatography-Tandem Mass Spectrometry Systems in Proteomics Analyses. Molecular & Cellular Proteomics, 2009.

[11] James A. Hill et al. Tranche distributed repository and ProteomeCommons.org. Methods in Molecular Biology, 2011.

[12] Lennart Martens et al. The Proteomics Identifications database: 2010 update. Nucleic Acids Research, 2009.

[13] Lennart Martens et al. HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy. Proteomics, 2006.

[14] Johannes Griss et al. jmzReader: A Java parser library to process and visualize multiple text and XML-based mass spectrometry data formats. Proteomics, 2012.

[15] Pavel A. Pevzner et al. Spectral Archives: Extending Spectral Libraries to Analyze both Identified and Unidentified Spectra. Nature Methods, 2011.

[16] P. Pevzner et al. Target-Decoy Approach and False Discovery Rate: When Things May Go Wrong. Journal of the American Society for Mass Spectrometry, 2011.

[17] Natalie I. Tasman et al. iProphet: Multi-level Integrative Analysis of Shotgun Proteomic Data Improves Peptide and Protein Identification Rates and Error Estimates. Molecular & Cellular Proteomics, 2011.

[18] J. Brody et al. Comparison of Proteomic and Transcriptomic Profiles in the Bronchial Airway Epithelium of Current and Never Smokers. PLoS ONE, 2009.

[19] S. Bryant et al. Open mass spectrometry search algorithm. Journal of Proteome Research, 2004.

[20] Rui Wang et al. PRIDE: Quality control in a proteomics data repository. Database: The Journal of Biological Databases and Curation, 2012.