Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam Principles).

Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the "International Workshop on Proteomic Data Quality Metrics" in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: (1) an evolving list of comprehensive quality metrics and (2) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data. By agreement, this article is published simultaneously in the Journal of Proteome Research, Molecular and Cellular Proteomics, Proteomics, and Proteomics Clinical Applications as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
