A HUPO test sample study reveals common problems in mass spectrometry-based proteomics

We performed a test sample study to identify sources of error leading to irreproducibility, including incomplete peptide sampling, in liquid chromatography–mass spectrometry–based proteomics. We distributed an equimolar test sample, comprising 20 highly purified recombinant human proteins, to 27 laboratories. Each protein contained one or more unique tryptic peptides of 1,250 Da to test ion selection and sampling in the mass spectrometer. Of the 27 labs, only 7 initially reported all 20 proteins correctly, and only 1 reported all tryptic peptides of 1,250 Da. Centralized analysis of the raw data, however, revealed that all 20 proteins and most of the 1,250 Da peptides had been detected in all 27 labs. This centralized analysis identified missed identifications (false negatives), environmental contamination, database matching and curation of protein identifications as sources of problems. Improved search engines and databases are needed for mass spectrometry–based proteomics.
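Because the composition of the test sample is known in advance, each lab's report can be scored directly against it. The Python sketch below is an illustration, not the study's actual pipeline. It splits a report into correct identifications, missed identifications (false negatives), and unexpected identifications such as contaminants. The protein IDs (`P01`..`P20`, `KRT1_CONTAM`) and the helper `score_lab` are hypothetical placeholders. The sketch also assumes that accessions can be compared as plain strings, so a protein reported under a synonymous accession would count as both a miss and an unexpected hit. That is one way database matching and curation problems can show up in such a comparison.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LabScore:
    """One lab's report compared against the known test-sample contents (hypothetical helper)."""
    true_positives: frozenset[str]   # expected proteins the lab reported
    false_negatives: frozenset[str]  # expected proteins the lab missed
    unexpected: frozenset[str]       # reported IDs not in the sample (e.g. contaminants, mismatched accessions)

    @property
    def found_all(self) -> bool:
        # True when no expected protein was missed; unexpected IDs are tracked separately.
        return not self.false_negatives


def score_lab(expected: set[str], reported: set[str]) -> LabScore:
    # Assumes exact string matching of accessions; no synonym mapping is attempted.
    return LabScore(
        true_positives=frozenset(expected & reported),
        false_negatives=frozenset(expected - reported),
        unexpected=frozenset(reported - expected),
    )


if __name__ == "__main__":
    expected = {f"P{i:02d}" for i in range(1, 21)}  # 20 placeholder IDs, not the real proteins
    reported = (expected - {"P07"}) | {"KRT1_CONTAM"}
    s = score_lab(expected, reported)
    print(len(s.true_positives), sorted(s.false_negatives), sorted(s.unexpected), s.found_all)
    # prints: 19 ['P07'] ['KRT1_CONTAM'] False
```

Keeping missed and unexpected identifications in separate fields mirrors the abstract's distinction between false negatives and environmental contamination.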
