A comprehensive full factorial LC‐MS/MS proteomics benchmark data set

A well‐designed, comprehensive LC‐MS/MS data set is an important prerequisite for developing and benchmarking novel analysis methods. Here, we present a data set consisting of 59 LC‐MS/MS analyses of 50 protein samples extracted individually from Escherichia coli K12 and spiked with different concentrations of bovine carbonic anhydrase II and/or chicken ovalbumin, according to a 2 × 3 full factorial design. Using the well‐annotated and commonly used E. coli proteome as the sample background ensures that the complexity of the data is on a par with that of most current proteomic analyses. Data were acquired over a 2‐month period using multiple reversed‐phase columns and instrument calibrations, so that the data set includes the real‐life challenges faced when analyzing large proteomics data sets. Moreover, so‐called “ground truth” data, consisting of LC‐MS/MS measurements of the pure spikes, are included in the data set. This manuscript describes the benchmark data set in detail to support the future development and evaluation of analysis methods and software.
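
To make the term concrete, a 2 × 3 full factorial design crosses every level of a two‐level factor with every level of a three‐level factor, giving six experimental conditions. The sketch below only illustrates that enumeration. The abstract does not state which spike protein has two levels and which has three, nor the concentrations used, so the level names and the helper `full_factorial` are hypothetical placeholders.

```python
from itertools import product

# Hypothetical placeholder levels: the abstract gives neither the
# concentrations nor which spike protein has two levels and which has three.
factor_a_levels = ["A_level_1", "A_level_2"]
factor_b_levels = ["B_level_1", "B_level_2", "B_level_3"]


def full_factorial(*factors):
    """Return every combination that takes exactly one level from each factor."""
    return list(product(*factors))


# Crossing a 2-level factor with a 3-level factor yields 2 * 3 = 6 conditions.
design = full_factorial(factor_a_levels, factor_b_levels)

for condition, (level_a, level_b) in enumerate(design, start=1):
    print(condition, level_a, level_b)
```

Replicates of each condition would then be analyzed as separate samples; how the 50 samples are distributed over the conditions is not specified here.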
