Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry.

The complexity of proteomic instrumentation for LC-MS/MS introduces many possible sources of variability. Data-dependent sampling of peptides constitutes a stochastic element at the heart of discovery proteomics. Although this variation impacts the identification of peptides, proteomic identifications are far from completely random. In this study, we analyzed interlaboratory data sets from the NCI Clinical Proteomic Technology Assessment for Cancer to examine repeatability and reproducibility in peptide and protein identifications. Included data spanned 144 LC-MS/MS experiments on four Thermo LTQ and four Orbitrap instruments. Samples included yeast lysate, the NCI-20 defined dynamic range protein mix, and the Sigma UPS 1 defined equimolar protein mix. Some of our findings reinforced conventional wisdom, such as repeatability and reproducibility being higher for proteins than for peptides. Most lessons from the data, however, were more subtle. Orbitraps proved capable of higher repeatability and reproducibility, but aberrant performance occasionally erased these gains. Even the simplest protein digestions yielded more peptide ions than LC-MS/MS could identify during a single experiment. We observed that peptide lists from pairs of technical replicates overlapped by 35-60%, giving a range for peptide-level repeatability in these experiments. Sample complexity did not appear to affect peptide identification repeatability, even as numbers of identified spectra changed by an order of magnitude. Statistical analysis of protein spectral counts revealed greater stability across technical replicates for Orbitraps, making them superior to LTQ instruments for biomarker candidate discovery. The most repeatable peptides were those corresponding to conventional tryptic cleavage sites, those that produced intense MS signals, and those that resulted from proteins generating many distinct peptides. Reproducibility among different instruments of the same type lagged behind repeatability of technical replicates on a single instrument by several percent. These findings reinforce the importance of evaluating repeatability as a fundamental characteristic of analytical technologies.
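
The pairwise overlap of peptide lists mentioned above can be illustrated with a minimal sketch. It assumes that overlap is measured as shared peptides divided by the union of the two lists (a Jaccard index); the abstract does not state the exact formula, so this definition is an assumption. The function names, replicate labels, and peptide sequences are hypothetical illustration data, not results from the study.

```python
from itertools import combinations


def pairwise_overlap(peptides_a, peptides_b):
    """Fraction of distinct peptides shared by two replicates.

    Assumption: overlap = |A intersect B| / |A union B| (Jaccard index).
    """
    a, b = set(peptides_a), set(peptides_b)
    union = a | b
    if not union:
        return 0.0  # two empty lists share nothing
    return len(a & b) / len(union)


def replicate_overlaps(replicates):
    """Overlap for every pair of replicates, keyed by (name_1, name_2)."""
    return {
        (i, j): pairwise_overlap(replicates[i], replicates[j])
        for i, j in combinations(sorted(replicates), 2)
    }


# Hypothetical peptide identifications from three technical replicates.
example = {
    "rep1": ["PEPTIDEK", "LVNELTEFAK", "AEFVEVTK", "YLYEIAR"],
    "rep2": ["PEPTIDEK", "LVNELTEFAK", "HLVDEPQNLIK"],
    "rep3": ["PEPTIDEK", "AEFVEVTK", "YLYEIAR", "HLVDEPQNLIK"],
}

for pair, value in replicate_overlaps(example).items():
    print(pair, f"{value:.0%}")
```

Taking the minimum and maximum of the values returned by `replicate_overlaps` gives a range of the kind quoted for peptide-level repeatability. A different denominator, such as the size of the smaller list, would shift the percentages.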
