Simulating and validating proteomics data and search results

This work presents the computational simulation of complete proteomic data sets and its use to validate detection and interpretation algorithms, to aid experimental design and to assess protein and peptide false discovery rates. The simulation software emulates data from data‐dependent and data‐independent LC‐MS workflows and can simulate data from all commonly used types of hybrid mass spectrometer. The algorithms are based on empirically derived physicochemical liquid and gas phase models for proteins and peptides. Sample composition, in terms of complexity and dynamic range, and the chromatographic, experimental and MS conditions can be controlled and adjusted independently. The effect of on‐column amount, gradient length, mass resolution and ion mobility on search specificity is demonstrated using tryptic peptides from human and yeast cellular lysates simulated over five orders of magnitude in dynamic range. Initial justification of the simulated data sets is provided by comparing the in silico data with experimentally derived results from a 48 protein mixture spanning a similar dynamic range of five orders of magnitude. Additionally, experimental data from replicate and dilution series experiments are used to determine error rates at the peptide and protein level with respect to mass, area, retention time and drift time. The data presented reveal a high degree of similarity between simulated and experimental results at the ion detection, peptide and protein levels when analyzed under similar conditions.
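As an illustration of what controlling sample complexity and dynamic range independently could look like, the following minimal sketch draws on-column amounts for a protein mixture spread over a chosen number of orders of magnitude. It is not the simulation software described above: the function names, the log-uniform abundance distribution, the femtomole unit and the lower bound of 0.01 fmol are all assumptions made for this example, and the mixture size of 48 is borrowed from the abstract only as a convenient number.

```python
import math
import random


def simulate_abundances(n_proteins, orders_of_magnitude=5.0,
                        min_amount_fmol=0.01, seed=0):
    """Draw hypothetical on-column amounts (fmol) for n_proteins.

    Assumption: amounts are log-uniformly distributed between
    min_amount_fmol and min_amount_fmol * 10**orders_of_magnitude.
    The abstract does not state which distribution the real software uses.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    amounts = []
    for _ in range(n_proteins):
        exponent = rng.uniform(0.0, orders_of_magnitude)
        amounts.append(min_amount_fmol * 10.0 ** exponent)
    return amounts


def dynamic_range_orders(amounts):
    """Return the realized dynamic range as log10(max / min)."""
    return math.log10(max(amounts) / min(amounts))


# Complexity (number of proteins) and dynamic range are set independently.
amounts = simulate_abundances(n_proteins=48, orders_of_magnitude=5.0)
print(f"{len(amounts)} proteins, "
      f"realized range: {dynamic_range_orders(amounts):.2f} orders of magnitude")
```

The realized range of a finite sample is at most the requested range, which is why the sketch reports it separately from the input parameter.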
