An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates

The relevance of libraries of annotated MS/MS spectra is growing with the amount of proteomic data generated in high‐throughput experiments. These reference libraries provide a fast and accurate way to identify newly acquired MS/MS spectra. In the context of multiple hypothesis testing, the number of false‐positive identifications expected in the final result list is controlled by calculating the false discovery rate (FDR). In a classical sequence search, where experimental MS/MS spectra are compared with theoretical peptide spectra calculated from a sequence database, the FDR is estimated by searching randomized (decoy) sequence databases. Despite ongoing discussion about exactly how the FDR should be calculated, this method is widely accepted in the proteomic community. Recently, similar approaches to controlling the FDR of spectrum library searches have been discussed. In this paper, we present a detailed analysis of the similarity between spectra of distinct peptides, which forms the basis of our own solution for decoy library creation (DeLiberator). It differs from previously published approaches in several key points, mainly in implementing new methods that prevent decoy spectra from being too similar to the original library spectra while preserving important features of real MS/MS spectra. Using different proteomic data sets and library creation methods, we evaluate our approach and compare it with alternative methods.
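The abstract does not spell out DeLiberator's algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a library spectrum is reduced to its annotated, singly charged b/y fragment peaks. A decoy is built by shuffling the peptide sequence (keeping the C‐terminal residue fixed, as is common for tryptic decoys) and moving each annotated peak to its m/z in the shuffled sequence, which keeps the peak count and intensity pattern. Decoys whose binned cosine similarity to the original exceeds a cutoff are rejected and reshuffled. The FDR is then estimated in the usual target‐decoy way as the ratio of decoy to target matches above a score threshold. The function names, the 0.3 similarity cutoff, the 1 Da bin width and the square‐root intensity scaling are all illustrative assumptions.

```python
import math
import random

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

# NOTE: all function names below are hypothetical, not DeLiberator's API.


def fragment_mz(seq, ion, n):
    """Singly charged m/z of the b_n or y_n ion of `seq` (n >= 1)."""
    if ion == "b":
        return sum(RESIDUE_MASS[a] for a in seq[:n]) + PROTON
    return sum(RESIDUE_MASS[a] for a in seq[-n:]) + WATER + PROTON


def cosine(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product of two (m/z, intensity) lists after m/z binning."""
    def binned(peaks):
        vec = {}
        for mz, inten in peaks:
            key = int(mz / bin_width)
            # Square-root scaling (an assumption) dampens dominant peaks.
            vec[key] = vec.get(key, 0.0) + math.sqrt(inten)
        return vec

    va, vb = binned(peaks_a), binned(peaks_b)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def make_decoy(seq, annotated, max_sim=0.3, max_tries=100, rng=None):
    """Shuffle `seq` (C-terminal residue fixed) and relocate each annotated
    peak (ion, n, intensity) to its m/z in the shuffled sequence. Retry until
    the decoy's cosine similarity to the target is <= max_sim."""
    rng = rng or random.Random(0)
    target = [(fragment_mz(seq, ion, n), inten) for ion, n, inten in annotated]
    for _ in range(max_tries):
        body = list(seq[:-1])
        rng.shuffle(body)
        decoy_seq = "".join(body) + seq[-1]
        if decoy_seq == seq:
            continue  # identical sequence: not a usable decoy
        decoy = [(fragment_mz(decoy_seq, ion, n), inten)
                 for ion, n, inten in annotated]
        if cosine(target, decoy) <= max_sim:
            return decoy_seq, decoy
    return None  # no sufficiently dissimilar decoy was found


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy FDR estimate at a score threshold: #decoys / #targets."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0


if __name__ == "__main__":
    annotated = [("b", 2, 40.0), ("b", 3, 25.0), ("y", 3, 100.0),
                 ("y", 4, 60.0), ("y", 5, 80.0)]
    print(make_decoy("PEPTIDEK", annotated))
    print(estimate_fdr([10, 9, 8, 2], [9, 1], threshold=5))
```

Unannotated (noise) peaks, neutral losses and higher charge states are ignored here for brevity; a real decoy generator would have to decide how to treat them. The simple decoy/target ratio is one common estimator when targets and decoys are searched separately; other variants exist for concatenated searches.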
