Artificial decoy spectral libraries for false discovery rate estimation in spectral library searching in proteomics.

Estimating false discovery rates (FDR) for peptide identifications from MS/MS spectra has received increasing attention in proteomics. The simple approach of target-decoy searching has become popular with traditional sequence (database) searching methods, but has yet to be applied to spectral (library) searching, an emerging alternative to sequence searching. We extended the target-decoy approach to spectral searching by developing and validating a robust method to generate realistic, but unnatural, decoy spectra. Our method randomly shuffles the amino acid sequence of the peptide identified for each reference spectrum in the library, and repositions each fragment ion peak along the m/z axis to match the fragment ions expected from the shuffled sequence. We show that this method produces decoy spectra realistic enough that incorrect identifications are equally likely to match real and decoy spectra, a key assumption of decoy counting. This approach has been implemented in the open-source library-building software SpectraST.
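
The shuffle-and-reposition idea and the decoy-counting step can be illustrated with a minimal Python sketch. It is not the SpectraST implementation. Several details are assumptions made only for illustration: peaks are given as annotated singly or multiply charged b/y ions in the form (ion type, index, charge, intensity); the C-terminal residue is kept in place during shuffling; unannotated peaks and modifications are ignored; and the function names `fragment_mz`, `shuffle_sequence`, `make_decoy`, and `estimate_fdr` are hypothetical. The FDR estimate shown (decoy hits divided by target hits above a score threshold) is one common estimator, assuming target and decoy libraries of equal size.

```python
import random

PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_mz(sequence, ion_type, index, charge=1):
    """Return the m/z of the b or y ion containing `index` residues."""
    if not 1 <= index <= len(sequence):
        raise ValueError("fragment index out of range")
    if ion_type == "b":
        neutral = sum(RESIDUE_MASS[aa] for aa in sequence[:index])
    elif ion_type == "y":
        neutral = sum(RESIDUE_MASS[aa] for aa in sequence[-index:]) + WATER
    else:
        raise ValueError("only b and y ions are handled in this sketch")
    return (neutral + charge * PROTON) / charge


def shuffle_sequence(sequence, rng, max_tries=20):
    """Shuffle all residues except the last (assumption: C-terminus is fixed)."""
    head, tail = list(sequence[:-1]), sequence[-1]
    candidate = sequence
    for _ in range(max_tries):
        rng.shuffle(head)
        candidate = "".join(head) + tail
        if candidate != sequence:
            break  # accept the first shuffle that differs from the original
    return candidate


def make_decoy(sequence, peaks, rng):
    """Build a decoy spectrum from annotated peaks.

    `peaks` is a list of (ion_type, index, charge, intensity) tuples.
    Each peak keeps its intensity but moves to the m/z that the same
    fragment would have in the shuffled sequence.
    """
    decoy_seq = shuffle_sequence(sequence, rng)
    decoy_peaks = sorted(
        (fragment_mz(decoy_seq, ion, idx, z), intensity)
        for ion, idx, z, intensity in peaks
    )
    return decoy_seq, decoy_peaks


def estimate_fdr(hits, threshold):
    """Decoy counting: decoy hits / target hits at or above `threshold`.

    `hits` is a list of (score, is_decoy) pairs, one per query spectrum.
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0
```

Because shuffling preserves amino acid composition, the decoy has the same precursor mass as the original entry, and because intensities are carried over, the decoy keeps the overall peak pattern of a real spectrum while corresponding to a different sequence.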
