Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry

Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. Because most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy is a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by making four points, each based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternative methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
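As a rough illustration of the concatenated target-decoy idea, the sketch below builds a decoy entry for each target protein by reversing its sequence, then estimates the FP rate of a set of accepted hits. It is a minimal sketch, not the authors' implementation. It assumes that an incorrect match is equally likely to land on a target or a decoy sequence, so the number of false positives is approximated as twice the number of decoy hits. The function names, the `DECOY_` prefix, and the example sequence are hypothetical, and sequence reversal is only one of several possible ways to construct a decoy database.

```python
def build_concatenated_db(targets, decoy_prefix="DECOY_"):
    """Return a target-decoy database: each target plus its reversed sequence.

    `targets` maps a protein name to its amino acid sequence.
    The prefix used to label decoys is an illustrative assumption.
    """
    db = dict(targets)
    for name, seq in targets.items():
        db[decoy_prefix + name] = seq[::-1]  # reversed sequence as the decoy
    return db


def estimate_fp_rate(hit_names, decoy_prefix="DECOY_"):
    """Estimate the FP rate among accepted hits from a concatenated search.

    Assumes incorrect matches fall on targets and decoys equally often,
    so estimated false positives = 2 * (number of decoy hits).
    """
    total = len(hit_names)
    if total == 0:
        return 0.0
    decoy_hits = sum(1 for name in hit_names if name.startswith(decoy_prefix))
    return min(1.0, 2 * decoy_hits / total)


if __name__ == "__main__":
    db = build_concatenated_db({"P1": "MKTAYR"})
    print(sorted(db))
    # One decoy hit among four accepted hits.
    print(estimate_fp_rate(["P1", "P2", "DECOY_P3", "P4"]))
```

In practice the accepted hits would come from a search engine run against the concatenated database, with the score threshold adjusted until the estimated FP rate reaches an acceptable level.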
