Improving peptide identification with single-stage mass spectrum peaks

MOTIVATION Database searching is the dominant peptide identification method in shotgun proteomics. It searches tandem mass spectrometry (MS/MS) spectra against a protein database to identify target peptides. The success of database searching relies on a scoring algorithm that can accurately evaluate the quality of peptide-spectrum matches (PSMs). However, current scoring algorithms frequently generate inaccurate assignments because of variation and noise in the MS/MS spectra. To address this issue, we aim to improve peptide identification by using additional information from other data sources.

RESULTS Single-stage MS data is complementary to MS/MS data in the sense that it provides broader mass coverage but less sequence information. In this article, we show that single-stage MS data can be used to re-rank PSMs. The proposed method uses a linear combination of the scores from MS and MS/MS data to perform the re-ranking. Experimental results on real data show that this re-ranking strategy improves identification performance significantly.

AVAILABILITY http://bioinformatics.ust.hk/ReRankPSMwMS1.rar
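The re-ranking idea described in RESULTS can be illustrated with a minimal sketch. The abstract does not specify how the MS and MS/MS scores are computed, normalized, or weighted, so everything here is an assumption for illustration: the `PSM` class, the `combined_score` and `rerank` functions, the convex weighting `(1 - weight) * msms_score + weight * ms1_score`, the weight value, and the toy scores are all hypothetical and are not taken from the authors' software.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PSM:
    """One candidate peptide-spectrum match (hypothetical container)."""
    peptide: str
    msms_score: float  # score from the MS/MS database search (assumed in [0, 1])
    ms1_score: float   # score from single-stage MS evidence (assumed in [0, 1])


def combined_score(psm: PSM, weight: float) -> float:
    """Linear combination of the two scores; `weight` is an assumed parameter."""
    return (1.0 - weight) * psm.msms_score + weight * psm.ms1_score


def rerank(candidates: List[PSM], weight: float = 0.3) -> List[PSM]:
    """Return the candidates for one spectrum, best combined score first."""
    return sorted(candidates, key=lambda p: combined_score(p, weight), reverse=True)


# Toy candidates for a single spectrum; the scores are made up.
candidates = [
    PSM("LVNELTEFAK", msms_score=0.80, ms1_score=0.10),
    PSM("HLVDEPQNLIK", msms_score=0.75, ms1_score=0.90),
    PSM("YLYEIAR", msms_score=0.40, ms1_score=0.20),
]

by_msms_only = rerank(candidates, weight=0.0)
reranked = rerank(candidates, weight=0.3)

print([p.peptide for p in by_msms_only])
print([p.peptide for p in reranked])
```

With these toy values, the candidate ranked second by the MS/MS score alone moves to the top once the single-stage MS score is taken into account. In practice the weight would have to be chosen or learned from data, and the two scores would need to be on comparable scales before being combined.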
