Peptide Reranking with Protein-Peptide Correspondence and Precursor Peak Intensity Information

Searching tandem mass spectra against a protein database is a mainstream method for peptide identification. The need to rank true Peptide-Spectrum Matches (PSMs) above their false counterparts has led to the development of various reranking algorithms. In peptide reranking, discriminative information is essential for distinguishing true PSMs from false PSMs. Most peptide reranking methods obtain this information directly from database search scores or by training machine learning models, and ignore the information in the protein database and in MS1 spectra (i.e., single-stage MS spectra). In this paper, we propose to use information from the protein database and MS1 spectra to rerank peptide identification results. To quantitatively analyze the effect of each information source on peptide reranking results, we propose three peptide reranking methods: PPMRanker, PPIRanker, and MIRanker. PPMRanker uses only Protein-Peptide Map (PPM) information from the protein database, PPIRanker uses only Precursor Peak Intensity (PPI) information, and MIRanker employs both PPM and PPI information. In our experiments on a standard protein mixture data set, a human data set, and a mouse data set, PPMRanker and MIRanker achieve better peptide reranking results than PeptideProphet, PeptideProphet+NSP (number of sibling peptides), and the score regularization method SRPI. The source code of PPMRanker, PPIRanker, and MIRanker and all supplementary documents are available at our website: http://bioinformatics.ust.hk/pepreranking/. Alternatively, these documents can be downloaded from: http://sourceforge.net/projects/pepreranking/.
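
To make the idea of a protein-peptide map more concrete, the following is a minimal sketch of how such a correspondence, and a sibling-peptide count of the kind used by NSP, might be represented. It is not the authors' PPMRanker implementation: the function names `build_protein_peptide_map` and `sibling_counts`, the substring-matching rule, and the toy sequences are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_protein_peptide_map(proteins, peptides):
    """Map each protein ID to the set of candidate peptides found in its sequence.

    Assumption: a peptide "belongs" to a protein if it occurs as a substring.
    """
    ppm = defaultdict(set)
    for prot_id, sequence in proteins.items():
        for pep in peptides:
            if pep in sequence:
                ppm[prot_id].add(pep)
    return dict(ppm)


def sibling_counts(ppm):
    """For each mapped peptide, count the other peptides sharing a protein with it."""
    siblings = defaultdict(set)
    for peps in ppm.values():
        for pep in peps:
            siblings[pep].update(peps - {pep})
    return {pep: len(sibs) for pep, sibs in siblings.items()}


# Toy data (hypothetical sequences, not taken from any data set in the paper).
proteins = {"P1": "MKTAYIAKQRQISFVK", "P2": "GGSHFSRQLEERLGLIEVQ"}
peptides = ["TAYIAK", "QISFVK", "QLEER", "AAAAA"]

ppm = build_protein_peptide_map(proteins, peptides)
counts = sibling_counts(ppm)
```

In this toy example, the two peptides that map to the same protein each have one sibling, while a peptide that is the only match to its protein has none. A reranker could, under these assumptions, treat a higher sibling count as weak supporting evidence for a PSM.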
