An adaptive classification model for peptide identification

Background: Peptide sequence assignment is the central task in protein identification with MS/MS-based strategies. Although a number of post-database search algorithms for filtering target peptide spectrum matches (PSMs) have been developed, the PSMs they output often differ significantly, leaving a number of disputable PSMs. Current studies show that target PSMs that lie close to decoy PSMs can hardly be separated from those decoys by a discrimination function alone.

Results: In this paper, we assign each target PSM a weight indicating how likely it is to be correct. We employ an SVM-based learning model to search for the optimal weight of each target PSM and develop a new scoring system, CRanker, to rank all target PSMs. Because routine database searches generate large PSM datasets, we use the Cholesky factorization technique to store the kernel matrix and reduce the memory requirement.

Conclusions: Compared with PeptideProphet and Percolator, CRanker identified more PSMs at similar false discovery rates across different datasets. CRanker showed consistent performance on different test sets, which supports the validity of the proposed model.
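The memory saving from a Cholesky-based kernel representation can be illustrated with a pivoted incomplete Cholesky factorization, which approximates an n-by-n kernel matrix K by a product G Gᵀ where G has only r columns. The following is a minimal sketch under stated assumptions, not CRanker's actual implementation: the RBF kernel, the `gamma` value, and the names `rbf` and `incomplete_cholesky` are all hypothetical choices made for illustration.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def incomplete_cholesky(points, kernel, max_rank, tol=1e-8):
    """Pivoted incomplete Cholesky factorization of a kernel matrix.

    Returns G as a list of n rows with at most max_rank entries each,
    such that K[i][j] is approximated by the dot product of G[i] and G[j].
    Only the diagonal and one column of K per step are evaluated, so the
    full n-by-n matrix is never stored.
    """
    n = len(points)
    # Residual diagonal of K - G G^T; initially the diagonal of K itself.
    diag = [kernel(p, p) for p in points]
    G = [[] for _ in range(n)]
    for _ in range(min(max_rank, n)):
        # Pick the point with the largest remaining residual as the pivot.
        pivot = max(range(n), key=lambda i: diag[i])
        if diag[pivot] <= tol:
            break  # the remaining residual is negligible
        root = math.sqrt(diag[pivot])
        pivot_row = list(G[pivot])  # snapshot before this step appends to it
        for i in range(n):
            k_ip = kernel(points[i], points[pivot])
            dot = sum(a * b for a, b in zip(G[i], pivot_row))
            value = (k_ip - dot) / root
            G[i].append(value)
            diag[i] -= value * value
    return G


if __name__ == "__main__":
    # Toy feature vectors standing in for PSM features (made-up values).
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
    G = incomplete_cholesky(pts, rbf, max_rank=2)
    print(len(G), "rows,", len(G[0]), "columns")
```

With n PSMs and rank r, the factor G needs storage proportional to n·r rather than n², which is the reason a low-rank factor is attractive for large PSM datasets. The choice of r trades approximation accuracy against memory.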
