A fuzzy cluster-based algorithm for peptide identification

Peptide identification is a critical step in understanding the proteome of cells and tissues. Typically, high-throughput peptide spectra generated by the MS/MS procedure are searched against real protein sequences by peptide matching. Although a number of automated algorithms have been developed to help identify high-quality peptide spectrum matches (PSMs), the lack of trustworthy target PSMs remains an open problem. In this paper, we design the FC-Ranker algorithm to calculate a score for each target PSM. A nonnegative weight is assigned to each target PSM to indicate its likelihood of being correct. In particular, we propose a fuzzy SVM classification model and a fuzzy silhouette index for iteratively updating the scores of target PSMs. Furthermore, FC-Ranker provides a framework for tackling the uncertainty of target PSMs, and it can be easily adjusted to adapt to new datasets.
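To make the idea of iteratively updated nonnegative weights concrete, the following is a minimal sketch and not the FC-Ranker implementation. It assumes each PSM is summarized by a single numeric search score, uses a weighted variant of the classical silhouette value (own-cluster distance a, other-cluster distance b, value (b - a) / max(a, b)) in place of the paper's fuzzy silhouette index, and leaves out the fuzzy SVM step entirely. The function names, the absolute-difference distance, and the clipping rule are all illustrative assumptions.

```python
def weighted_silhouette(i, targets, decoys, weights):
    """Silhouette-style value in [-1, 1] for target PSM i.

    a: weighted mean distance from target i to the other targets,
       using the current weights (assumed stand-in for fuzzy membership).
    b: plain mean distance from target i to the decoys.
    """
    xi = targets[i]
    num = sum(w * abs(xi - x)
              for j, (x, w) in enumerate(zip(targets, weights)) if j != i)
    den = sum(w for j, w in enumerate(weights) if j != i)
    a = num / den if den > 0 else 0.0
    b = sum(abs(xi - d) for d in decoys) / len(decoys)
    m = max(a, b)
    return (b - a) / m if m > 0 else 0.0


def update_weights(targets, decoys, n_iter=10):
    """Iteratively re-estimate a nonnegative weight for each target PSM.

    Hypothetical update rule: the weight is the silhouette value clipped
    at zero, so targets that sit closer to the decoys get weight 0.
    """
    weights = [1.0] * len(targets)
    for _ in range(n_iter):
        sil = [weighted_silhouette(i, targets, decoys, weights)
               for i in range(len(targets))]
        weights = [max(0.0, s) for s in sil]
    return weights


# Toy scores (made up for illustration): three targets score far from
# the decoys, one target scores like a decoy.
example_targets = [5.0, 4.8, 5.2, 1.0]
example_decoys = [1.0, 1.2, 0.8]
example_weights = update_weights(example_targets, example_decoys)
```

In this toy setting the target whose score resembles the decoy scores ends up with weight zero, while the well-separated targets keep weights close to one. A fuller version would feed such weights into a sample-weighted classifier at each iteration, which is the role the abstract assigns to the fuzzy SVM.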
