A Fragmentation Event Model for Peptide Identification by Mass Spectrometry

We present a novel fragmentation event model for peptide identification by tandem mass spectrometry. Most current peptide identification techniques suffer from inaccuracies in the predicted theoretical spectrum, which stem from an insufficient understanding of the ion generation process, especially the b/y ratio puzzle. To overcome this difficulty, we propose a model based on the abundance of fragmentation events rather than on ion intensities. Experimental results demonstrate that this model helps improve database searching methods. On an LTQ data set, with the false-positive rate controlled at 5%, our fragmentation event model achieves a significantly higher true-positive rate (0.83) than SEQUEST (0.73). Comparison with Mascot yields similar results, which indicates that our model can effectively identify the false-positive peptide-spectrum pairs reported by SEQUEST and Mascot. The fragmentation event model can also be used to address the missing-peak problem encountered by de novo methods. To our knowledge, this is the first time that the fragmentation preference of peptide bonds has been used to overcome the missing-peak difficulty.
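
The evaluation criterion quoted above, the true-positive rate at a controlled 5% false-positive rate, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `tpr_at_fpr`, the assumption that a higher score means a better peptide-spectrum match, and the assumption that each match carries a known correct/incorrect label are all introduced here for illustration only.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float],
               is_correct: Sequence[bool],
               max_fpr: float = 0.05) -> float:
    """Return the best true-positive rate reachable while the
    false-positive rate stays at or below max_fpr.

    Assumptions (hypothetical, not from the paper): higher score means
    a better match, and is_correct[i] labels match i as correct or not.
    """
    if len(scores) != len(is_correct):
        raise ValueError("scores and is_correct must have the same length")
    n_pos = sum(1 for c in is_correct if c)
    n_neg = len(is_correct) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one correct and one incorrect match")

    # Rank matches from best to worst score.
    ranked = sorted(zip(scores, is_correct), key=lambda p: p[0], reverse=True)

    best_tpr = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Accept every match tied at this score threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # lowering the threshold further only adds false positives
        best_tpr = max(best_tpr, tp / n_pos)
    return best_tpr
```

Under these assumptions, calling `tpr_at_fpr(scores, labels, max_fpr=0.05)` once per scoring method on the same labeled matches gives figures directly comparable to the 0.83 and 0.73 reported above.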
