Separation of ion types in tandem mass spectrometry data interpretation - a graph-theoretic approach

Mass spectrometry is one of the most popular analytical techniques for identification of individual proteins in a protein mixture, one of the basic problems in proteomics. It identifies a protein through identifying its unique mass spectral pattern. While the problem is theoretically solvable, it remains a challenging problem computationally. One of the key challenges comes from the difficulty in distinguishing the N- and C-terminus ions, mostly b- and y-ions respectively. In this paper, we present a graph algorithm for solving the problem of separating b- from y-ions in a set of mass spectra. We represent each spectral peak as a node and consider two types of edges: a type-1 edge connects two peaks possibly of the same ion types and a type-2 edge connects two peaks possibly of different ion types, predicted based on local information. The ion-separation problem is then formulated and solved as a graph partition problem, which is to partition the graph into three subgraphs, namely b-, y-ions and others respectively, so to maximize the total weight of type-1 edges while minimizing the total weight of type-2 edges within each subgraph. We have developed a dynamic programming algorithm for rigorously solving this graph partition problem and implemented it as a computer program PRIME. We have tested PRIME on 18 data sets of high accurate FT-ICR tandem mass spectra and found that it achieved /spl sim/90% accuracy for separation of b- and y-ions.

[1]  Pavel A. Pevzner,et al.  De Novo Peptide Sequencing via Tandem Mass Spectrometry , 1999, J. Comput. Biol..

[2]  R. Aebersold,et al.  Mass spectrometry-based proteomics , 2003, Nature.

[3]  Ming-Yang Kao,et al.  A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry , 2000, SODA '00.

[4]  Caroline Peres,et al.  Complete genome sequence of the metabolically versatile photosynthetic bacterium Rhodopseudomonas palustris , 2004, Nature Biotechnology.

[5]  Ming Li,et al.  PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. , 2003, Rapid communications in mass spectrometry : RCM.

[6]  M. Wilm,et al.  Error-tolerant identification of peptides in sequence databases by peptide sequence tags. , 1994, Analytical chemistry.

[7]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[8]  Peter R. Baker,et al.  Role of accurate mass measurement (+/- 10 ppm) in protein identification strategies employing MS or MS/MS and database searching. , 1999, Analytical chemistry.

[9]  B. Chait,et al.  Protein indentification using mass spectrometric information , 1998, Electrophoresis.

[10]  J. Yates,et al.  An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database , 1994, Journal of the American Society for Mass Spectrometry.

[11]  J. A. Taylor,et al.  Sequence database searches via de novo peptide sequencing by tandem mass spectrometry. , 1997, Rapid communications in mass spectrometry : RCM.

[12]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[13]  M. Mann,et al.  Proteomic analysis of post-translational modifications , 2003, Nature Biotechnology.

[14]  J. Yates,et al.  Large-scale analysis of the yeast proteome by multidimensional protein identification technology , 2001, Nature Biotechnology.

[15]  J. Yates,et al.  Statistical characterization of ion trap tandem mass spectra from doubly charged tryptic peptides. , 2003, Analytical chemistry.

[16]  Neil Hall,et al.  Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry , 2002, Nature.

[17]  D. N. Perkins,et al.  Probability‐based protein identification by searching sequence databases using mass spectrometry data , 1999, Electrophoresis.