Paper information - Separation of ion types in tandem mass spectrometry data interpretation - a graph-theoretic approach

Mass spectrometry is one of the most popular analytical techniques for identifying individual proteins in a protein mixture, one of the basic problems in proteomics. It identifies a protein by recognizing its unique mass spectral pattern. While the problem is solvable in theory, it remains computationally challenging. One of the key challenges is the difficulty of distinguishing N-terminus ions from C-terminus ions, which are mostly b- and y-ions respectively. In this paper, we present a graph algorithm for separating b-ions from y-ions in a set of mass spectra. We represent each spectral peak as a node and consider two types of edges, both predicted from local information: a type-1 edge connects two peaks that are possibly of the same ion type, and a type-2 edge connects two peaks that are possibly of different ion types. The ion-separation problem is then formulated and solved as a graph partition problem: partition the graph into three subgraphs (b-ions, y-ions, and others) so as to maximize the total weight of type-1 edges while minimizing the total weight of type-2 edges within each subgraph. We have developed a dynamic programming algorithm that rigorously solves this graph partition problem and implemented it as a computer program, PRIME. We tested PRIME on 18 data sets of highly accurate FT-ICR tandem mass spectra and found that it achieved ~90% accuracy in separating b- and y-ions.
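The partition objective described in the abstract can be illustrated with a minimal Python sketch. This is not PRIME and not the dynamic programming algorithm from the paper: it simply enumerates every labeling of a tiny toy graph. The peak indices, edge weights, function names, and the way the two criteria are combined into one score (type-1 weight inside a subgraph minus type-2 weight inside a subgraph) are all assumptions made for illustration.

```python
from itertools import product

# The three subgraphs named in the abstract.
LABELS = ("b", "y", "other")


def partition_score(labels, type1, type2):
    """Assumed objective: reward type-1 edges whose endpoints share a subgraph,
    penalize type-2 edges whose endpoints share a subgraph."""
    score = 0.0
    for (u, v), weight in type1.items():
        if labels[u] == labels[v]:
            score += weight
    for (u, v), weight in type2.items():
        if labels[u] == labels[v]:
            score -= weight
    return score


def best_partition(n_peaks, type1, type2):
    """Exhaustive search over all 3**n_peaks labelings (toy sizes only)."""
    best_labels, best_score = None, float("-inf")
    for labels in product(LABELS, repeat=n_peaks):
        score = partition_score(labels, type1, type2)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


# Hypothetical toy graph: five peaks, edges keyed by (peak, peak) index pairs.
toy_type1 = {(0, 1): 2.0, (1, 2): 1.5, (3, 4): 2.0}  # "possibly same ion type"
toy_type2 = {(0, 3): 1.0, (2, 4): 1.0}               # "possibly different ion types"

labels, score = best_partition(5, toy_type1, toy_type2)
print(labels, score)
```

In this toy objective the three labels are interchangeable, so the search recovers only the grouping of peaks (here peaks 0 to 2 together and peaks 3 and 4 together), not which group holds the b-ions. Exhaustive search also grows as 3^n in the number of peaks, which is why a real solver needs something more efficient, such as the dynamic programming approach the abstract mentions.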
