Matching Peptide Sequences with Mass Spectra

We study a method that maps both mass spectra and peptide sequences to feature vectors and measures the correlation between them. We present a method for calculating the feature vector from a mass spectrum, together with a method for representing sequences, and study a correlation metric that compares the two representations. The metric shows strong correlation between the two representations of the same peptide. It also demonstrates that the correlation increases when the longer sequences induced from the theoretical mass spectra are used. The method is a promising step towards de novo sequencing.
