Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry

MOTIVATION The correlation among fragment ions in a tandem mass spectrum is crucial in reducing stochastic mismatches for peptide identification by database searching. Until now, an efficient scoring algorithm that considers the correlative information in a tunable and comprehensive manner has been lacking. RESULTS This paper provides a promising approach to utilizing the correlative information for improving the peptide identification accuracy. The kernel trick, rooted in the statistical learning theory, is exploited to address this issue with low computational effort. The common scoring method, the tandem mass spectral dot product (SDP), is extended to the kernel SDP (KSDP). Experiments on a dataset reported previously demonstrate the effectiveness of the KSDP. The implementation on consecutive fragments shows a decrease of 10% in the error rate compared with the SDP. Our software tool, pFind, using a simple scoring function based on the KSDP, outperforms two SDP-based software tools, SEQUEST and Sonar MS/MS, in terms of identification accuracy. SUPPLEMENTARY INFORMATION http://www.jdl.ac.cn/user/yfu/pfind/index.html

[1]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[2]  M. Wilm,et al.  Error-tolerant identification of peptides in sequence databases by peptide sequence tags. , 1994, Analytical chemistry.

[3]  J. Yates,et al.  An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database , 1994, Journal of the American Society for Mass Spectrometry.

[4]  Bernhard Schölkopf,et al.  Prior Knowledge in Support Vector Kernels , 1997, NIPS.

[5]  J. A. Taylor,et al.  Sequence database searches via de novo peptide sequencing by tandem mass spectrometry. , 1997, Rapid communications in mass spectrometry : RCM.

[6]  B. Chait,et al.  Protein indentification using mass spectrometric information , 1998, Electrophoresis.

[7]  D. N. Perkins,et al.  Probability‐based protein identification by searching sequence databases using mass spectrometry data , 1999, Electrophoresis.

[8]  Peter R. Baker,et al.  Role of accurate mass measurement (+/- 10 ppm) in protein identification strategies employing MS or MS/MS and database searching. , 1999, Analytical chemistry.

[9]  Pavel A. Pevzner,et al.  De Novo Peptide Sequencing via Tandem Mass Spectrometry , 1999, J. Comput. Biol..

[10]  Gunnar Rätsch,et al.  Engineering Support Vector Machine Kerneis That Recognize Translation Initialion Sites , 2000, German Conference on Bioinformatics.

[11]  Ming-Yang Kao,et al.  A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry , 2000, SODA '00.

[12]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[13]  Pavel A. Pevzner,et al.  Mutation-tolerant protein identification by mass-spectrometry , 2000, RECOMB '00.

[14]  Vineet Bafna,et al.  SCOPE: a probabilistic model for scoring tandem mass spectra against a peptide database , 2001, ISMB.

[15]  J. A. Taylor,et al.  Implementation and uses of automated de novo peptide sequencing by tandem mass spectrometry. , 2001, Analytical chemistry.

[16]  A. Nesvizhskii,et al.  Experimental protein mixture for validating tandem mass spectral analysis. , 2002, Omics : a journal of integrative biology.

[17]  David Fenyö,et al.  RADARS, a bioinformatics solution that automates proteome mass spectral analysis, optimises protein identification, and archives data in a relational database , 2002, Proteomics.

[18]  I. Vidavsky,et al.  Comparing similar spectra: From similarity index to spectral contrast angle , 2002, Journal of the American Society for Mass Spectrometry.

[19]  R. Aebersold,et al.  ProbID: A probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data , 2002, Proteomics.

[20]  R. Beavis,et al.  A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. , 2003, Analytical chemistry.

[21]  Ming Li,et al.  PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. , 2003, Rapid communications in mass spectrometry : RCM.

[22]  J. Yates,et al.  Similarity among tandem mass spectra from proteomic experiments: detection, significance, and utility. , 2003, Analytical chemistry.

[23]  R. Aebersold,et al.  Mass spectrometry-based proteomics , 2003, Nature.

[24]  Bin Ma,et al.  PEAKS: Powerful Software for Peptide De Novo Sequencing by MS/MS , 2003 .