Efficiency of database search for identification of mutated and modified proteins via mass spectrometry.

Although protein identification by matching tandem mass spectra (MS/MS) against protein databases is widely used in mass spectrometry, the reliability of such searches remains an open question. The absence of rigorous significance scores in MS/MS database search makes it difficult to discard random database hits and may lead to erroneous protein identifications, particularly for mutated or post-translationally modified peptides. This problem is especially important in high-throughput MS/MS projects, where the scope for expert analysis is limited. Algorithms are therefore needed that separate reliable database hits from unreliable ones and identify mutated and modified peptides. Most MS/MS database search algorithms rely on variations of the Shared Peaks Count approach, which scores a pair of spectra by the number of peaks (masses) they have in common. Although this approach has proved useful, it has a high error rate when identifying mutated and modified peptides. We describe new MS/MS database search tools, MS-CONVOLUTION and MS-ALIGNMENT, which implement the spectral convolution and spectral alignment approaches to peptide identification. We further analyze how these approaches identify modified peptides and demonstrate their advantages over the Shared Peaks Count. We also use the spectral alignment approach as a filter in a new database search algorithm that reliably identifies peptides differing by up to two mutations/modifications from a peptide in the database.
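To illustrate why the Shared Peaks Count can miss a modified peptide while a convolution-style score need not, the following is a minimal sketch, not the MS-CONVOLUTION implementation. It assumes idealized spectra given as lists of integer masses with exact matching (no mass tolerance, no intensities), and the function names and toy masses are hypothetical. The convolution here simply counts how often each mass difference occurs between two spectra; the count at offset 0 equals the shared peaks count, and a high count at a non-zero offset suggests a modification with that mass shift.

```python
from collections import Counter


def shared_peaks_count(spectrum_a, spectrum_b):
    """Number of masses present in both spectra (exact match, toy assumption)."""
    return len(set(spectrum_a) & set(spectrum_b))


def spectral_convolution(spectrum_a, spectrum_b):
    """Count every pairwise mass difference b - a between the two spectra."""
    return Counter(b - a for a in spectrum_a for b in spectrum_b)


# Toy prefix masses for an unmodified peptide (hypothetical values).
unmodified = [57, 128, 227, 356, 443]
# Same peptide with a +80 shift applied from the third peak onward.
modified = [57, 128, 307, 436, 523]

spc = shared_peaks_count(unmodified, modified)
conv = spectral_convolution(unmodified, modified)
best_offset, best_count = conv.most_common(1)[0]

print("shared peaks count:", spc)
print("convolution at offset 0:", conv[0])
print("best offset and count:", best_offset, best_count)
```

In this toy case only two peaks are shared, but the convolution collects three peaks at offset 80, so the two spectra together account for all five peaks once the shift is taken into account. Real spectra would need a mass tolerance and handling of noise peaks, which this sketch omits.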
