Probability‐based protein identification by searching sequence databases using mass spectrometry data

Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability‐based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.
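The "simple rule" for judging significance can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so the following assumes that a score is reported as -10·log10(P), where P is the probability that the observed match is a random event, and that a match counts as significant when a random match among all candidates considered would be expected less than 5% of the time. The function names, the 5% default and the candidate count are illustrative assumptions, not the program's actual interface.

```python
import math


def score_from_probability(p: float) -> float:
    """Convert the probability that a match is random into a score.

    Assumes the -10*log10(P) convention; smaller P gives a higher score.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def significance_threshold(num_candidates: int, alpha: float = 0.05) -> float:
    """Score above which a match is treated as significant.

    Assumes a random match should be expected with frequency below `alpha`
    across `num_candidates` trials, i.e. P < alpha / num_candidates.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    return -10.0 * math.log10(alpha / num_candidates)


def is_significant(score: float, num_candidates: int, alpha: float = 0.05) -> bool:
    """Apply the rule: a result is significant if its score exceeds the threshold."""
    return score > significance_threshold(num_candidates, alpha)


if __name__ == "__main__":
    # Hypothetical database of 500,000 candidate entries.
    n = 500_000
    print(f"threshold: {significance_threshold(n):.1f}")
    print(f"score 75 significant? {is_significant(75.0, n)}")
    print(f"score 60 significant? {is_significant(60.0, n)}")
```

Under these assumptions the threshold grows with the logarithm of the number of candidates, so searching a larger database demands a higher score before a match is accepted. That dependence is what makes a single cut-off usable for unattended, high throughput searches.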
