Error-tolerant identification of peptides in sequence databases by peptide sequence tags.

We demonstrate a new approach to the identification of mass spectrometrically fragmented peptides. A fragmentation spectrum usually contains a short, easily identifiable series of sequence ions, which yields a partial sequence. This partial sequence divides the peptide into three parts, regions 1, 2, and 3. They are characterized by the added mass m1 of region 1, the partial sequence of region 2, and the added mass m3 of region 3. We call the construct (m1) partial sequence (m3) a "peptide sequence tag" and show that it is a highly specific identifier of the peptide. An algorithm developed here uses the sequence tag to find the peptide in a sequence database and is up to 1 million-fold more discriminating than the partial sequence information alone. Peptides can be identified even when the measured peptide carries an unknown posttranslational modification or differs from the database entry by an amino acid substitution. These concepts are demonstrated with model and practical examples of electrospray tandem mass spectrometry (MS/MS) of tryptic peptides. Just two to three amino acid residues derived by fragmentation are enough to identify these peptides. In peptide mapping applications, even less information is necessary.
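
The tag-matching idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program. m1 and m3 are taken to be summed monoisotopic residue masses, with the terminal H and OH excluded. Candidate peptides come from a simplified in-silico tryptic digest, and a fixed 0.5 Da tolerance is assumed. "Error tolerance" is reduced to accepting a placement where the partial sequence and only one of the two flanking masses agree. The names `SequenceTag`, `tryptic_peptides`, `match_tag` and `search` are hypothetical, and so are the toy sequences.

```python
import re
from dataclasses import dataclass

# Standard monoisotopic residue masses (Da), i.e. free amino acid minus H2O.
# Cysteine is taken as unmodified; L and I are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


@dataclass(frozen=True)
class SequenceTag:
    m1: float  # assumed convention: summed residue mass of region 1 (N-terminal side)
    seq: str   # partial sequence of region 2, read from the ion series
    m3: float  # assumed convention: summed residue mass of region 3 (C-terminal side)


def residue_mass(residues: str) -> float:
    """Sum of residue masses; terminal H and OH are deliberately excluded."""
    return sum(RESIDUE_MASS[aa] for aa in residues)


def tryptic_peptides(protein: str) -> list[str]:
    """Simplified trypsin rule: cleave after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def match_tag(peptide: str, tag: SequenceTag, tol: float = 0.5,
              error_tolerant: bool = False) -> list[tuple[int, str]]:
    """Return (start, mode) for every placement of tag.seq in peptide that fits the tag."""
    hits = []
    start = peptide.find(tag.seq)
    while start != -1:
        # Mass mismatch of region 1 (before the partial sequence) and region 3 (after it).
        d1 = abs(residue_mass(peptide[:start]) - tag.m1)
        d3 = abs(residue_mass(peptide[start + len(tag.seq):]) - tag.m3)
        if d1 <= tol and d3 <= tol:
            hits.append((start, "exact"))
        elif error_tolerant and d3 <= tol:
            # Sequence and m3 agree: an unknown modification or substitution may lie in region 1.
            hits.append((start, "region 1 differs"))
        elif error_tolerant and d1 <= tol:
            # Sequence and m1 agree: the discrepancy may lie in region 3.
            hits.append((start, "region 3 differs"))
        start = peptide.find(tag.seq, start + 1)
    return hits


def search(database: dict[str, str], tag: SequenceTag, tol: float = 0.5,
           error_tolerant: bool = False) -> list[tuple[str, str, int, str]]:
    """Scan every tryptic peptide of every database protein for the tag."""
    results = []
    for name, protein in database.items():
        for pep in tryptic_peptides(protein):
            for start, mode in match_tag(pep, tag, tol, error_tolerant):
                results.append((name, pep, start, mode))
    return results


if __name__ == "__main__":
    # Hypothetical toy database; toyB carries an A->S substitution in region 3.
    db = {"toyA": "MSGRLVESTPAKGGHWK", "toyB": "AGRLVESTPSKDDR"}
    tag = SequenceTag(m1=residue_mass("LV"), seq="EST", m3=residue_mass("PAK"))
    print(search(db, tag))
    # prints [('toyA', 'LVESTPAK', 2, 'exact')]
    print(search(db, tag, error_tolerant=True))
    # prints [('toyA', 'LVESTPAK', 2, 'exact'), ('toyB', 'LVESTPSK', 2, 'region 3 differs')]
```

With the strict search, the three residues "EST" single out the one peptide whose flanking masses also fit. The error-tolerant mode additionally recovers the substituted peptide and reports which region carries the discrepancy. This sketch does not use the precursor mass. It also does not try the tag in the reverse ion-series direction or treat L and I as interchangeable in the partial sequence. A practical search would need to handle these.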
