Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases.

The correlation of uninterpreted tandem mass spectra of modified and unmodified peptides, produced under low-energy (10-50 eV) collision conditions, with nucleotide sequences is demonstrated. In this method, nucleotide databases are translated in six reading frames, and the resulting amino acid sequences are searched "on the fly" to identify and fit linear sequences to the fragmentation patterns observed in the tandem mass spectra of peptides. A cross-correlation function then measures the similarity between the mass-to-charge ratios of the fragment ions predicted from amino acid sequences translated from the nucleotide database and those of the fragment ions observed in the tandem mass spectrum. In general, a difference greater than 0.1 between the normalized cross-correlation values of the first- and second-ranked search results indicates a successful match between sequence and spectrum. The deviation from maximum similarity is measured using the spectral reconstruction method. The search method employing nucleotide databases is also demonstrated on the spectra of phosphorylated peptides. Specific sites of modification are identified even though the character-based sequence information in nucleotide databases contains no information about sites of modification.
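The two steps named above, six-frame translation and the 0.1 difference criterion, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`six_frame_translations`, `is_confident_match`) are hypothetical. Dividing each score by the top-ranked score is only one plausible reading of "normalized", which the abstract does not define. The standard genetic code is assumed, and any codon containing an ambiguous base is reported as `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate a nucleotide sequence in three forward and three reverse frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def is_confident_match(scores, min_delta=0.1):
    """Apply the 0.1 difference rule to raw cross-correlation scores.

    Assumption: scores are normalized by the top-ranked score, so the
    first-ranked result has a normalized value of 1.0.
    """
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        return False
    normalized_second = ranked[1] / ranked[0]
    return (1.0 - normalized_second) > min_delta


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
    print(is_confident_match([4.0, 3.0]))
```

In a full search, each translated frame would be scanned for candidate peptides whose predicted fragment ions are compared with the observed spectrum. This sketch covers only the translation and the final ranking check.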
