Genome-based peptide fingerprint scanning

We have implemented a method that identifies the genomic origins of sample proteins by scanning their peptide-mass fingerprint against the theoretical translation and proteolytic digest of an entire genome. Unlike previously reported techniques, this method requires no predefined ORF or protein annotations. Fixed-size windows along the genome sequence are scored by an equation that accounts for the number of matching peptides, the number of missed enzymatic cleavages in each peptide, the number of in-frame stop codons within a window, the adjacency between peptides, and duplicate peptide matches. The statistical significance of a matching region is assessed by comparing its score with the scores of windows that match randomly generated mass data. In tests with samples from Saccharomyces cerevisiae mitochondria and Escherichia coli, the method produced statistically significant identifications and agreed with two commonly used programs, PeptIdent and Mascot, in 86% of the samples analyzed. This genome fingerprint scanning method has the potential to aid in genome annotation, to identify proteins for which annotation is incorrect or missing, and to handle cases where sequencing errors have caused framing mistakes in the databases. It might also aid in the identification of proteins in which recoding events such as frameshifting or stop-codon read-through have occurred, helping to elucidate alternative translation mechanisms. The prototype is implemented as a client/server pair, which allows one or more genomes to be distributed across a set of cluster nodes for concurrent analysis.
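The window-scanning idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring equation is not given here, so the sketch only counts matching peptides per window and ignores missed cleavages, stop-codon penalties, adjacency, and duplicate matches. The function names (`translate`, `digest`, `peptide_mass`, `scan`), the tryptic cleavage rule (after K or R, not before P), the treatment of stop codons as peptide boundaries, the default window size and mass tolerance, and the restriction to the three forward frames are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses (Da); one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def translate(dna, frame):
    """Translate one forward reading frame; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def digest(protein):
    """Yield (start_index, peptide) pairs for a tryptic digest.

    Assumptions: no missed cleavages; stop codons ('*') end a peptide.
    """
    start = 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if aa == "*":
            if i > start:
                yield start, protein[start:i]
            start = i + 1
        elif at_end or (aa in "KR" and protein[i + 1] != "P"):
            yield start, protein[start:i + 1]
            start = i + 1


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def scan(dna, observed, window_nt=300, tol=0.2):
    """Count theoretical peptides matching an observed mass, per window.

    Returns {(frame, window_index): match_count} for windows with a match.
    """
    hits = {}
    for frame in range(3):
        for aa_start, pep in digest(translate(dna, frame)):
            if "X" in pep:
                continue  # skip peptides spanning ambiguous bases
            mass = peptide_mass(pep)
            if any(abs(mass - m) <= tol for m in observed):
                nt_start = frame + 3 * aa_start
                key = (frame, nt_start // window_nt)
                hits[key] = hits.get(key, 0) + 1
    return hits


# Toy example: frame 0 encodes M A G K L L S R followed by a stop codon.
dna = "ATGGCTGGTAAACTTCTTTCTCGTTAA"
observed = [peptide_mass("LLSR")]
hits = scan(dna, observed, window_nt=30, tol=0.01)
```

In this toy case `hits` records a single match in frame 0, window 0. A fuller version would also scan the reverse-complement strand, and could estimate significance in the spirit of the text by rerunning `scan` with randomly generated mass lists and comparing the resulting window scores.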
