Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity.

An automated method for screening protein activity, based on the sequence-to-structure-to-function paradigm, is applied to the complete Escherichia coli genome. First, the structure of each protein is identified from its sequence using a threading algorithm, which aligns the sequence to the best-matching structure in a structural database and extends sequence analysis well beyond the limits of local sequence identity. Then, the active site is identified in the resulting sequence-to-structure alignment using a "fuzzy functional form" (FFF), a three-dimensional descriptor of the active site of a protein. Here, all E. coli open reading frames (ORFs) are screened for the thiol-disulfide oxidoreductase activity of the glutaredoxin/thioredoxin protein family. We show that the method can identify the active sites in ten sequences that are known or proposed to exhibit this activity. Furthermore, oxidoreductase activity is predicted in two other sequences in which it has not been identified previously. The method distinguishes protein pairs with similar active sites from protein pairs that are merely topological cousins, i.e. those having similar global folds but not necessarily similar active sites. Thus, it provides a novel approach for extracting active-site and functional information based on three-dimensional structures rather than on simple sequence analysis. Prediction of protein activity is fully automated and easily extendible to new functions. Finally, it is demonstrated that the method can be applied to complete genome database analysis.
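
The second step, matching a three-dimensional active-site descriptor against a sequence-to-structure alignment, can be illustrated with a minimal sketch. It assumes the threading result has already been reduced to a map from query residue index to the C-alpha coordinates of the aligned template residue. The names `DistanceRule`, `Descriptor` and `find_sites`, the two-cysteines-plus-proline layout, and every distance and tolerance value are illustrative assumptions, not the published FFF parameters.

```python
from dataclasses import dataclass
from itertools import product
from math import dist


@dataclass(frozen=True)
class DistanceRule:
    a: int         # index into Descriptor.residues
    b: int         # index into Descriptor.residues
    target: float  # hypothetical target C-alpha distance (angstroms)
    tol: float     # allowed deviation, i.e. the "fuzziness"


@dataclass(frozen=True)
class Descriptor:
    residues: str    # required residue types, one letter each
    seq_gaps: dict   # {(a, b): required sequence separation}
    rules: tuple     # DistanceRule instances


def find_sites(desc, seq, ca):
    """Return tuples of query positions that satisfy the descriptor.

    seq: query sequence (one-letter codes).
    ca:  dict mapping query index -> (x, y, z) of the aligned template
         C-alpha atom; unaligned positions are simply absent.
    """
    # Candidate positions per descriptor residue: right type and aligned.
    candidates = [
        [i for i, aa in enumerate(seq) if aa == want and i in ca]
        for want in desc.residues
    ]
    hits = []
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # one position cannot fill two descriptor slots
        if any(combo[b] - combo[a] != gap
               for (a, b), gap in desc.seq_gaps.items()):
            continue  # local sequence spacing not satisfied
        if all(abs(dist(ca[combo[r.a]], ca[combo[r.b]]) - r.target) <= r.tol
               for r in desc.rules):
            hits.append(combo)
    return hits


# Hypothetical descriptor: Cys-x-x-Cys plus a proline that is close in
# space but may be distant in sequence. All values are made up.
toy = Descriptor(
    residues="CCP",
    seq_gaps={(0, 1): 3},
    rules=(DistanceRule(0, 1, 5.5, 1.0), DistanceRule(0, 2, 6.0, 1.0)),
)

seq = "MACGPCKLVP"
# Only the residues relevant to the toy descriptor are given coordinates.
match_ca = {2: (0.0, 0.0, 0.0), 5: (5.5, 0.0, 0.0),
            4: (3.0, 3.0, 0.0), 9: (0.0, 6.0, 0.0)}
# Same sequence aligned to a different geometry (second Cys displaced).
cousin_ca = {**match_ca, 5: (12.0, 0.0, 0.0)}

print(find_sites(toy, seq, match_ca))   # [(2, 5, 9)]
print(find_sites(toy, seq, cousin_ca))  # []
```

The second call mimics a "topological cousin": the sequence pattern is present, but the aligned geometry does not satisfy the distance rules, so no site is reported.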
