Piecing together the structure–function puzzle: Experiences in structure‐based functional annotation of hypothetical proteins

The combination of genomic sequencing with structural genomics has provided a wealth of new structures for previously uncharacterized ORFs, more commonly referred to as hypothetical proteins. This rapid growth is the direct result of high‐throughput, automated approaches to both the identification of new ORFs and the determination of high‐resolution 3‐D protein structures. A significant bottleneck is reached, however, at the stage of functional annotation, because the assignment of function is not readily automatable. Often the initial structural analysis at best indicates a functional family for a given hypothetical protein, and further identification of a relevant ligand or substrate is impeded by the diversity of function within a particular Structural Classification of Proteins (SCOP) family, by a highly selective and specific ligand‐binding site, or by a novel protein fold. Our approach to the functional annotation of hypothetical proteins combines structural information with additional bioinformatics evidence garnered from operon prediction, loose functional information on other operon members, and conservation of catalytic residues, together with cocrystallization trials and virtual ligand screening. The synthesis of all available information for each protein has permitted the functional annotation of several hypothetical proteins from Escherichia coli, and each assignment has been confirmed through generally accepted biochemical methods.
