A Semi-Supervised Method for Predicting Transcription Factor–Gene Interactions in Escherichia coli

Although Escherichia coli has one of the most comprehensive collections of experimentally verified transcriptional regulatory interactions of any organism, this collection is still far from complete. The gaps are a problem when gene expression data and regulatory interactions are combined to model transcriptional regulatory networks. Using the available regulatory interactions to predict new ones may improve coverage and yield more accurate models. Here, we develop SEREND (SEmi-supervised REgulatory Network Discoverer), a semi-supervised learning method that combines a curated database of verified transcription factor–gene interactions, DNA sequence binding motifs, and a compendium of gene expression data to make thousands of new predictions of transcription factor–gene interactions, including whether the transcription factor activates or represses the gene. Using genome-wide binding datasets for several transcription factors, we show that our semi-supervised classification strategy improves the prediction of targets for a given transcription factor. To further demonstrate the utility of the inferred interactions, we generated a new microarray gene expression dataset for the aerobic-to-anaerobic shift response in E. coli and used the inferred interactions together with the verified interactions to reconstruct a dynamic regulatory network for this response. The network reconstructed with the inferred interactions was better able to identify known regulators correctly, and it suggested additional activators and repressors as having important roles during the aerobic–anaerobic shift.
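The abstract does not spell out SEREND's training procedure. As a hedged illustration of what "semi-supervised classification" of transcription factor targets can look like, the sketch below implements generic self-training with a logistic-regression classifier: fit on genes with known labels, move confidently predicted unlabeled genes into the training set with their predicted label, and refit. The two features per gene (hypothetically, an expression-based score and a motif-match score), the availability of labeled non-targets, the 0.9 confidence threshold, and all function names are assumptions for illustration, not SEREND's actual design.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]  # hypothetical per-gene features, e.g. (expression score, motif score)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], x: Vector) -> float:
    """Probability that a gene with features x is a target; w = [bias, w1, ..., wd]."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(xs: List[Vector], ys: List[int],
                 lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(xs[0]) + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y  # derivative of log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def self_train(labeled: Dict[str, Tuple[Vector, int]],
               unlabeled: Dict[str, Vector],
               threshold: float = 0.9,
               max_rounds: int = 10) -> Dict[str, int]:
    """Generic self-training loop (a hypothetical stand-in, not SEREND itself).

    Each round refits the classifier on all labeled and pseudo-labeled genes,
    then pseudo-labels unlabeled genes whose predicted probability is >= threshold
    (target) or <= 1 - threshold (non-target). Stops when no gene is confident.
    Returns labels for every gene in the final training set.
    """
    train = dict(labeled)
    pool = dict(unlabeled)
    for _ in range(max_rounds):
        names = sorted(train)
        w = fit_logistic([train[g][0] for g in names], [train[g][1] for g in names])
        confident: Dict[str, Tuple[Vector, int]] = {}
        for g, x in pool.items():
            p = predict(w, x)
            if p >= threshold:
                confident[g] = (x, 1)
            elif p <= 1.0 - threshold:
                confident[g] = (x, 0)
        if not confident:
            break
        for g, item in confident.items():
            train[g] = item
            del pool[g]
    return {g: label for g, (_, label) in train.items()}


if __name__ == "__main__":
    # Toy data: three known targets, three known non-targets, three unlabeled genes.
    labeled = {
        "a1": ((2.0, 2.0), 1), "a2": ((2.2, 1.8), 1), "a3": ((1.8, 2.2), 1),
        "n1": ((-2.0, -2.0), 0), "n2": ((-2.2, -1.8), 0), "n3": ((-1.8, -2.2), 0),
    }
    unlabeled = {"u1": (1.5, 1.5), "u2": (-1.5, -1.5), "u3": (0.2, -0.2)}
    print(self_train(labeled, unlabeled))
```

In this toy run, the gene that resembles the known targets is pseudo-labeled as a target, the one that resembles the non-targets is pseudo-labeled as a non-target, and the ambiguous gene stays unlabeled. Retraining from scratch each round keeps the sketch simple; a real pipeline would more likely use a library classifier and held-out binding data, as the abstract describes, to check whether the added pseudo-labels actually help.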
