Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli

Recognition of genomic binding sites by transcription factors can occur through base-specific (direct) recognition, or through (indirect) recognition of sequence-dependent variations in the structure of the DNA macromolecule. In this article, we investigate what information relevant to transcription-factor binding can be retrieved from local DNA structural properties, beyond what the nucleotide sequence alone captures. More specifically, we explore the benefit of using the structural characteristics of DNA to build binding-site models that encompass indirect recognition in the model organism Escherichia coli. We developed a novel methodology, Conditional Random fields of Smoothed Structural Data (CRoSSeD), which combines structural scales with conditional random fields to model and predict regulator binding sites. The value of relying on local structural DNA properties is demonstrated in two ways. First, classifier performance improves on a large number of biological datasets. Second, the method detects novel binding sites that could be validated by independent data sources but could not be identified using sequence data alone. We further show that CRoSSeD binding-site models can be related to the actual molecular mechanisms of transcription-factor DNA binding. They can therefore be used not only to predict novel sites but also, potentially, to gain insight into unknown binding mechanisms of transcription factors.
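The abstract names two preprocessing ingredients, structural scales and smoothing, that turn a DNA sequence into numeric profiles. The sketch below illustrates that preprocessing step under several assumptions. It uses a dinucleotide scale, with one value per base-pair step, and a centred moving-average smoother. `TOY_SCALE`, `structural_profile`, `smooth` and `position_features` are hypothetical names. The scale values are placeholders, not any published structural scale. The actual CRoSSeD scales, smoothing and feature design may differ, and the conditional random field itself is not shown.

```python
from typing import Dict, List

# Hypothetical dinucleotide scale with illustrative placeholder values.
# These numbers are NOT taken from any published structural scale.
TOY_SCALE: Dict[str, float] = {
    "AA": 1.0, "AC": 2.0, "AG": 1.5, "AT": 0.5,
    "CA": 3.0, "CC": 2.5, "CG": 4.0, "CT": 1.5,
    "GA": 2.0, "GC": 3.5, "GG": 2.5, "GT": 2.0,
    "TA": 4.5, "TC": 2.0, "TG": 3.0, "TT": 1.0,
}


def structural_profile(seq: str, scale: Dict[str, float]) -> List[float]:
    """Map each dinucleotide step of `seq` to its scale value.

    A sequence of length n yields n - 1 values, one per base-pair step.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least one dinucleotide step")
    profile = []
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in scale:
            raise ValueError(f"unknown dinucleotide step {step!r} at position {i}")
        profile.append(scale[step])
    return profile


def smooth(profile: List[float], window: int) -> List[float]:
    """Centred moving average; the window is truncated at the profile ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(profile)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        segment = profile[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def position_features(profiles: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Combine equal-length named profiles into one feature dict per step."""
    lengths = {len(p) for p in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("profiles must be non-empty and share one length")
    n = lengths.pop()
    return [{name: values[i] for name, values in profiles.items()} for i in range(n)]


if __name__ == "__main__":
    raw = structural_profile("AAAC", TOY_SCALE)
    print(raw)  # [1.0, 1.0, 2.0]
    print(position_features({"toy": raw, "toy_smoothed": smooth(raw, 3)}))
```

Each dict returned by `position_features` describes one base-pair step. Linear-chain CRF toolkits typically expect this kind of per-position input. Using several structural scales would simply add more keys per step.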
