DNA sequence and structure: direct and indirect recognition in protein-DNA binding

MOTIVATION: Direct recognition, or direct readout, of DNA bases by a DNA-binding protein involves amino acids that interact directly with features specific to each base. Experimental evidence also shows that in many cases the protein achieves partial sequence specificity by indirect recognition, i.e., by recognizing structural properties of the DNA. (1) Could threading a DNA sequence onto a crystal structure of bound DNA help explain the indirect recognition component of sequence specificity? (2) Might the resulting pure-structure computational motif manifest itself in familiar sequence-based computational motifs?

RESULTS: The starting structure motif was a crystal structure of DNA bound to the integration host factor protein (IHF) of E. coli. IHF is known to exhibit both direct and indirect recognition of its binding sites. (1) Threading DNA sequences onto the crystal structure showed statistically significant partial separation of 60 IHF binding sites from random and intragenic sequences and was positively correlated with binding affinity. (2) The crystal structure was shown to be equivalent to a linear Markov network, and so, to a joint probability distribution over sequences, computable in linear time. It was transformed algorithmically into several common pure-sequence representations, including (a) small sets of short exact strings, (b) weight matrices, (c) consensus regular patterns, (d) multiple sequence alignments, and (e) phylogenetic trees. In all cases the pure-sequence motifs retained statistically significant partial separation of the IHF binding sites from random and intragenic sequences. Most exhibited positive correlation with binding affinity. The multiple alignment showed some conserved columns, and the phylogenetic tree partially mixed low-energy sequences with IHF binding sites but separated high-energy sequences. The conclusion is that deformation energy explains part of indirect recognition, which explains part of IHF sequence-specific binding.
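
The claim that a chain of step energies defines a joint distribution over sequences computable in linear time can be illustrated with a minimal sketch. It is not the paper's implementation: the per-step energy tables below are random placeholders rather than real deformation potentials, the Boltzmann weighting with a hypothetical `kT` parameter is an assumption, and every function name is invented for illustration. Under those assumptions, the energy of a threaded sequence is a sum over adjacent base pairs, and the normalizing constant follows from a forward recursion over the chain.

```python
import math
import random
from itertools import product

BASES = "ACGT"

def make_toy_step_energies(n_steps, seed=0):
    """Hypothetical placeholder tables: energies[i][(a, b)] is the cost of
    placing dinucleotide ab at step i of the template structure."""
    rng = random.Random(seed)
    return [{(a, b): rng.uniform(0.0, 5.0) for a in BASES for b in BASES}
            for _ in range(n_steps)]

def threading_energy(seq, energies):
    """Sum the step energies of seq threaded onto the template."""
    if len(seq) != len(energies) + 1:
        raise ValueError("sequence must have one more base than there are steps")
    return sum(energies[i][(seq[i], seq[i + 1])] for i in range(len(energies)))

def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(energies, kT=1.0):
    """Forward recursion over the chain: linear in the number of steps."""
    # alpha[b] = log of the summed Boltzmann weight of all prefixes ending in b
    alpha = {b: 0.0 for b in BASES}
    for table in energies:
        alpha = {b: _logsumexp([alpha[a] - table[(a, b)] / kT for a in BASES])
                 for b in BASES}
    return _logsumexp(list(alpha.values()))

def log_prob(seq, energies, kT=1.0):
    """Log probability of seq under the assumed Boltzmann distribution."""
    return -threading_energy(seq, energies) / kT - log_partition(energies, kT)

def brute_force_log_partition(energies, kT=1.0):
    """Exponential-time check that enumerates every sequence (small cases only)."""
    n = len(energies) + 1
    return _logsumexp([-threading_energy("".join(s), energies) / kT
                       for s in product(BASES, repeat=n)])
```

For a short template, `log_partition` and `brute_force_log_partition` should agree to floating-point precision, which is a quick way to check the recursion before applying it to a full-length binding site where enumeration is infeasible.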
