Prediction of TF target sites based on atomistic models of protein-DNA complexes

Background: The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms that determine binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling specific recognition by TFs rely on knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence.

Results: Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information on the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. First, we measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Second, we apply the method to two homology models and find that sampling of interface side-chain rotamers markedly improves the results. Third, we compare the algorithm with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models.

Conclusion: Our results demonstrate that atomic-detail structural information can feasibly be used to predict TF binding sites. The computational method presented here is general and might be applied to other systems involving protein-DNA recognition.
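As an illustration of how per-position interaction scores might be turned into a sequence motif, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical input: one score per base at each site position, with lower values meaning more favourable binding. The Boltzmann-style weighting, the `temperature` parameter, and the function names `scores_to_pwm` and `consensus` are all assumptions made for this example.

```python
import math

BASES = "ACGT"

def scores_to_pwm(scores, temperature=1.0):
    """Convert per-position base scores into base probabilities.

    `scores` is a list of dicts, one per site position, mapping each base
    to a hypothetical interaction score (lower = more favourable).
    Boltzmann-style weighting is an assumption of this sketch.
    """
    pwm = []
    for column in scores:
        weights = [math.exp(-column[base] / temperature) for base in BASES]
        total = sum(weights)
        pwm.append({base: w / total for base, w in zip(BASES, weights)})
    return pwm

def consensus(pwm):
    """Return the most probable base at each position as a string."""
    return "".join(max(column, key=column.get) for column in pwm)

# Hypothetical scores for a two-position site (illustrative values only).
example_scores = [
    {"A": -2.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": -1.5, "T": 0.0},
]
example_pwm = scores_to_pwm(example_scores)
print(consensus(example_pwm))
```

Each column of the resulting matrix sums to one, so it can be compared with motifs derived from aligned cognate sites. Treating positions as independent is a simplification of this sketch.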
