Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors

MOTIVATION Modeling and identifying the DNA-protein recognition code is one of the most challenging problems in computational biology. Several quantitative methods have been developed to model DNA-protein interactions, with a specific focus on C2H2 zinc-finger proteins, the largest transcription factor family in eukaryotic genomes. Although these methods perform well in many cases, their overall predictive accuracy is still limited. A major reason is that all of them use weight matrix models to represent DNA-protein interactions, which assume that every base-amino acid contact contributes independently to the total free energy of binding.

RESULTS We present a context-dependent model for DNA-zinc-finger protein interactions that allows us to identify inter-positional dependencies in the DNA recognition code of C2H2 zinc-finger proteins. The degree of non-independence was assessed by comparing the predictions of DNA-zinc-finger protein interactions made by a linear perceptron model with those made by a non-linear neural network (NN) model. This dependency is supported by the complex base-amino acid contacts observed in structural analyses of DNA-zinc-finger complexes. Using extensive published qualitative and quantitative experimental data, we demonstrate that the context-dependent model developed in this study significantly improves predictions of DNA binding profiles and binding free energies, for both individual zinc fingers and proteins with multiple zinc fingers, compared with previous position-independent models. This approach can be extended to other protein families with complex base-amino acid interactions, which would help further our understanding of transcriptional regulation in eukaryotic genomes.

AVAILABILITY The software is implemented as C programs and is available on request. http://ural.wustl.edu/softwares.html
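The independence assumption behind weight matrix models, and the idea of measuring non-independence by comparing a position-independent model with one that can represent dependencies, can be illustrated with a toy two-position example. This is a minimal sketch under stated assumptions, not the authors' method: instead of a linear perceptron and a neural network, it uses an exact least-squares additive fit (for a complete 4×4 table of dinucleotide energies this fit is given by the grand mean plus row and column effects). The function names `additive_score`, `best_additive_fit` and `non_additivity`, and all energy values, are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def additive_score(site, weights):
    """Position-independent (weight-matrix) score: each position adds its own
    contribution, with no term coupling neighbouring positions."""
    return sum(weights[i][base] for i, base in enumerate(site))


def best_additive_fit(energy):
    """Least-squares additive fit E(XY) ~ mu + a[X] + b[Y] for a complete
    two-position table (all 16 dinucleotides). For a balanced full table the
    fit is the grand mean plus the row and column main effects."""
    mu = sum(energy.values()) / 16
    a = {x: sum(energy[x + y] for y in BASES) / 4 - mu for x in BASES}
    b = {y: sum(energy[x + y] for x in BASES) / 4 - mu for y in BASES}
    return {x + y: mu + a[x] + b[y] for x, y in product(BASES, repeat=2)}


def non_additivity(energy):
    """Residual sum of squares left by the best additive fit.
    0 means a weight matrix describes the table exactly; larger values
    indicate dependence between the two positions."""
    fit = best_additive_fit(energy)
    return sum((energy[k] - fit[k]) ** 2 for k in energy)


# Hypothetical per-position contributions (toy numbers, not measured data).
WEIGHTS = [
    {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0},
    {"A": 0.0, "C": 0.5, "G": 1.0, "T": 1.5},
]

# A purely additive table: every entry is a weight-matrix score.
additive_table = {x + y: additive_score(x + y, WEIGHTS)
                  for x, y in product(BASES, repeat=2)}

# The same table plus one hypothetical context-dependent term: "GC" gets an
# extra -2.0 that no combination of per-position weights can express.
context_table = dict(additive_table)
context_table["GC"] -= 2.0

print(non_additivity(additive_table))  # 0.0
print(non_additivity(context_table))   # 2.25
```

Once a pairwise (dinucleotide) term is allowed, any 16-entry table can be fitted exactly, so the residual left by the additive fit plays a role analogous to the gap between the linear and non-linear models: zero means a weight matrix is adequate, and a positive value quantifies inter-positional dependence.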
