Predicting Transcription Factor Binding Sites Using Structural Knowledge

Current approaches for identifying transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach that is applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid-nucleotide recognition preferences, which are then used to predict binding sites for novel transcription factors from the same structural family. We apply the approach to the Cys2His2 zinc finger protein family and show that the learned DNA-recognition preferences are compatible with various experimental results. To demonstrate the potential of our algorithm, we use the learned preferences to predict binding site models for novel proteins from this family, and then use these models in genomic scans to find putative binding sites of those proteins.
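
As an illustration of the final step, a genomic scan with a predicted binding site model can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the model is a position-specific probability matrix, that candidate sites are scored by log-odds against a uniform background, and that only the forward strand is scanned. The names `TOY_MODEL`, `log_odds_score`, and `scan`, the matrix values, and the threshold are all hypothetical and chosen only for illustration.

```python
import math

# Assumption: uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical 3-position binding site model (one column per position).
# The probabilities are toy values, not learned recognition preferences.
TOY_MODEL = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]


def log_odds_score(site, model, background=BACKGROUND):
    """Sum of per-position log2(model probability / background probability)."""
    return sum(
        math.log2(column[nt] / background[nt])
        for column, nt in zip(model, site)
    )


def scan(sequence, model, threshold):
    """Return (start, site, score) for every forward-strand window
    whose log-odds score is at least `threshold`."""
    width = len(model)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous characters such as N.
        if any(nt not in BACKGROUND for nt in window):
            continue
        score = log_odds_score(window, model)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    for start, site, score in scan("TTGCGTT", TOY_MODEL, threshold=3.0):
        print(start, site, round(score, 2))
```

In practice a scan of this kind would also cover the reverse complement strand, and the threshold would be chosen from a significance estimate rather than fixed by hand.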
