Characterization of Binding Sites of Eukaryotic Transcription Factors

To explore the nature of eukaryotic transcription factor (TF) binding sites and determine how they differ from surrounding DNA sequences, we examined four features associated with DNA binding sites: G+C content, pattern complexity, palindromic structure, and Markov sequence ordering. Our analysis of regulatory motifs obtained from the TRANSFAC database, using yeast intergenic sequences as background, revealed that the four features are enriched in motif sequences to varying degrees. For example, motif sequences were more likely than background sequences to have palindromic structure. In addition, these features were tightly localized to the regulatory motifs, indicating that they are a property of the motif sequences themselves and are not shared by the general promoter “environment” in which the regulatory motifs reside. Grouping the motif sequences by the class of TF that binds them revealed more specific associations. Finally, we found that some correlations, such as G+C content enrichment, were species-specific, while others, such as complexity enrichment, were universal across the species examined. The quantitative analysis provided here should increase our understanding of protein-DNA interactions and help facilitate the discovery of regulatory motifs through bioinformatics.
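As a rough illustration of three of the features named above, the following Python sketch computes G+C content, a reverse-complement palindrome score, and single-base Shannon entropy as a stand-in for pattern complexity. The function names and the specific scoring definitions are assumptions made for illustration only; the abstract does not state the exact measures used in the study, and Markov sequence ordering is not covered here.

```python
import math
from collections import Counter

# Watson-Crick base pairing, used to build the reverse complement.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def palindrome_score(seq):
    """Fraction of positions where seq matches its own reverse complement.

    Hypothetical score: 1.0 means a perfect reverse-complement palindrome.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    rc = reverse_complement(seq)
    return sum(1 for a, b in zip(seq, rc) if a == b) / len(seq)


def shannon_entropy(seq):
    """Single-base Shannon entropy in bits (an assumed proxy for complexity)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Enrichment of the kind reported above could then be assessed by comparing the distribution of each score over motif sequences with its distribution over background intergenic sequences of matching length. For instance, the EcoRI site `GAATTC` equals its own reverse complement, so `palindrome_score("GAATTC")` is 1.0.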
