Positional distribution of human transcription factor binding sites

We developed a method for estimating the positional distribution of transcription factor (TF) binding sites from ChIP-chip data and applied it to recently published experiments on the binding sites of nine TFs: OCT4, SOX2, NANOG, HNF1A, HNF4A, HNF6, FOXA2, USF1 and CREB1. The data cover promoter regions genome-wide, from 8 kb upstream to 2 kb downstream of the transcription start site (TSS). The number of target genes per TF ranges from a few hundred to several thousand. For each of the nine TFs, the estimated binding site distribution is closely approximated by a mixture of two components: a narrow peak localized within 300 bp upstream of the TSS, and a distribution of almost uniform density across the tested region. Using Gene Ontology (GO) enrichment analysis, we associated the target genes of both binding types with known biological processes for each TF. Most GO terms were enriched either among the proximal targets or among the targets whose binding sites follow the uniform distribution. For example, the three stemness-related TFs (OCT4, SOX2 and NANOG) have several hundred target genes annotated with ‘development’ and ‘morphogenesis’ whose binding sites belong to the uniform component.
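The two-component description above can be made concrete with a small sketch. It assumes (this is not stated in the abstract) that the narrow peak is Gaussian and that the background is uniform over the tested window from -8 kb to +2 kb. Under those assumptions the mixture can be fitted by expectation-maximization (EM), and each site's posterior probability of belonging to the peak then splits targets into a proximal group and a uniform group, the kind of split the GO analysis compares. This is a generic illustration on simulated positions, not the authors' estimation procedure; the function names `normal_pdf`, `responsibilities` and `fit_peak_plus_uniform` are hypothetical.

```python
import math
import random

# Tested window relative to the TSS, in bp (negative = upstream).
WINDOW_START, WINDOW_END = -8000, 2000
WINDOW_LEN = WINDOW_END - WINDOW_START


def normal_pdf(x, mu, sigma):
    """Gaussian density; an assumed shape for the narrow proximal peak."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def responsibilities(xs, w, mu, sigma):
    """Posterior probability that each site belongs to the peak component."""
    uniform_density = 1.0 / WINDOW_LEN
    out = []
    for x in xs:
        peak = w * normal_pdf(x, mu, sigma)
        background = (1.0 - w) * uniform_density
        out.append(peak / (peak + background))
    return out


def fit_peak_plus_uniform(xs, w=0.5, mu=-150.0, sigma=200.0, n_iter=200, tol=1e-8):
    """EM for w * N(mu, sigma^2) + (1 - w) * Uniform(window).

    The Gaussian is not truncated to the window; this is harmless when the
    peak lies well inside it, as a peak near the TSS does.
    """
    for _ in range(n_iter):
        r = responsibilities(xs, w, mu, sigma)  # E-step
        total = sum(r)
        new_w = total / len(xs)  # M-step: mixing weight of the peak
        new_mu = sum(ri * x for ri, x in zip(r, xs)) / total
        var = sum(ri * (x - new_mu) ** 2 for ri, x in zip(r, xs)) / total
        new_sigma = max(math.sqrt(var), 1.0)  # floor avoids a degenerate spike
        converged = abs(new_w - w) < tol and abs(new_mu - mu) < tol
        w, mu, sigma = new_w, new_mu, new_sigma
        if converged:
            break
    return w, mu, sigma


# Simulated positions (hypothetical): 600 sites in a peak ~100 bp upstream,
# 400 sites spread uniformly over the whole tested window.
random.seed(0)
peak_sites = [random.gauss(-100, 80) for _ in range(600)]
flat_sites = [random.uniform(WINDOW_START, WINDOW_END) for _ in range(400)]
sites = peak_sites + flat_sites

w, mu, sigma = fit_peak_plus_uniform(sites)
r = responsibilities(sites, w, mu, sigma)
proximal = [x for x, ri in zip(sites, r) if ri > 0.5]  # assigned to the peak
print(f"w={w:.2f} mu={mu:.0f} sigma={sigma:.0f} proximal={len(proximal)}")
```

With a peak this narrow relative to the 10-kb window the components separate cleanly, so the 0.5 posterior cutoff is not critical; the proximal and uniform target lists could then each be tested for GO term enrichment.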
