Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes.

Sequence information and high-throughput methods for measuring gene expression levels open the door to exploring transcriptional regulation with computational tools. Combinatorial regulation and the sparseness of regulatory elements throughout the genome allow organisms to control the spatial and temporal patterns of gene expression. Here we study the organization of cis-regulatory elements in sets of co-regulated genes. We build an algorithm that searches for combinations of transcription factor binding sites that are enriched in a set of potentially co-regulated genes with respect to the whole genome. No knowledge is assumed about the involvement of specific sets of transcription factors. Instead, the search is conducted exhaustively over combinations of up to four binding sites obtained from databases or motif search algorithms. We evaluate performance on random sets of genes as a negative control and on three biologically validated sets of co-regulated genes in yeasts, flies and humans. We show that we can detect DNA regions that play a role in the control of transcription. These results shed light on the structure of transcription regulatory regions in eukaryotes and can be directly applied to clusters of co-expressed genes obtained in gene expression studies. Supplementary information is available at http://www.mit.edu/~kreiman/resources/cisregul/.
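
The exhaustive search over combinations of up to four binding sites can be illustrated with a minimal sketch. It is not the published implementation: the abstract does not state which enrichment statistic is used, so the hypergeometric tail test and the Bonferroni correction below are assumptions, as are the names `hypergeom_tail`, `enriched_combinations`, `gene_motifs`, `max_size` and `alpha`. The sketch also simplifies "cluster" to co-occurrence of motifs anywhere in a gene's regulatory region and ignores the spacing between sites.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the combination."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enriched_combinations(gene_motifs, gene_set, max_size=4, alpha=0.01):
    """Return motif combinations over-represented in gene_set versus the genome.

    gene_motifs maps each gene in the genome to the set of motifs found in its
    regulatory region (hypothetical input format). Every combination of 1 to
    max_size motifs is tested; the threshold is Bonferroni-corrected (assumed).
    """
    genome = set(gene_motifs)
    selected = genome & set(gene_set)
    motifs = sorted(set().union(*gene_motifs.values()))
    candidates = [c for size in range(1, max_size + 1)
                  for c in combinations(motifs, size)]
    threshold = alpha / max(len(candidates), 1)

    hits = []
    for combo in candidates:
        # Genes whose motif set contains every motif of the combination.
        carriers = {g for g, found in gene_motifs.items() if set(combo) <= found}
        k = len(carriers & selected)
        if k == 0:
            continue
        p = hypergeom_tail(k, len(selected), len(carriers), len(genome))
        if p <= threshold:
            hits.append((combo, k, len(carriers), p))
    return sorted(hits, key=lambda hit: hit[3])
```

Running the same function on randomly drawn gene sets of equal size would give a negative control analogous to the one described above. The number of candidate combinations grows roughly with the fourth power of the number of motifs, so a realistic motif library would need pruning or a more efficient counting scheme than this sketch uses.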
