Large-scale analysis of transcriptional cis-regulatory modules reveals both common features and distinct subclasses

Background

Transcriptional cis-regulatory modules (for example, enhancers) play a critical role in regulating gene expression. While many individual regulatory elements have been characterized, they have never been analyzed as a class.

Results

We have performed the first such large-scale study of cis-regulatory modules in order to determine whether they have common properties that might aid in their identification and contribute to our understanding of the mechanisms by which they function. A total of 280 individual, experimentally verified cis-regulatory modules from Drosophila were analyzed for a range of sequence-level and functional properties. We report here that regulatory modules do indeed share common properties, among them an elevated GC content, an increased level of interspecific sequence conservation, and a tendency to be transcribed into RNA. However, we find that dense clustering of transcription factor binding sites, especially homotypic clustering, which is commonly believed to be a general characteristic of regulatory modules, is rather a feature that belongs chiefly to a specific subclass. This has important implications for current computational approaches, many of which are biased toward this subset. We explore two new strategies to assess binding site clustering and gauge their performances with respect to their ability to detect all 280 modules and various functionally coherent subsets.

Conclusion

Our findings demonstrate that cis-regulatory modules share common features that help to define them as a class and that may lead to new insights into mechanisms of gene regulation. However, these properties alone may not be sufficient to reliably distinguish regulatory from non-regulatory sequences. We also demonstrate that there are distinct subclasses of cis-regulatory modules that are more amenable to in silico detection than others and that these differences must be taken into account when attempting genome-wide regulatory element discovery.
