Prediction of similarly acting cis-regulatory modules by subsequence profiling and comparative genomics in Drosophila melanogaster and D.pseudoobscura

MOTIVATION: To date, computational searches for cis-regulatory modules (CRMs) have relied on two methods. The first, phylogenetic footprinting, has been used to find CRMs in non-coding sequence, but does not directly link DNA sequence with spatio-temporal patterns of expression. The second, based on searches for combinations of transcription factor (TF) binding motifs, has been employed in genome-wide discovery of similarly acting enhancers, but requires prior knowledge of the set of TFs acting at the CRM and of the TFs' binding motifs.

RESULTS: We propose a method for CRM discovery that combines aspects of both approaches in an effort to overcome their individual limitations. By treating phylogenetically footprinted non-coding regions (PFRs) as proxies for CRMs, we endeavor to find PFRs near co-regulated genes that are composed of similar short, conserved sequences. Using Markov chains as a convenient formulation to assess similarity, we develop a sampling algorithm to search a large group of PFRs for the most similar subset. Starting with a set of genes involved in Drosophila early blastoderm development and using phylogenetic comparisons of the Drosophila melanogaster and D.pseudoobscura genomes, we show that our algorithm successfully detects known CRMs. Further, we use our similarity metric, based on Markov chain discrimination, in a genome-wide search, and uncover additional known and many candidate early blastoderm CRMs.

AVAILABILITY: Software is available via http://arep.med.harvard.edu/enhancer
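
The idea of scoring similarity by Markov chain discrimination can be illustrated with a minimal sketch. It is not the authors' software or their sampling algorithm: the chain order, the pseudocount, the length-normalized log-odds score, and the function names `train_markov_chain` and `log_odds_score` are all assumptions made for illustration. One chain is trained on a set of sequences presumed to act similarly, another on background sequence, and a candidate region is scored by how much better the first chain explains it.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov_chain(sequences, order=2, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) with additive smoothing.

    `order` and `pseudocount` are illustrative defaults, not values from the paper.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous characters such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1.0
    model = {}
    for context in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[context].values()) + pseudocount * len(ALPHABET)
        model[context] = {
            b: (counts[context][b] + pseudocount) / total for b in ALPHABET
        }
    return model


def log_odds_score(seq, fg_model, bg_model, order=2):
    """Average per-transition log-likelihood ratio, foreground vs background.

    Positive values mean the foreground chain explains `seq` better.
    """
    seq = seq.upper()
    total, n = 0.0, 0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if context in fg_model and base in ALPHABET:
            total += math.log(fg_model[context][base] / bg_model[context][base])
            n += 1
    return total / n if n else 0.0


if __name__ == "__main__":
    # Toy sequences invented for this example; they are not real PFRs.
    foreground = ["ACACACACACAC", "CACACACACA"]
    background = ["GTGTGTGTGT", "TGTGTGTGTG"]
    fg = train_markov_chain(foreground)
    bg = train_markov_chain(background)
    for candidate in ("ACACACAC", "GTGTGTGT"):
        print(candidate, log_odds_score(candidate, fg, bg))
```

In this sketch, a search for the most similar subset would repeatedly retrain the foreground chain on a candidate subset and rescore the remaining regions; how the published method samples and accepts subsets is not specified in the abstract, so that step is left out here.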
