Computational identification of regulatory DNAs underlying animal development

Whole-genome sequence assemblies provide a rich resource for the in silico identification and characterization of regulatory DNAs, particularly enhancers, in different animal groups, including Caenorhabditis elegans, Drosophila melanogaster and Mus musculus. There are two major methods for the recognition of regulatory DNAs within complex genome assemblies [1]: (i) clustering of combinations of sequence motifs that correspond to known binding sites for defined transcription factors and (ii) phylogenetic analyses that identify sequence conservation in noncoding regions among two or more related species. We describe here the first method, which relies on clusters of binding sites for multiple transcription factors; the second method is described in the accompanying protocol [2]. Clustering methods require extensive prior knowledge of the binding preferences of known transcription factors. In ideal cases, the process under study relies on the activities of two or more well-defined transcription factors. Even in such optimal cases, however, there is a high incidence of false positives. A 'hit rate' of 30–50% is the limit of precision that can be obtained with these methods [3]. Nonetheless, they are considerably more efficient than the identification of enhancers via 'blind' functional assays, whereby random genomic DNA fragments near or within a given gene are analyzed for regulatory activities. Clustering methods have been used to identify many new enhancers engaged in common developmental processes, permitting the construction of genomic regulatory networks [4–7]. Clustering methods were first developed nearly a decade ago [8–11]; however, there is still no 'best' technique or 'universal' software (comparable to BLAST, a fast alignment search tool). Instead, new techniques are constantly being developed, and some even merge clustering methods with phylogenetic analysis. Despite the flourishing diversity of methods, most use common strategies that we describe here in a sequence of steps, providing a format for the identification of functionally related enhancers and coregulated genes in animal genomes.
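
To make the clustering strategy concrete, the following is a minimal sketch of one common form of it: scan a sequence for matches to consensus binding-site motifs on both strands, then report windows that contain enough sites and at least one site for every factor. It is an illustration under stated assumptions, not the protocol's software. The motif strings, the factor names, the window size and the site threshold are all hypothetical placeholders, and real analyses often score sites with position weight matrices rather than consensus strings.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse-complement an IUPAC consensus string."""
    return motif.translate(COMPLEMENT)[::-1]


def find_sites(sequence, motif):
    """Return sorted 0-based start positions of motif matches on either strand."""
    sequence = sequence.upper()
    positions = set()
    for oriented in (motif, reverse_complement(motif)):
        # A lookahead lets overlapping matches be reported as well
        pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in oriented) + "))")
        positions.update(match.start() for match in pattern.finditer(sequence))
    return sorted(positions)


def find_clusters(sequence, motifs, window=400, min_sites=3):
    """Return (first_site, last_site, sites) for each window, anchored at a site,
    that holds at least min_sites sites and at least one site per motif.
    Windows anchored at neighbouring sites may overlap; they are not merged."""
    hits = sorted(
        (pos, name)
        for name, motif in motifs.items()
        for pos in find_sites(sequence, motif)
    )
    clusters = []
    for i, (start, _) in enumerate(hits):
        in_window = [hit for hit in hits[i:] if hit[0] < start + window]
        names = {name for _, name in in_window}
        if len(in_window) >= min_sites and names == set(motifs):
            clusters.append((start, in_window[-1][0], in_window))
    return clusters


if __name__ == "__main__":
    # Hypothetical motifs and a toy sequence, for illustration only
    motifs = {"factorA": "GGGWWWWCC", "factorB": "CACATG"}
    toy = "T" * 50 + "GGGAAAACC" + "T" * 20 + "CACATG" + "T" * 20 + "GGGTTTTCC" + "T" * 50
    for first, last, sites in find_clusters(toy, motifs, window=100):
        print(first, last, sites)
```

Requiring every factor to be represented in a window reflects the ideal case described above, in which the process under study depends on two or more well-defined transcription factors. Loosening that requirement, or lowering `min_sites`, would be expected to return more candidates at the cost of more false positives.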

[1] Frank Klawonn, et al. Transcription regulatory region analysis using signal detection and fuzzy clustering, 1998, Bioinform.

[2] W. Gilks, et al. Recent computational approaches to understand gene regulation: mining gene regulation in silico, 2007, Current genomics.

[3] D. Haussler, et al. Computational screening of conserved genomic DNA in search of functional noncoding elements, 2005, Nature Methods.

[4] S. Salzberg, et al. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura, 2004, Genome Biology.

[5] S. Carroll, et al. The regulatory content of intergenic DNA shapes genome architecture, 2004, Genome Biology.

[6] K. Roeder, et al. A statistical model for locating regulatory regions in genomic DNA, 1997, Journal of molecular biology.

[7] J. Fak, et al. Transcriptional Control in the Segmentation Gene Network of Drosophila, 2004, PLoS biology.

[8] A. Sandelin, et al. Applied bioinformatics for the identification of regulatory elements, 2004, Nature Reviews Genetics.

[9] Anna G. Nazina, et al. Homotypic regulatory clusters in Drosophila, 2003, Genome research.

[10] A. Wagner, et al. A computational genomics approach to the identification of gene networks, 1997, Nucleic acids research.

[11] Peter W. Markstein, et al. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[12] Alexander E. Kel, et al. Eukaryotic promoter recognition by binding sites for transcription factors, 1995, Comput. Appl. Biosci.

[13] D. Haussler, et al. Exploring relationships and mining data with the UCSC Gene Sorter, 2005, Genome research.

[14] H Niemann, et al. Identification and analysis of eukaryotic promoters: recent computational approaches, 2001, Trends in genetics : TIG.

[15] Webb Miller, et al. Evolution and functional classification of vertebrate gene deserts, 2005, Genome research.

[16] Michael Levine, et al. Whole-genome analysis of Drosophila gastrulation, 2004, Current opinion in genetics & development.