Computational screening of conserved genomic DNA in search of functional noncoding elements

tools. The protocol is divided into four major steps: (1) defining the genomic region of interest from the user’s starting point (a gene of interest, a region between two genetic markers, or another region); (2) selecting a subset of cross-species conserved elements within this region, using a hidden Markov model that defines genomic intervals and scores them for conservation; (3) mapping the properties of the interval set (such as transcript overlap and extent of species coverage); and (4) ranking the set for further analysis against a characteristic profile of the functional class of interest. The same protocol may be used to search for different functional classes of elements in all branches of the tree of life available in the UCSC Genome Browser, including vertebrates, insects, nematodes and yeast. It can also readily incorporate custom information available to the user, and parts of the protocol can be replaced as our understanding of the relationship between function and sequence conservation, and of the different functional classes, improves. As an example, we present an informatic profile of vertebrate enhancer sequences and discuss a case in which such a method has led to the discovery of several functional enhancers. An accompanying protocol describes a complementary approach to the identification of cis-regulatory DNA regions in complex genome assemblies by clustering sequence motifs that correspond to known transcription factor binding sites 9.
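The four steps can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call the UCSC Genome Browser: the `Element` record, its field names, the `min_score` threshold and the ranking key are all hypothetical, and the conservation scores are assumed to have been computed already (for example, exported from a conservation track). The ranking rule used here (elements without transcript overlap first, then wider species coverage, then higher score) is only a stand-in for a real functional-class profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A conserved interval; all field names are hypothetical."""
    chrom: str
    start: int                 # 0-based, half-open coordinates (BED-like)
    end: int
    score: float               # precomputed conservation score (assumed scale)
    n_species: int             # number of species covering the interval
    overlaps_transcript: bool  # True if the interval overlaps a known transcript


def in_region(e, chrom, start, end):
    """Step 1: keep elements that overlap the region of interest."""
    return e.chrom == chrom and e.start < end and e.end > start


def rank_candidates(elements, chrom, start, end, min_score=0.0):
    """Steps 2-4: select conserved elements in the region and rank them."""
    # Step 2: restrict to the region and to sufficiently conserved elements.
    selected = [
        e for e in elements
        if in_region(e, chrom, start, end) and e.score >= min_score
    ]
    # Step 3 is assumed done upstream: each Element already carries its
    # mapped properties (transcript overlap, species coverage).
    # Step 4: rank by an illustrative profile, not the published one.
    return sorted(
        selected,
        key=lambda e: (e.overlaps_transcript, -e.n_species, -e.score),
    )


if __name__ == "__main__":
    demo = [
        Element("chr1", 100, 300, 500.0, 5, False),
        Element("chr1", 400, 450, 900.0, 3, True),
        Element("chr1", 600, 900, 700.0, 8, False),
    ]
    for e in rank_candidates(demo, "chr1", 0, 1000, min_score=400.0):
        print(e)
```

Because the ranking key is a single function, a different functional class can be targeted by swapping in a different key without touching the selection steps.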

[1] W. J. Kent, et al. BLAT--the BLAST-like alignment tool. Genome Research, 2002.

[2] Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature, 2002.

[3] Tom H. Pringle, et al. The human genome browser at UCSC. Genome Research, 2002.

[4] Colin N. Dewey, et al. Initial sequencing and comparative analysis of the mouse genome. 2002.

[5] Vincent Lombard, et al. The EMBL Nucleotide Sequence Database. Nucleic Acids Res., 2002.

[6] D. Haussler, et al. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proceedings of the National Academy of Sciences of the United States of America, 2003.

[7] Jon D. McAuliffe, et al. Phylogenetic Shadowing of Primate Sequences to Find Functional Regions of the Human Genome. Science, 2003.

[8] Ye V Liu, et al. Communication over a large distance: enhancers and insulators. Biochemistry and Cell Biology, 2003.

[9] D. Haussler, et al. Identification and Characterization of Multi-Species Conserved Sequences. 2022.

[10] M. Nóbrega, et al. Scanning Human Gene Deserts for Long-Range Enhancers. Science, 2003.

[11] Eugene V Koonin, et al. A significant fraction of conserved noncoding DNA in human and mouse consists of predicted matrix attachment regions. Trends in Genetics, 2003.

[12] R. Tjian, et al. Transcription regulation and animal diversity. Nature, 2003.

[13] Robert Giegerich, et al. A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics, 2004.

[14] D. Haussler, et al. Ultraconserved Elements in the Human Genome. Science, 2004.

[15] Ivan Ovcharenko, et al. Interpreting mammalian evolution using Fugu genome comparisons. Genomics, 2004.

[16] Thomas L. Madden, et al. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res., 2004.

[17] S. Batzoglou, et al. Characterization of evolutionary rates and constraints in three Mammalian genomes. Genome Research, 2004.

[18] Nadav Ahituv, et al. Exploiting human--fish genome comparisons for deciphering gene regulation. Human Molecular Genetics, 2004.

[19] Terrence S. Furey, et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res., 2004.

[20] David Haussler, et al. Into the heart of darkness: large-scale clustering of human non-coding DNA. ISMB/ECCB, 2004.

[21] D. Haussler, et al. Exploring relationships and mining data with the UCSC Gene Sorter. Genome Research, 2005.

[22] William Stafford Noble, et al. Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology, 2005.

[23] Gill Bejerano, et al. Ultraconserved elements in insect genomes: a highly conserved intronic sequence implicated in the control of homothorax mRNA splicing. Genome Research, 2005.

[24] Klaudia Walter, et al. Highly Conserved Non-Coding Sequences Are Associated with Vertebrate Development. PLoS Biology, 2004.

[25] D. Kleinjan, et al. Long-range control of gene expression: emerging mechanisms and disruption in disease. American Journal of Human Genetics, 2005.

[26] Dmitri Papatsenko, et al. Computational identification of regulatory DNAs underlying animal development. Nature Methods, 2005.

[27] A. Reymond, et al. Conserved non-genic sequences: an unexpected feature of mammalian genomes. Nature Reviews Genetics, 2005.