Discovery of regulatory elements in vertebrates through comparative genomics

We have analyzed issues of reliability in studies in which comparative genomic approaches have been applied to the discovery of regulatory elements at a genome-wide level in vertebrates. We point out some potential problems with such studies, including difficulties in accurately identifying orthologous promoter regions. Many of these subtle analytical problems have become apparent only when studying the more complex vertebrate genomes. By determining motif reliability, we compared existing tools when applied to the discovery of vertebrate regulatory elements. We then used a statistical clustering method to produce a computational catalog of high quality putative regulatory elements from vertebrates, some of which are widely conserved among vertebrates and many of which are novel regulatory elements. The results provide a glimpse into the wealth of information that comparative genomics can yield and suggest the need for further improvement of genome-wide comparative computational techniques.

[1]  D. Haussler,et al.  Article Identification and Characterization of Multi-Species Conserved Sequences , 2022 .

[2]  Lior Pachter,et al.  MAVID: constrained ancestral alignment of multiple sequences. , 2003, Genome research.

[3]  Olivier Elemento,et al.  Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach , 2005, Genome Biology.

[4]  Jens Stoye,et al.  Benchmarking tools for the alignment of functional noncoding DNA , 2004, BMC Bioinformatics.

[5]  Mathieu Blanchette,et al.  FootPrinter: a program designed for phylogenetic footprinting , 2003, Nucleic Acids Res..

[6]  Ajit Varki,et al.  Sequencing the chimpanzee genome: insights into human evolution and disease , 2003, Nature Reviews Genetics.

[7]  W. Fitch Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology , 1971 .

[8]  B. Birren,et al.  Sequencing and comparison of yeast species to identify genes and regulatory elements , 2003, Nature.

[9]  Martin Vingron,et al.  CORG: a database for COmparative Regulatory Genomics , 2003, Nucleic Acids Res..

[10]  S. Karlin,et al.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[11]  K. Lindblad-Toh,et al.  Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals , 2005, Nature.

[12]  M. Nozaki,et al.  Transcriptional Regulation of the Cytosolic Chaperonin θ Subunit Gene, Cctq, by Ets Domain Transcription Factors Elk-1, Sap-1a, and Net in the Absence of Serum Response Factor* , 2003, Journal of Biological Chemistry.

[13]  Alexander E. Kel,et al.  TRANSFAC®: transcriptional regulation, from patterns to profiles , 2003, Nucleic Acids Res..

[14]  L. Fulton,et al.  Finding Functional Features in Saccharomyces Genomes by Phylogenetic Footprinting , 2003, Science.

[15]  D. Haussler,et al.  Aligning multiple genomic sequences with the threaded blockset aligner. , 2004, Genome research.

[16]  Chuong B. Do,et al.  Access the most recent version at doi: 10.1101/gr.926603 References , 2003 .

[17]  Simon C. Potter,et al.  An overview of Ensembl. , 2004, Genome research.

[18]  Rodrigo Lopez,et al.  Multiple sequence alignment with the Clustal series of programs , 2003, Nucleic Acids Res..

[19]  Bing Zhang,et al.  GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies , 2004, BMC Bioinformatics.

[20]  Colin N. Dewey,et al.  Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution , 2004, Nature.

[21]  Holger Scholz,et al.  A role for the Wilms' tumor protein WT1 in organ development. , 2005, Physiology.

[22]  Burkhard Morgenstern,et al.  DIALIGN2: Improvement of the segment to segment approach to multiple sequence alignment , 1999, German Conference on Bioinformatics.

[23]  Thomas L. Madden,et al.  Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. , 2001, Nucleic acids research.

[24]  G. Thiel,et al.  cAMP response element binding protein (CREB) activates transcription via two distinct genetic elements of the human glucose-6-phosphatase gene , 2005, BMC Molecular Biology.

[25]  M. Goodman,et al.  Embryonic ε and γ globin genes of a prosimian primate (Galago crassicaudatus): Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints , 1988 .