Studying statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the eukaryotic genomes

Regulatory DNA has no well-characterized properties analogous to those of coding sequences: regulatory elements are not regularly positioned, their consensus sequences are often degenerate, and no clear rules are known to govern their evolution. This makes it difficult to recognize regulatory regions within a genome. We review developments in the statistical characterization of regulatory regions and in methods for recognizing them in eukaryotic genomes.
