Computational prediction of transcription-factor binding site locations

Identifying the genomic locations of transcription-factor binding sites, particularly in higher eukaryotic genomes, has been an enormous challenge. Various experimental and computational approaches have been used to detect these sites; methods that computationally compare related genomes have been particularly successful.
