Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism

Characterization of cell type-specific regulatory networks and elements is a major challenge in genomics, and emerging strategies frequently employ high-throughput, genome-wide assays of transcription factor (TF)–DNA binding, histone modifications or chromatin state. However, these experiments remain too difficult or expensive for many laboratories to apply comprehensively to their system of interest. Here, we explore the potential of elucidating regulatory systems in varied cell types using computational techniques that rely only on gene expression data, low-resolution chromatin accessibility data, and TF–DNA binding specificities (‘motifs’). We show that static computational motif scans overlaid with chromatin accessibility data reasonably approximate experimentally measured TF–DNA binding. We demonstrate that the predicted binding profiles and expression patterns of hundreds of TFs are sufficient to identify major regulators of ∼200 spatiotemporal expression domains in the Drosophila embryo. We are then able to learn reliable statistical models of enhancer activity for over 70 expression domains and apply those models to annotate domain-specific enhancers genome-wide. Throughout this work, we apply our motif- and accessibility-based approach to comprehensively characterize the regulatory network of fruit fly embryonic development, and we show that the accuracy of our computational method compares favorably to approaches that rely on data from many experimental assays.
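The idea of overlaying a static motif scan with chromatin accessibility can be illustrated with a minimal sketch. It is not the scoring scheme used in this work: the log-odds position weight matrix, the multiplicative weighting by mean accessibility, the score threshold, and all function names (`pwm_from_counts`, `scan`, `accessibility_weighted_scores`) are assumptions introduced here for illustration only.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds scores (hypothetical helper)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Static motif scan: one log-odds score per window start position."""
    width = len(pwm)
    return [
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    ]


def accessibility_weighted_scores(sequence, pwm, accessibility, threshold=0.0):
    """Overlay accessibility on the motif scan.

    Assumption for this sketch: a window counts as a predicted binding site
    only if its motif score exceeds `threshold`, and its score is then scaled
    by the mean accessibility (0..1) across the window.
    """
    width = len(pwm)
    weighted = []
    for start, score in enumerate(scan(sequence, pwm)):
        if score <= threshold:
            weighted.append(0.0)
            continue
        mean_access = sum(accessibility[start:start + width]) / width
        weighted.append(score * mean_access)
    return weighted


# Toy motif with consensus GATA, built from made-up counts.
toy_counts = [{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
toy_pwm = pwm_from_counts(toy_counts)

# Two identical GATA sites; only the first lies in accessible chromatin.
toy_sequence = "CCGATACCGATACC"
toy_accessibility = [1.0] * 7 + [0.0] * 7

toy_profile = accessibility_weighted_scores(toy_sequence, toy_pwm, toy_accessibility)
```

In this toy case the two sites receive the same raw motif score, but only the site in the accessible half of the sequence keeps a nonzero predicted binding score, which is the behaviour the overlay is meant to capture.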
