Base-resolution models of transcription factor binding reveal soft motif syntax

The arrangement of transcription factor (TF) binding motifs (syntax) is an important part of the cis-regulatory code, yet remains elusive. We introduce a deep learning model, BPNet, that uses DNA sequence to predict base-resolution ChIP-nexus binding profiles of pluripotency TFs. We develop interpretation tools to learn predictive motif representations and identify soft syntax rules for cooperative TF binding interactions. Strikingly, Nanog preferentially binds with helical periodicity, and TFs often cooperate in a directional manner, which we validate using CRISPR-induced point mutations. Our model represents a powerful general approach to uncover the motifs and syntax of cis-regulatory sequences in genomics data.

Highlights

The neural network BPNet accurately predicts TF binding data at base-resolution.

Model interpretation discovers TF motifs and TF interactions dependent on soft syntax.

Motifs for Nanog and partners are preferentially spaced at ∼10.5 bp periodicity.

Directional cooperativity is validated: Sox2 enhances Nanog binding, but not vice versa.
