Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding

Significance

Transcription factors (TFs) are key proteins that bind DNA targets to coordinate gene expression in cells. Understanding how TFs recognize their DNA targets is essential for predicting how variations in regulatory sequence disrupt transcription to cause disease. Here, we develop a high-throughput assay and analysis pipeline capable of measuring binding energies for over one million sequences with high resolution and apply it toward understanding how nucleotides flanking DNA targets affect binding energies for two model yeast TFs. Through systematic comparisons between models trained on these data, we establish that considering dinucleotide (DN) interactions is sufficient to accurately predict binding and further show that sites used by TFs in vivo are both energetically and mutationally distant from the highest affinity sequence.

Transcription factors (TFs) are primary regulators of gene expression in cells, where they bind specific genomic target sites to control transcription. Quantitative measurements of TF–DNA binding energies can improve the accuracy of predictions of TF occupancy and downstream gene expression in vivo and shed light on how transcriptional networks are rewired throughout evolution. Here, we present a sequencing-based TF binding assay and analysis pipeline (BET-seq, for Binding Energy Topography by sequencing) capable of providing quantitative estimates of binding energies for more than one million DNA sequences in parallel at high energetic resolution. Using this platform, we measured the binding energies associated with all possible combinations of 10 nucleotides flanking the known consensus DNA target interacting with two model yeast TFs, Pho4 and Cbf1. A large fraction of these flanking mutations change overall binding energies by an amount equal to or greater than consensus site mutations, suggesting that current definitions of TF binding sites may be too restrictive. By systematically comparing estimates of binding energies output by deep neural networks (NNs) and biophysical models trained on these data, we establish that dinucleotide (DN) specificities are sufficient to explain essentially all variance in observed binding behavior, with Cbf1 binding exhibiting significantly more nonadditivity than Pho4. NN-derived binding energies agree with orthogonal biochemical measurements and reveal that dynamically occupied sites in vivo are both energetically and mutationally distant from the highest affinity sites.
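As a rough illustration of the scale and of what a model with dinucleotide (DN) terms might look like, the sketch below counts the 4^10 possible 10-nucleotide flanks and scores a flank as a sum of per-position mononucleotide terms plus DN terms. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the weight dictionaries, the restriction to adjacent DN pairs, and the treatment of the flank as one contiguous string are all hypothetical choices made for the example.

```python
BASES = "ACGT"


def count_flank_variants(n_positions: int = 10) -> int:
    """Number of distinct sequences over n_positions variable bases."""
    return len(BASES) ** n_positions


def mono_features(seq: str):
    """(position, base) features: one per position in the flank."""
    return [(i, base) for i, base in enumerate(seq)]


def adjacent_dinuc_features(seq: str):
    """(position, dinucleotide) features for adjacent pairs only (an assumption)."""
    return [(i, seq[i:i + 2]) for i in range(len(seq) - 1)]


def predict_ddg(seq: str, mono_weights: dict, dinuc_weights: dict) -> float:
    """Additive energy estimate: sum of mono terms plus sum of DN terms.

    Features missing from a weight dictionary contribute zero.
    """
    total = sum(mono_weights.get(f, 0.0) for f in mono_features(seq))
    total += sum(dinuc_weights.get(f, 0.0) for f in adjacent_dinuc_features(seq))
    return total


if __name__ == "__main__":
    # 4**10 flanks, consistent with "more than one million" sequences.
    print(count_flank_variants(10))

    # Hypothetical weights in arbitrary energy units, for illustration only.
    mono = {(0, "A"): 0.5, (1, "C"): -0.2}
    dinuc = {(0, "AC"): 0.1}
    print(predict_ddg("ACGTA", mono, dinuc))
```

In such a formulation, setting every DN weight to zero recovers a purely additive (mononucleotide) model, so the size of the fitted DN terms is one way to express how nonadditive a TF's binding is.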
