Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding

Transcription factors (TFs) are primary regulators of gene expression in cells, where they bind specific genomic target sites to control transcription. Quantitative measurements of TF-DNA binding energies can improve the accuracy of predictions of TF occupancy and downstream gene expression in vivo and further shed light on how transcriptional networks are rewired throughout evolution. Here, we present a novel sequencing-based TF binding assay and analysis pipeline capable of providing quantitative estimates of binding energies for more than one million DNA sequences in parallel at high energetic resolution. Using this platform, we measured the binding energies associated with all possible combinations of 10 nucleotides flanking the known consensus DNA target for two model yeast TFs, Pho4 and Cbf1. A large fraction of these flanking mutations change overall binding energies by an amount equal to or greater than that caused by consensus site mutations, suggesting that current definitions of TF binding sites may be too restrictive. By systematically comparing estimates of binding energies output by deep neural networks (NNs) and biophysical models trained on these data, we establish that dinucleotide specificities are sufficient to explain essentially all variance in observed binding behavior, with Cbf1 binding exhibiting significantly more epistasis than Pho4. NN-derived binding energies agree with orthogonal biochemical measurements and reveal that dynamically occupied sites in vivo are both energetically and mutationally distant from the highest-affinity sites.
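To make the scale of the library and the mononucleotide versus dinucleotide distinction concrete, the following is a minimal sketch, not the authors' pipeline. It counts the sequences produced by randomizing 10 flanking positions and builds one-hot feature vectors of the kind a linear binding-energy model could be fit to. The function names are hypothetical. Treating the 10 flanking positions as one contiguous string, and restricting dinucleotide features to adjacent pairs, are simplifying assumptions made here for illustration.

```python
from itertools import product

BASES = "ACGT"
FLANK_LEN = 10  # number of randomized flanking positions stated in the abstract


def library_size(n_positions=FLANK_LEN):
    """Number of distinct sequences when every position can be A, C, G or T."""
    return len(BASES) ** n_positions


def mono_features(seq):
    """One-hot encoding of each position: 4 features per position.

    Hypothetical helper: a mononucleotide (additive) model assigns one
    energy contribution to each of these features.
    """
    return [1 if b == base else 0 for b in seq for base in BASES]


def dinuc_features(seq):
    """One-hot encoding of each adjacent pair: 16 features per pair.

    Assumption: only neighbouring positions are paired here; a model
    could equally include pairs of non-adjacent positions.
    """
    pairs = ["".join(p) for p in product(BASES, repeat=2)]
    return [
        1 if seq[i:i + 2] == pair else 0
        for i in range(len(seq) - 1)
        for pair in pairs
    ]


if __name__ == "__main__":
    flank = "ACGTACGTAC"  # arbitrary example flank, not taken from the paper
    print(library_size())              # 4**10 sequences in the full library
    print(len(mono_features(flank)))   # 4 features x 10 positions
    print(len(dinuc_features(flank)))  # 16 features x 9 adjacent pairs
```

A full library over 10 positions contains 4^10 = 1,048,576 sequences, which is consistent with the "more than one million" figure above. Under these assumptions, a dinucleotide model adds 144 pairwise features to the 40 additive ones, which is how it can capture non-additive effects between neighbouring bases.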
