Protein–DNA binding in high-resolution

Abstract

Recent advances in experimental and computational methodologies are enabling ultra-high-resolution, genome-wide profiles of protein–DNA binding events. For example, the ChIP-exo protocol precisely characterizes protein–DNA cross-linking patterns by combining chromatin immunoprecipitation (ChIP) with 5′ → 3′ exonuclease digestion. Similarly, deeply sequenced chromatin accessibility assays (e.g. DNase-seq and ATAC-seq) enable the detection of protected footprints at protein–DNA binding sites. With these techniques and others, we have the potential to characterize the individual nucleotides that interact with transcription factors, nucleosomes, RNA polymerases and other regulatory proteins in a particular cellular context. In this review, we explain the experimental assays and computational analysis methods that enable high-resolution profiling of protein–DNA binding events, and we discuss the challenges and opportunities associated with such approaches.
