Global reference mapping and dynamics of human transcription factor footprints

Combinatorial binding of transcription factors to regulatory DNA underpins gene regulation in all organisms. Genetic variation in regulatory regions has been connected with diseases and diverse phenotypic traits1, yet it remains challenging to distinguish variants that impact regulatory function2. Genomic DNase I footprinting enables quantitative, nucleotide-resolution delineation of sites of transcription factor occupancy within native chromatin3–5. However, to date only a small fraction of such sites have been precisely resolved on the human genome sequence5. To enable comprehensive mapping of transcription factor footprints, we produced high-density DNase I cleavage maps from 243 human cell and tissue types and states and integrated these data to delineate, at nucleotide resolution, ~4.5 million compact genomic elements encoding transcription factor occupancy. We map the fine-scale structure of ~1.6 million DNase I hypersensitive sites (DHSs) and show that the overwhelming majority are populated by well-spaced sites of single transcription factor:DNA interaction. Cell context-dependent cis-regulation is chiefly executed by wholesale actuation of accessibility at regulatory DNA rather than by differential transcription factor occupancy within accessible elements. We further show that the well-described enrichment of disease- and phenotypic trait-associated genetic variants in regulatory regions1,6 is almost entirely attributable to variants localizing within footprints, and that functional variants impacting transcription factor occupancy are nearly evenly partitioned between loss- and gain-of-function alleles. Unexpectedly, we find that the global density of human genetic variation is markedly increased within transcription factor footprints, revealing an unappreciated driver of cis-regulatory evolution. Our results provide a new framework for both global and nucleotide-precision analyses of gene regulatory mechanisms and functional genetic variation.

[1]  Alex P. Reynolds, et al. Index and biological spectrum of accessible DNA elements in the human genome, 2019, bioRxiv.

[2]  Brian E. Cade, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program, 2019, Nature.

[3]  Helen E. Parkinson, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019, 2018, Nucleic Acids Res.

[4]  T. Hughes, et al. The Human Transcription Factors, 2018, Cell.

[5]  David J. Arenillas, et al. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework, 2017, Nucleic Acids Res.

[7]  F. A. Kolpakov, et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis, 2017, Nucleic Acids Res.

[8]  Yanli Wang, et al. Molecular mechanism of directional CTCF recognition of a diverse range of genomic sites, 2017, Cell Research.

[9]  D. Schübeler, et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors, 2017, Science.

[10]  H. Kang, et al. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans, 2017, Nature Communications.

[11]  Jeff Vierstra, et al. Genomic footprinting, 2016, Nature Methods.

[12]  G. J. Ray, et al. Methylated Cytosines Mutate to Transcription Factor Binding Sites that Drive Tetrapod Evolution, 2015, Genome Biology and Evolution.

[13]  Eric Haugen, et al. Large-scale identification of sequence variants impacting human transcription factor occupancy in vivo, 2015, Nature Genetics.

[14]  Yakir A. Reshef, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics, 2015, Nature Genetics.

[15]  Mihai Albu, et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon, 2015, Nature Biotechnology.

[16]  Michael Q. Zhang, et al. Integrative analysis of 111 reference human epigenomes, 2015, Nature.

[17]  M. Daly, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies, 2014, Nature Genetics.

[18]  Fidencio J. Neri, et al. Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution, 2014, Science.

[19]  Elhanan Borenstein, et al. Conservation of trans-acting circuitry during mammalian regulatory evolution, 2014, Nature.

[20]  Morgan C. Giddings, et al. Defining functional DNA elements in the human genome, 2014, Proceedings of the National Academy of Sciences.

[21]  Joshua L. Payne, et al. The Robustness and Evolvability of Transcription Factor Binding Sites, 2014, Science.

[22]  R. Sandstrom, et al. Probing DNA shape and methylation state on a genomic scale with DNase I, 2013, Proceedings of the National Academy of Sciences.

[23]  Juan M. Vaquerizas, et al. DNA-Binding Specificities of Human Transcription Factors, 2013, Cell.

[24]  S. Gabriel, et al. Analysis of 6,515 exomes reveals a recent origin of most human protein-coding variants, 2012, Nature.

[25]  Shane J. Neph, et al. Circuitry and Dynamics of Human Transcription Factor Regulatory Networks, 2012, Cell.

[26]  Shane J. Neph, et al. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA, 2012, Science.

[27]  Shane J. Neph, et al. An expansive human regulatory lexicon encoded in transcription factor footprints, 2012, Nature.

[28]  Nathan C. Sheffield, et al. The accessible chromatin landscape of the human genome, 2012, Nature.

[29]  P. Sadowski, et al. The CTCF insulator protein forms an unusual DNA structure, 2010, BMC Molecular Biology.

[30]  D. Altshuler, et al. A map of human genome variation from population-scale sequencing, 2010, Nature.

[31]  L. Mirny, et al. Nucleosome-mediated cooperativity between transcription factors, 2009, Proceedings of the National Academy of Sciences.

[32]  C. E. Pearson, et al. Table S2: Trans-factors and trinucleotide repeat instability, 2010.

[33]  L. Mirny, et al. Different gene regulation strategies revealed by analysis of binding motifs, 2009, Trends in Genetics: TIG.

[34]  R. Mann, et al. The role of DNA shape in protein-DNA recognition, 2009, Nature.

[35]  William Stafford Noble, et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting, 2009, Nature Methods.

[36]  M. Vingron, et al. Methylation and deamination of CpGs generate p53-binding sites on a genomic scale, 2009, Trends in Genetics: TIG.

[37]  Amy E. Hawkins, et al. DNA sequencing of a cytogenetically normal acute myeloid leukemia genome, 2008, Nature.

[38]  S. Harrison, et al. An Atomic Model of the Interferon-β Enhanceosome, 2007, Cell.

[39]  J. Gerhart, et al. The theory of facilitated variation, 2007, Proceedings of the National Academy of Sciences.

[40]  S. Carroll, et al. Repeated morphological evolution through cis-regulatory changes in a pleiotropic gene, 2006, Nature.

[41]  J. A. Epstein, et al. Crystal structure of the human Pax6 paired domain-DNA complex reveals specific roles for the linker region and carboxy-terminal subdomain in DNA binding, 1999, Genes & Development.

[42]  M. Burcin, et al. DNA bending by the silencer protein NeP1 is modulated by TR and RXR, 1996, Nucleic Acids Res.

[43]  R. Chalkley, et al. Analysis of the competition between nucleosome formation and transcription factor binding, 1994, The Journal of Biological Chemistry.

[44]  R. Tjian, et al. The promoter-specific transcription factor Sp1 binds to upstream sequences in the SV40 early promoter, 1983, Cell.

[45]  D. Galas, et al. DNAse footprinting: a simple method for the detection of protein-DNA binding specificity, 1978, Nucleic Acids Res.