Index and biological spectrum of accessible DNA elements in the human genome

DNase I hypersensitive sites (DHSs) are generic markers of regulatory DNA and harbor disease- and phenotypic trait-associated genetic variation. We established high-precision maps of DHSs from 733 human biosamples encompassing 439 cell and tissue types and states, and integrated these to precisely delineate and numerically index ~3.6 million DHSs encoded within the human genome, providing a common coordinate system for regulatory DNA. Here we show that the expansive scale of cell and tissue states sampled exposes an unprecedented degree of stereotyped actuation of large sets of elements, signaling the operation of distinct genome-scale regulatory programs. We show further that the complex actuation patterns of individual elements can be captured comprehensively by a simple regulatory vocabulary reflecting their dominant cellular manifestation. This vocabulary, in turn, enables comprehensive and quantitative regulatory annotation of both protein-coding genes and the vast array of well-defined but poorly characterized non-coding RNA genes. Finally, we show that the combination of high-precision DHSs and regulatory vocabularies markedly concentrates disease- and trait-associated non-coding genetic signals both along the genome and across cellular compartments. Taken together, our results provide a common and extensible coordinate system and vocabulary for human regulatory DNA, and a new global perspective on the architecture of human gene regulation.
