Large-scale identification of sequence variants impacting human transcription factor occupancy in vivo

The function of human regulatory regions depends exquisitely on their local genomic environment and on cellular context, complicating experimental analysis of common disease- and trait-associated variants that localize within regulatory DNA. We use allelically resolved genomic DNase I footprinting data encompassing 166 individuals and 114 cell types to identify more than 60,000 common variants that directly influence transcription factor occupancy and regulatory DNA accessibility in vivo. The unprecedented scale of these data enables a systematic analysis of how sequence variation affects transcription factor occupancy. We leverage this analysis to develop accurate models of variation affecting the recognition sites for diverse transcription factors, and we apply these models to identify nearly 500,000 common regulatory variants across the human genome that are likely to affect transcription factor occupancy. The approach and results provide a new foundation for the analysis and interpretation of noncoding variation in complete human genomes and for systems-level investigation of disease-associated variants.
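The abstract does not spell out the statistics behind "allelically resolved" footprinting or the form of the recognition-site models. As a minimal sketch, assuming (1) an exact two-sided binomial test of reference vs. alternate DNase I read counts at a heterozygous site against an expected 50:50 ratio, and (2) a position weight matrix (PWM) log-odds score as the recognition-site model, the two ideas can be illustrated as follows. The function names and the toy motif `TOY_MOTIF` are hypothetical, and neither piece is presented as the authors' actual procedure.

```python
from math import comb, log2


def allelic_imbalance_p(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test of ref vs. alt read counts against 50:50.

    Assumption: under no allelic effect, each read at a heterozygous site
    comes from either allele with probability 0.5.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    total = 2 ** n
    # Tail probabilities P(X <= k) and P(X >= k) for X ~ Binomial(n, 0.5), k = ref_reads
    lower = sum(comb(n, i) for i in range(ref_reads + 1)) / total
    upper = sum(comb(n, i) for i in range(ref_reads, n + 1)) / total
    # Binomial(n, 0.5) is symmetric, so doubling the smaller tail gives the two-sided p-value
    return min(1.0, 2 * min(lower, upper))


def pwm_log_odds(site: str, freqs: list[dict[str, float]], background: float = 0.25) -> float:
    """Log2-odds score of a site under a position frequency matrix vs. a uniform background."""
    if len(site) != len(freqs):
        raise ValueError("site length must equal motif width")
    return sum(log2(column[base] / background) for base, column in zip(site, freqs))


def allele_delta(ref_site: str, alt_site: str, freqs: list[dict[str, float]]) -> float:
    """Change in motif score caused by substituting the alternate allele (negative = weaker site)."""
    return pwm_log_odds(alt_site, freqs) - pwm_log_odds(ref_site, freqs)


# Hypothetical 3-bp motif preferring "AGC"; nonzero entries avoid log2(0).
TOY_MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

if __name__ == "__main__":
    print(allelic_imbalance_p(10, 0))  # 0.001953125 (= 2/1024)
    print(allele_delta("AGC", "AGT", TOY_MOTIF))  # negative: the T allele weakens the site
```

A genome-wide screen would typically also apply a multiple-testing correction (for example, a false discovery rate threshold) across all tested heterozygous sites and account for read-mapping bias toward the reference allele; both are omitted from this sketch.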
