A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity

Candidate enhancers can be identified on the basis of chromatin modifications, the binding of chromatin modifiers, transcription factors and cofactors, or chromatin accessibility. However, validating such candidates as bona fide enhancers requires functional characterization. This is typically achieved through reporter assays that test whether a sequence can increase expression of a transcriptional reporter driven by a minimal promoter. A longstanding concern is that reporter assays are mainly implemented on episomes, which are thought to lack physiological chromatin. Yet the magnitude and determinants of differences in cis-regulation between regulatory sequences residing in episomes and those residing in chromosomes remain almost completely unknown. To address this systematically, we developed and applied a novel lentivirus-based massively parallel reporter assay (lentiMPRA) to directly compare the functional activities of 2236 candidate liver enhancers in an episomal versus a chromosomally integrated context. We find that the activities of chromosomally integrated sequences differ substantially from those of the identical sequences assayed on episomes and, furthermore, correlate with different subsets of ENCODE annotations. The results of chromosomally based reporter assays are also more reproducible and more strongly predictable by both ENCODE annotations and sequence-based models. With a linear model that combines chromatin annotations and sequence information, we achieve a Pearson's R² of 0.362 for predicting the results of chromosomally integrated reporter assays. This level of prediction is better than that achieved with either chromatin annotations or sequence information alone, and it also exceeds that of predictive models of episomal assays. Our results have broad implications for how cis-regulatory elements are identified, prioritized and functionally validated.
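
The modeling claim above can be illustrated with a minimal sketch. The claim is that a linear model combining chromatin annotations with sequence information predicts reporter activity better than either feature set alone, scored by Pearson's R². The sketch makes several assumptions. Plain ordinary least squares with a single train/test split stands in for whatever regression and validation scheme the study actually used. The feature columns (two hypothetical chromatin-annotation signals and one hypothetical sequence-model score) and the activity values are synthetic, generated inside the snippet. R² is computed as the squared Pearson correlation between held-out predictions and observed activity. The printed numbers describe only the toy data, not any lentiMPRA result.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_ols(X, y, ridge=1e-6):
    """Least-squares fit with an intercept; a tiny ridge keeps X^T X invertible."""
    Xb = [[1.0] + row for row in X]
    p = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) + (ridge if i == j and i > 0 else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(w, X):
    """Apply fitted weights (intercept first) to each feature row."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]


def held_out_r2(X, y, n_train):
    """Fit on the first n_train rows, report squared Pearson R on the rest."""
    w = fit_ols(X[:n_train], y[:n_train])
    return pearson_r2(predict(w, X[n_train:]), y[n_train:])


# Synthetic toy data (hypothetical): two chromatin-annotation features and one
# sequence-model score per candidate element, plus independent noise.
rng = random.Random(0)
n, n_train = 1000, 600
chrom = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
seq = [[rng.gauss(0, 1)] for _ in range(n)]
activity = [0.8 * c[0] + 0.5 * c[1] + 0.7 * s[0] + rng.gauss(0, 1.0)
            for c, s in zip(chrom, seq)]
combined = [c + s for c, s in zip(chrom, seq)]  # concatenate feature columns

results = {name: held_out_r2(X, activity, n_train) for name, X in
           [("chromatin", chrom), ("sequence", seq), ("combined", combined)]}
for name, r2 in results.items():
    print(f"{name:>9}: held-out R^2 = {r2:.3f}")
```

Because each feature set in this toy setup carries independent signal, the combined model should recover more of the variance than either subset. That mirrors the qualitative comparison in the abstract without implying anything about the real effect sizes.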
