High-throughput functional testing of ENCODE segmentation predictions

The histone modification state of genomic regions is hypothesized to reflect the regulatory activity of the underlying genomic DNA. Based on this hypothesis, the ENCODE Project Consortium measured the status of multiple histone modifications across the genome in several cell types and used these data to segment the genome into regions with different predicted regulatory activities. We measured the cis-regulatory activity of more than 2000 of these predictions in the K562 leukemia cell line. We tested genomic segments predicted to be Enhancers, Weak Enhancers, or Repressed elements in K562 cells, along with other sequences predicted to be Enhancers specific to the H1 human embryonic stem cell line (H1-hESC). Both Enhancer and Weak Enhancer sequences in K562 cells were more active than negative controls, although surprisingly, Weak Enhancer segmentations drove expression higher than did Enhancer segmentations. Lower levels of the covalent histone modifications H3K36me3 and H3K27ac, thought to mark active enhancers and transcribed gene bodies, associate with higher expression and partly explain the higher activity of Weak Enhancers over Enhancer predictions. While DNase I hypersensitivity (HS) is a good predictor of active sequences in our assay, transcription factor (TF) binding models need to be included in order to accurately identify highly expressed sequences. Overall, our results show that a significant fraction (~26%) of the ENCODE enhancer predictions have regulatory activity, suggesting that histone modification states can reflect the cis-regulatory activity of sequences in the genome, but that specific sequence preferences, such as TF-binding sites, are the causal determinants of cis-regulatory activity.
