Genome-wide binding analysis of 195 DNA binding proteins reveals “reservoir” promoters and human specific SVA-repeat family regulation

A key aspect of defining cell state is the complex choreography of DNA binding events in a given cell type, which in turn establishes a cell-specific gene-expression program. In the two decades since the sequencing of the human genome, a deluge of genome-wide experiments has measured gene expression and DNA binding events across numerous cell types and tissues. Here we re-analyze ENCODE data in a highly reproducible manner using standardized analysis pipelines, containerization, and literate programming with Rmarkdown. Our approach validated many findings from previous independent studies, underscoring the importance of ENCODE’s goal of providing reproducible data resources. It also revealed several new findings: (i) 1,362 promoters, termed ‘reservoirs,’ have up to 111 different DNA-binding proteins localized on a single promoter yet show no steady-state RNA expression; and (ii) the human-specific SVA repeat element may have been co-opted for enhancer regulation. Collectively, this study, performed by the students of a CU Boulder computational biology class (BCHM 5631 – Spring 2020), demonstrates the value of reproducible findings and how resources like ENCODE that prioritize data standards can foster new findings from existing data in a didactic environment.
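The ‘reservoir’ definition above (many DNA-binding proteins on a promoter, no steady-state RNA) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The `Promoter` record, the `find_reservoirs` helper, the gene identifiers, and both thresholds are hypothetical and chosen only for illustration; the abstract does not state the cutoffs used in the study.

```python
from dataclasses import dataclass


@dataclass
class Promoter:
    gene_id: str
    n_dbps_bound: int  # distinct DNA-binding proteins with a peak on this promoter
    tpm: float         # steady-state RNA abundance for the associated gene


# Hypothetical cutoffs, assumed for this sketch only (not taken from the study).
MIN_DBPS = 10   # "many" bound proteins
MAX_TPM = 0.0   # "no" detectable steady-state RNA


def find_reservoirs(promoters, min_dbps=MIN_DBPS, max_tpm=MAX_TPM):
    """Return promoters that are heavily bound yet show no expression."""
    return [
        p for p in promoters
        if p.n_dbps_bound >= min_dbps and p.tpm <= max_tpm
    ]


# Toy input: three made-up promoters.
promoters = [
    Promoter("geneA", n_dbps_bound=50, tpm=12.3),   # bound and expressed
    Promoter("geneB", n_dbps_bound=111, tpm=0.0),   # bound, not expressed
    Promoter("geneC", n_dbps_bound=2, tpm=0.0),     # sparsely bound, not expressed
]

reservoirs = find_reservoirs(promoters)
print([p.gene_id for p in reservoirs])
```

In practice, the binding counts would come from ChIP-seq peak overlaps with promoter windows and the expression values from matched RNA-seq, and the cutoffs would need to be justified against the data.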
