Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

More than 98% of a typical vertebrate genome does not code for proteins. Although non-coding regions are sprinkled with short (<200 bp) islands of evolutionarily conserved sequences, the function of most of these unannotated conserved islands remains unknown. One possibility is that unannotated conserved islands could encode non-coding RNAs (ncRNAs); alternatively, unannotated conserved islands could serve as promoter-distal regulatory factor binding sites (RFBSs) such as enhancers. Here we assess these possibilities by comparing unannotated conserved islands in the human and mouse genomes to transcribed regions and to RFBSs, relying on a detailed case study of one human and one mouse cell type. We define transcribed regions by applying a novel transcript-calling algorithm to RNA-Seq data obtained from total cellular RNA, and we define RFBSs using ChIP-Seq and DNase-hypersensitivity assays. We find that unannotated conserved islands are four times more likely to coincide with RFBSs than with unannotated ncRNAs. Thousands of conserved RFBSs can be categorized as insulators based on the presence of CTCF or as enhancers based on the presence of p300/CBP and H3K4me1. While many unannotated conserved RFBSs are transcriptionally active to some extent, the transcripts produced tend to be unspliced, non-polyadenylated and expressed at levels 10- to 100-fold lower than annotated coding RNAs or ncRNAs. Extending these findings across multiple cell types and tissues, we propose that most conserved non-coding genomic DNA in vertebrate genomes corresponds to promoter-distal regulatory elements.
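The comparison described above amounts to asking, for each conserved island, whether it overlaps an interval from one annotation set (RFBSs) or another (unannotated ncRNAs). The following is a minimal sketch of that kind of overlap count, not the authors' pipeline: the function name `count_overlapping`, the half-open single-chromosome coordinate convention, and all coordinates are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def count_overlapping(islands, features):
    """Count islands that overlap at least one feature interval.

    Both arguments are lists of half-open (start, end) intervals on a
    single chromosome. This is an illustrative helper, not the method
    used in the paper.
    """
    feats = sorted(features)
    starts = [s for s, _ in feats]

    # Running maximum of feature ends, so that a long feature which
    # starts early is still seen when later, shorter ones follow it.
    max_end = []
    current = float("-inf")
    for _, end in feats:
        current = max(current, end)
        max_end.append(current)

    hits = 0
    for start, end in islands:
        # Features whose start lies before the island's end are candidates.
        i = bisect_left(starts, end)
        # The island overlaps if any candidate ends after the island starts.
        if i > 0 and max_end[i - 1] > start:
            hits += 1
    return hits


# Made-up toy coordinates (not data from the study).
islands = [(100, 180), (400, 450), (900, 1000), (1500, 1550), (2000, 2100)]
rfbs = [(90, 300), (420, 600), (950, 1200)]
ncrna = [(2050, 2500)]

rfbs_hits = count_overlapping(islands, rfbs)
ncrna_hits = count_overlapping(islands, ncrna)
print(f"islands overlapping RFBSs:  {rfbs_hits}")
print(f"islands overlapping ncRNAs: {ncrna_hits}")
```

A genome-wide analysis would repeat this per chromosome and would typically rely on an established interval tool rather than a hand-written helper; the sketch only makes the counting step concrete.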
