The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression

The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9,277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to those of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts and are predominantly localized to chromatin and the nucleus, and a fraction of them appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to those of protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally expressed at lower levels than protein-coding genes and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show a particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.
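The abstract mentions tissue specificity and expression correlation but does not give formulas for either. The sketch below shows two measures that are commonly used for this kind of analysis. The first is the tissue-specificity index tau (Yanai et al., 2005). The second is the Pearson correlation between two expression profiles measured across the same tissues. This is not necessarily the exact statistic used in this study. The function names `tau` and `pearson` and the example expression values are hypothetical and are included only to illustrate the arithmetic.

```python
import math
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tissue-specificity index tau for one gene across tissues.

    Returns 0.0 for perfectly uniform expression and 1.0 when the gene is
    expressed in a single tissue. Values are assumed to be non-negative
    (e.g. log-transformed expression levels); this is an illustrative
    measure, not necessarily the one used in the GENCODE v7 analysis.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("maximum expression must be positive")
    # Sum of (1 - x_i / max) over tissues, normalized by (n - 1).
    return sum(1.0 - x / peak for x in expression) / (n - 1)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles over the same tissues."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Center each profile on its mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Hypothetical log-expression values in four tissues (not data from this study):
# an lncRNA and the protein-coding gene on the opposite strand.
lncrna = [0.5, 0.5, 4.0, 1.0]
antisense_coding = [2.0, 2.5, 9.0, 3.0]
print(round(tau(lncrna), 3))
print(round(pearson(lncrna, antisense_coding), 3))
```

In the usage lines, the lncRNA is strongly enriched in one tissue, which gives it a high tau. Its profile also rises and falls with its antisense partner, which gives a positive correlation. This mirrors the qualitative pattern the abstract describes. In a real analysis, one would compute these values over all annotated gene pairs rather than a single hand-picked example.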