A single-molecule long-read survey of the human transcriptome

Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5′ to 3′ end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5′ ends. For longer RNA molecules more 5′ nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.

[1]  R. Guigó,et al.  GeneID in Drosophila. , 2000, Genome research.

[2]  Eduardo Eyras,et al.  ESTGenes: alternative splicing from ESTs in Ensembl. , 2004, Genome research.

[3]  Thomas D. Wu,et al.  GMAP: a genomic mapping and alignment program for mRNA and EST sequence , 2005, Bioinform..

[4]  B. Frey,et al.  Functional coordination of alternative splicing in the mammalian central nervous system , 2007, Genome Biology.

[5]  Eric T. Wang,et al.  Alternative Isoform Regulation in Human Tissue Transcriptomes , 2008, Nature.

[6]  M. Gerstein,et al.  The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing , 2008, Science.

[7]  I. Goodhead,et al.  Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution , 2008, Nature.

[8]  Marcel H. Schulz,et al.  A Global View of Gene Activity and Alternative Splicing by Deep Sequencing of the Human Transcriptome , 2008, Science.

[9]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[10]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[11]  S. Turner,et al.  Real-Time DNA Sequencing from Single Polymerase Molecules , 2009, Science.

[12]  T. Gingeras Missing lincs in the transcriptome , 2009, Nature Biotechnology.

[13]  Michael F. Lin,et al.  Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals , 2009, Nature.

[14]  T. Hughes,et al.  Most “Dark Matter” Transcripts Are Associated With Known Genes , 2010, PLoS biology.

[15]  Joseph K. Pickrell,et al.  Noisy Splicing Drives mRNA Isoform Diversity in Human Cells , 2010, PLoS genetics.

[16]  S. Turner,et al.  A flexible and efficient template format for circular consensus sequencing and SNP detection , 2010, Nucleic acids research.

[17]  Cole Trapnell,et al.  Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. , 2011, Genes & development.

[18]  H. Swerdlow,et al.  A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers , 2012, BMC Genomics.

[19]  David G. Knowles,et al.  The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression , 2012, Genome research.

[20]  Bronwen L. Aken,et al.  GENCODE: The reference human genome annotation for The ENCODE Project , 2012, Genome research.

[21]  Nadav S. Bar,et al.  Landscape of transcription in human cells , 2012, Nature.

[22]  M. Schatz,et al.  Hybrid error correction and de novo assembly of single-molecule sequencing reads , 2012, Nature Biotechnology.

[23]  W. Wong,et al.  Improving PacBio Long Read Accuracy by Short Read Alignment , 2012, PloS one.

[24]  Timothy Daley,et al.  Predicting the molecular complexity of sequencing libraries , 2013, Nature Methods.

[25]  Mark Gerstein,et al.  Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing , 2013, G3: Genes, Genomes, Genetics.

[26]  D. S. Gross,et al.  Chromatin , 2020, Definitions.