Advancing RNA-Seq analysis

Sequencing of RNA has long been recognized as an efficient method for gene discovery¹ and remains the gold standard for annotation of both coding and noncoding genes². Compared with earlier methods, massively parallel sequencing of RNA (RNA-Seq)³ has vastly increased the throughput of RNA sequencing and allowed global measurement of transcript abundance. Two reports in this issue introduce approaches for RNA-Seq analysis that capture genome-wide transcription and splicing in unprecedented detail. Trapnell et al.⁴ describe a software package, Cufflinks, for simultaneous discovery of transcripts and quantification of expression levels and apply it to study gene expression and splicing during the differentiation of mouse myoblast cells. Taking a similar approach, Guttman et al.⁵ use software called Scripture to reannotate the transcriptomes of three mouse cell lines, defining complete gene models for hundreds of new large intergenic noncoding RNAs (lincRNAs)⁶.

Although transcript sequencing has been possible for nearly 20 years, until recently it required the construction of clone libraries. Projects to determine full-length gene structures for human, mouse and other important models have taken years to complete⁷. With new sequencing technologies, no cloning is needed, allowing direct sequencing of cDNA fragments. In a matter of days and at a small fraction of the cost of earlier projects, one can achieve reasonably complete coverage of a transcriptome⁸. But this approach has been hindered by a substantial challenge: without cloning, one cannot know a priori which reads came from which transcripts. Recent studies analyzed gene expression and alternative splicing by mapping short RNA-Seq reads to previously known or predicted transcripts⁹,¹⁰. Although highly informative, such studies are inherently limited to known genes and to alternative splicing across previously identified splice junctions.

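The annotation-dependent quantification described above can be illustrated with a minimal sketch. It assumes each read has already been assigned to at most one known transcript, and it normalizes counts as reads per kilobase per million mapped reads (RPKM), a measure that is not named in the text above and is used here only for illustration. The function name, transcript IDs and read counts are hypothetical.

```python
from collections import Counter


def rpkm(read_assignments, transcript_lengths):
    """Reads per kilobase per million mapped reads, per known transcript.

    read_assignments: one transcript ID per read, or None for a read
        that matched no known transcript (hypothetical input format).
    transcript_lengths: dict of transcript ID -> length in bases.
    """
    # Only reads that hit a known transcript are counted.
    counts = Counter(t for t in read_assignments if t is not None)
    mapped = sum(counts.values())
    if mapped == 0:
        return {t: 0.0 for t in transcript_lengths}
    # count / (length in kb) / (mapped reads in millions)
    return {
        t: counts[t] * 1e9 / (length * mapped)
        for t, length in transcript_lengths.items()
    }


# Hypothetical toy data: ten reads, two of which match no known transcript.
lengths = {"txA": 2000, "txB": 500}
reads = ["txA"] * 6 + ["txB"] * 2 + [None] * 2

values = rpkm(reads, lengths)
unassigned = reads.count(None)
```

In this toy input, two of the ten reads match no known transcript and are simply excluded from `values`. That is the limitation noted above: reads from unannotated genes or from novel splice junctions contribute nothing to the result.
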
To fully leverage RNA-Seq data for biological discovery, one should be able to reconstruct transcripts and accurately measure their relative abundance without relying on existing gene annotations. Previous efforts to reconstruct transcripts from short RNA-Seq reads have followed two general strategies (Fig. 1). The first, a de novo assembly approach implemented in the ABySS software¹¹, assembles reads into transcript sequences without using the genome and so reduces the annotation problem to that of aligning full-length cDNAs, which is well handled by several algorithms. This method is also applicable to the discovery of transcripts that are missing or incomplete in the reference genome and to RNA-Seq data from organisms lacking a genome reference. RNA-Seq reads

[1] A. Kerlavage, et al. Complementary DNA sequencing: expressed sequence tags and human genome project, 1991, Science.

[2] 김삼묘, et al. Introducing the special issue on "Bioinformatics", 2000.

[3] B. Haas, et al. Full-length messenger RNA sequences greatly improve genome annotation, 2002, Genome Biology.

[4] S. Salzberg, et al. The Transcriptional Landscape of the Mammalian Genome, 2005, Science.

[5] Eric T. Wang, et al. Alternative Isoform Regulation in Human Tissue Transcriptomes, 2008, Nature.

[6] F. Denoeud, et al. Annotating genomes with massive-scale RNA sequencing, 2008, Genome Biology.

[7] M. Gerstein, et al. The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing, 2008, Science.

[8] B. Williams, et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq, 2008, Nature Methods.

[9] R. Rosenfeld. Nature, 2009, Otolaryngology--Head and Neck Surgery: official journal of American Academy of Otolaryngology-Head and Neck Surgery.

[10] Lior Pachter, et al. Sequence Analysis, 2020, Definitions.

[11] Richard A. Moore, et al. The Completion of the Mammalian Gene Collection (MGC), 2022.

[12] Hunter B. Fraser, et al. Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing, 2009, Proceedings of the National Academy of Sciences.

[13] Inanç Birol, et al. De novo transcriptome assembly with ABySS, 2009, Bioinform.

[14] M. Gerstein, et al. RNA-Seq: a revolutionary tool for transcriptomics, 2009, Nature Reviews Genetics.

[15] Michael F. Lin, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals, 2009, Nature.

[16] Cole Trapnell, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation, 2010, Nature Biotechnology.

[17] J. Rinn, et al. Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs, 2010, Nature Biotechnology.

[18] Bioinformatics: genome bioinformatics and computational biology, 2011.