Next-generation transcriptome assembly

Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalogue of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches — reference-based, de novo and combined strategies — along with some perspectives on transcriptome assembly in the near future.
