Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data

Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

[1]  I. Good Normal Recurring Decimals , 1946 .

[2]  Claude E. Shannon,et al.  Prediction and Entropy of Printed English , 1951 .

[3]  C. Hyltén-Cavallius On a combinatorical problem , 1958 .

[4]  C. Shimoda,et al.  The Schizosaccharomyces pombe spo6+ gene encoding a nuclear protein with sequence similarity to budding yeast Dbf4 is required for meiotic second division and sporulation , 2000, Genes to cells : devoted to molecular & cellular mechanisms.

[5]  B. Graveley Alternative splicing: increasing diversity in the proteomic world. , 2001, Trends in genetics : TIG.

[6]  Y. Hiraoka,et al.  Characterization of rec7, an early meiotic recombination gene in Schizosaccharomyces pombe. , 2001, Genetics.

[7]  P. Pevzner,et al.  An Eulerian path approach to DNA fragment assembly , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[8]  H. Nojima,et al.  Comprehensive isolation of meiosis-specific genes identifies novel proteins and unusual non-coding transcripts in Schizosaccharomyces pombe. , 2001, Nucleic acids research.

[9]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[10]  Matthew Berriman,et al.  GeneDB: a resource for prokaryotic and eukaryotic organisms , 2004, Nucleic Acids Res..

[11]  Thomas D. Wu,et al.  GMAP: a genomic mapping and alignment program for mRNA and EST sequence , 2005, Bioinform..

[12]  Steven Salzberg,et al.  Beware of mis-assembled genomes , 2005, Bioinform..

[13]  F. Clark,et al.  Understanding alternative splicing: towards a cellular code , 2005, Nature Reviews Molecular Cell Biology.

[14]  Pavel A. Pevzner,et al.  De novo identification of repeat families in large genomes , 2005, ISMB.

[15]  Cathy H. Wu,et al.  The Universal Protein Resource (UniProt): an expanding universe of protein information , 2005, Nucleic Acids Res..

[16]  N. Friedman,et al.  Natural history and evolutionary principles of gene duplication in fungi , 2007, Nature.

[17]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[18]  Eric T. Wang,et al.  Alternative Isoform Regulation in Human Tissue Transcriptomes , 2008, Nature.

[19]  I. Goodhead,et al.  Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution , 2008, Nature.

[20]  C. Nusbaum,et al.  ALLPATHS: de novo assembly of whole-genome shotgun microreads. , 2008, Genome research.

[21]  L. Steinmetz,et al.  Bidirectional promoters generate pervasive transcription in yeast , 2009, Nature.

[22]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[23]  Hunter B. Fraser,et al.  Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing , 2009, Proceedings of the National Academy of Sciences.

[24]  Inanç Birol,et al.  De novo transcriptome assembly with ABySS , 2009, Bioinform..

[25]  T. Borodina,et al.  Transcriptome analysis by strand-specific sequencing of complementary DNA , 2009, Nucleic acids research.

[26]  Chuan-Xi Zhang,et al.  De novo characterization of a whitefly transcriptome and analysis of its gene expression during development , 2010, BMC Genomics.

[27]  Siu-Ming Yiu,et al.  SOAP2: an improved ultrafast tool for short read alignment , 2009, Bioinform..

[28]  Cole Trapnell,et al.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature biotechnology.

[29]  J. Rinn,et al.  Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs , 2010, Nature Biotechnology.

[30]  B. Haas,et al.  Advancing RNA-Seq analysis , 2010, Nature Biotechnology.

[31]  Miriah D. Meyer,et al.  Genome-wide synteny through highly sensitive sequence alignment: Satsuma , 2010, Bioinform..

[32]  N. Friedman,et al.  Strand-specific RNA sequencing reveals extensive regulated long antisense transcripts that are conserved across yeast species , 2010, Genome Biology.

[33]  N. Friedman,et al.  Comprehensive comparative analysis of strand-specific RNA sequencing methods , 2010, Nature Methods.

[34]  J. Rinn,et al.  Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs , 2010, Nature biotechnology.

[35]  Steven J. M. Jones,et al.  De novo assembly and analysis of RNA-seq data , 2010, Nature Methods.

[36]  Manolis Kellis,et al.  Comparative Functional Genomics of the Fission Yeasts , 2011, Science.