De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis

De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.

[1]  W. David Kelton,et al.  Statistical design and analysis , 1986, WSC '86.

[2]  M. Stephens,et al.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. , 2008, Genome research.

[3]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[4]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[5]  Sandrine Dudoit,et al.  Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments , 2010, BMC Bioinformatics.

[6]  Alexie Papanicolaou,et al.  Next generation transcriptomes for next generation genomes using est2assembly , 2009, BMC Bioinformatics.

[7]  M. Robinson,et al.  A scaling normalization method for differential expression analysis of RNA-seq data , 2010, Genome Biology.

[8]  T. Borodina,et al.  Transcriptome analysis by strand-specific sequencing of complementary DNA , 2009, Nucleic acids research.

[9]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[10]  Cole Trapnell,et al.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature biotechnology.

[11]  Thomas J. Hardcastle,et al.  baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data , 2010, BMC Bioinformatics.

[12]  J. Rinn,et al.  Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs , 2010, Nature Biotechnology.

[13]  M. Blaxter,et al.  Comparing de novo assemblers for 454 transcriptome data , 2010, BMC Genomics.

[14]  B. Haas,et al.  Advancing RNA-Seq analysis , 2010, Nature Biotechnology.

[15]  S. Turner,et al.  Real-time DNA sequencing from single polymerase molecules. , 2010, Methods in enzymology.

[16]  W. Huber,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[17]  R. Doerge,et al.  Statistical Design and Analysis of RNA Sequencing Data , 2010, Genetics.

[18]  J. Rinn,et al.  Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs , 2010, Nature biotechnology.

[19]  Steven J. M. Jones,et al.  De novo assembly and analysis of RNA-seq data , 2010, Nature Methods.

[20]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[21]  Manolis Kellis,et al.  Comparative Functional Genomics of the Fission Yeasts , 2011, Science.

[22]  N. Friedman,et al.  Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data , 2011, Nature Biotechnology.

[23]  Marcel Martin Cutadapt removes adapter sequences from high-throughput sequencing reads , 2011 .

[24]  A. Conesa,et al.  Differential expression in RNA-seq: a matter of depth. , 2011, Genome research.

[25]  H. Long,et al.  De novo assembly and characterization of the root transcriptome of Aegilops variabilis during an interaction with the cereal cyst nematode , 2012, BMC Genomics.

[26]  Jeff H. Chang,et al.  GENE-Counter: A Computational Pipeline for the Analysis of RNA-Seq Data for Gene Expression Differences , 2011, PloS one.

[27]  Helga Thorvaldsdóttir,et al.  Integrative Genomics Viewer , 2011, Nature Biotechnology.

[28]  Xiangqin Cui,et al.  Design and validation issues in RNA-seq experiments , 2011, Briefings Bioinform..

[29]  Mingyao Li,et al.  Widespread RNA and DNA Sequence Differences in the Human Transcriptome , 2011, Science.

[30]  T. Borodina,et al.  A strand-specific library preparation protocol for RNA sequencing. , 2011, Methods in enzymology.

[31]  Zhong Wang,et al.  Next-generation transcriptome assembly , 2011, Nature Reviews Genetics.

[32]  Xuan Li,et al.  Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study , 2011, BMC Bioinformatics.

[33]  Colin N. Dewey,et al.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome , 2011, BMC Bioinformatics.

[34]  Bernard P. Puc,et al.  An integrated semiconductor device enabling non-optical genome sequencing , 2011, Nature.

[35]  Carl Kingsford,et al.  A fast, lock-free approach for efficient parallel counting of occurrences of k-mers , 2011, Bioinform..

[36]  Lin Liu,et al.  Comparison of Next-Generation Sequencing Systems , 2012, Journal of biomedicine & biotechnology.

[37]  Y. Saeys,et al.  GenomeView: a next-generation genome browser , 2011, Nucleic acids research.

[38]  David R. Kelley,et al.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks , 2012, Nature Protocols.

[39]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[40]  Jialei Duan,et al.  Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data , 2012, BMC Genomics.

[41]  Jin Billy Li,et al.  Comment on “Widespread RNA and DNA Sequence Differences in the Human Transcriptome” , 2012, Science.

[42]  Tim H. Brom,et al.  A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data , 2012, 1203.4802.

[43]  Martin Vingron,et al.  Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels , 2012, Bioinform..

[44]  Le-Shin Wu,et al.  Trinity RNA-Seq assembler performance optimization , 2012, XSEDE '12.

[45]  Mark Stitt,et al.  RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics , 2012, Nucleic Acids Res..

[46]  B. Haas,et al.  How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? , 2012, BMC Genomics.

[47]  F. Hendrickx,et al.  De novo Transcriptome Assembly and SNP Discovery in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) , 2012, PloS one.

[48]  Craig A. Stewart,et al.  Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond , 2012 .

[49]  Hui Zhang,et al.  Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data , 2012, Statistical applications in genetics and molecular biology.

[50]  Angela M. Liu,et al.  Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma , 2012, Nature Genetics.

[51]  L. Pachter,et al.  Streaming fragment assignment for real-time analysis of sequencing experiments , 2012, Nature Methods.

[52]  Ning Leng,et al.  EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments , 2013, Bioinform..

[53]  Christopher I Amos,et al.  Genomic sequencing in cancer. , 2013, Cancer letters.

[54]  Nicolas Servant,et al.  A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis , 2013, Briefings Bioinform..