Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study

Background: With rapid advances in next-generation sequencing technology, high-throughput RNA sequencing (RNA-Seq) has emerged as a powerful and cost-effective approach to transcriptome study. De novo assembly of transcripts is an important solution for transcriptome analysis in organisms that have no reference genome. However, it was poorly understood how different variables affect assembly outcomes, and there was no consensus on how to approach an optimal solution by selecting a software tool and a suitable strategy based on the properties of the RNA-Seq data.

Results: To assess the performance of different programs for transcriptome assembly, this work analyzed several important factors, such as k-mer value, genome complexity, coverage depth and directional reads. Seven program conditions were tested: four single k-mer (SK) assemblers (SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer (MK) methods (SOAPdenovo-MK, trans-ABySS and Oases-MK). While small and large k-mer values performed better for reconstructing lowly and highly expressed transcripts, respectively, the MK strategy worked well across almost all expression quintiles. Among the SK tools, Trinity performed well across various conditions but had the longest running time. Oases consumed the most memory, whereas SOAPdenovo had the shortest runtime but performed poorly at reconstructing full-length CDS. ABySS offered a good balance between resource usage and assembly quality.

Conclusions: Our work compared the performance of publicly available transcriptome assemblers and analyzed important factors affecting de novo assembly. Some practical guidelines for transcript reconstruction from short-read RNA-Seq data were proposed. De novo assembly of the C. sinensis transcriptome was greatly improved using the optimized methods.
