De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers

Abstract

Background: In recent years, massively parallel complementary DNA sequencing (RNA sequencing [RNA-Seq]) has emerged as a fast, cost-effective, and robust technology for studying entire transcriptomes in various ways. In particular, for non-model organisms and in the absence of an appropriate reference genome, RNA-Seq is used to reconstruct the transcriptome de novo. De novo transcriptome assembly of non-model organisms has become increasingly common and new tools are frequently being developed. Even so, there is still a knowledge gap about which assembly software should be used to build a comprehensive de novo assembly.

Results: Here, we present a large-scale comparative study in which 10 de novo assembly tools are applied to 9 RNA-Seq data sets spanning different kingdoms of life. Overall, we built more than 200 single assemblies and evaluated their performance on a combination of 20 biologically based and reference-free metrics. Our study is accompanied by a comprehensive and extensible Electronic Supplement that summarizes all data sets, assembly execution instructions, and evaluation results. Trinity, SPAdes, and Trans-ABySS, followed by Bridger and SOAPdenovo-Trans, generally outperformed the other tools compared. Moreover, we observed species-specific differences in the performance of each assembler. No tool delivered the best results for all data sets.

Conclusions: We recommend a careful choice and normalization of evaluation metrics to select the best assembly results as a critical step in the reconstruction of a comprehensive de novo transcriptome assembly.
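The recommendation to normalize evaluation metrics before choosing an assembly can be illustrated with a small sketch. It is not the authors' pipeline. It assumes min-max scaling of each metric within a single data set, a hypothetical "higher is better" or "lower is better" direction for each metric, and an unweighted mean as the combined score. The metric names (`busco_complete_pct`, `mapping_rate_pct`, `n_contigs`), the tool names, and all numbers are invented for illustration.

```python
from statistics import mean


def min_max_normalize(values, higher_is_better=True):
    """Rescale raw metric values to [0, 1] so that 1 always marks the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every assembly ties on this metric, so all receive the top score.
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    # Flip the scale for metrics where a smaller raw value is preferable.
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_assemblies(scores, directions):
    """Rank the assemblies of ONE data set by their mean normalized metric score.

    scores:     {tool: {metric: raw value}}
    directions: {metric: True if higher is better, False if lower is better}
    Returns a list of (tool, combined score) pairs, best first.
    """
    tools = sorted(scores)
    normalized = {tool: [] for tool in tools}
    for metric, higher_is_better in directions.items():
        raw = [scores[tool][metric] for tool in tools]
        for tool, value in zip(tools, min_max_normalize(raw, higher_is_better)):
            normalized[tool].append(value)
    # Unweighted mean: every metric counts equally (an assumption, not the study's scheme).
    combined = {tool: mean(values) for tool, values in normalized.items()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw metrics for one data set; names and numbers are invented.
example = {
    "AssemblerA": {"busco_complete_pct": 90.0, "mapping_rate_pct": 95.0, "n_contigs": 60000},
    "AssemblerB": {"busco_complete_pct": 85.0, "mapping_rate_pct": 97.0, "n_contigs": 40000},
    "AssemblerC": {"busco_complete_pct": 70.0, "mapping_rate_pct": 80.0, "n_contigs": 90000},
}
# Assumed directions; treating fewer contigs as better is an illustrative choice only.
directions = {"busco_complete_pct": True, "mapping_rate_pct": True, "n_contigs": False}

ranking = rank_assemblies(example, directions)
# With these invented numbers the order is AssemblerB, AssemblerA, AssemblerC.
for tool, score in ranking:
    print(f"{tool}: {score:.3f}")
```

Min-max scaling puts metrics with different units (percentages, contig counts) on a common 0 to 1 scale, but it is sensitive to a single outlier assembly; rank-based or z-score normalization are common alternatives. Because the scaling is done per data set, the top-ranked tool can change from one species to the next, which is the kind of species-specific difference the abstract reports.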
