Extending rnaSPAdes functionality for hybrid transcriptome assembly

De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when a reference genome is unavailable or poorly annotated. However, because Illumina reads are short, it is usually impossible to reconstruct the complete sequences of complex genes and their alternative isoforms. The recent emergence of long-read RNA sequencing technologies, such as PacBio and Oxford Nanopore, may dramatically improve assembly quality and, in turn, the subsequent analysis. While reference-based tools for analysing long RNA reads have recently been developed, there is no established pipeline for de novo assembly of such data. In this work we present a novel method for high-quality de novo transcriptome assembly that combines the accuracy and reliability of short reads with the exon structure information derived from long error-prone reads. The algorithm incorporates the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapts it to transcriptomic data. To evaluate the benefit of using long RNA reads, we selected several datasets containing both Illumina reads and either Iso-Seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies produced with rnaSPAdes contain more full-length genes and alternative isoforms than assemblies built from short-read data alone.
