Extending rnaSPAdes functionality for hybrid transcriptome assembly

Background: De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when a reference genome is unavailable or poorly annotated. However, because Illumina reads are short, it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged ability to generate long RNA reads with technologies such as PacBio and Oxford Nanopore may dramatically improve assembly quality, and thus the subsequent analysis. While reference-based tools for analysing long RNA reads have recently been developed, there is no established pipeline for de novo assembly of such data.

Results: In this work we present a novel method for high-quality de novo transcriptome assembly that combines the accuracy and reliability of short reads with the exon structure information derived from long error-prone reads. The algorithm incorporates the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapts it for transcriptomic data.

Conclusion: To evaluate the benefit of using long RNA reads, we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms than assemblies built from short-read data alone.

Availability and implementation: rnaSPAdes is implemented in C++ and Python and is freely available for Linux and macOS under the GPLv2 license at cab.spbu.ru/software/rnaspades/ and github.com/ablab/spades.
