Accessible and interactive RNA sequencing analysis using serverless computing

Abstract: We used serverless AWS Lambda functions to align 640 million reads in less than 3 minutes, a 500x speed-up over the single-threaded implementation. Using a hybrid cloud architecture and software modified to optimize disk transfers, an entire RNA sequencing workflow that transforms multiplexed reads into transcript counts, and that originally took 29 hours, can be completed in 18 minutes. This is roughly a 100x improvement over the original single-threaded implementation and 12x faster than an optimized cloud server-based implementation using 16 threads. The total cost of the analysis is $2.82 for 96 wells, or about 3 cents per multiplexed sample. The approach can be applied to human datasets generated in a single experiment and does not rely on processing large numbers of samples to achieve its performance gains. The workflow is publicly available under an MIT license (https://github.com/BioDepot/RNA-seq-lambda).
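The rounded figures in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the 16-thread server runtime is not reported in the abstract but is implied by the stated 12x factor.

```python
# Figures quoted in the abstract.
original_minutes = 29 * 60      # original single-threaded workflow: 29 hours
serverless_minutes = 18         # hybrid serverless workflow: 18 minutes
total_cost_usd = 2.82           # total cost for one 96-well run
wells = 96
reads = 640e6                   # reads aligned in under 3 minutes
alignment_seconds = 3 * 60      # upper bound on alignment time

# Workflow speed-up (the abstract rounds this to 100x).
speedup = original_minutes / serverless_minutes

# Cost per multiplexed sample in cents (the abstract rounds this to 3 cents).
cost_per_sample_cents = 100 * total_cost_usd / wells

# Implied runtime of the 16-thread server implementation (an inference from
# the stated 12x factor, not a figure given in the abstract).
implied_server_minutes = 12 * serverless_minutes

# Lower bound on aggregate alignment throughput.
min_reads_per_second = reads / alignment_seconds

print(f"workflow speed-up: {speedup:.1f}x")
print(f"cost per sample: {cost_per_sample_cents:.2f} cents")
print(f"implied 16-thread runtime: {implied_server_minutes} minutes")
print(f"alignment throughput: at least {min_reads_per_second / 1e6:.2f} million reads/s")
```

The exact workflow speed-up is about 97x and the exact per-sample cost about 2.94 cents, consistent with the rounded values in the abstract.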
