Accelerating Large Scale de novo Metagenome Assembly Using GPUs

Metagenomic workflows involve studying uncultured microorganisms directly from the environment. When processed by modern sequencing machines, these environmental samples yield large and complex datasets that exceed the capabilities of metagenomic software. The increasing size and complexity of these datasets make a strong case for exascale-capable metagenome assemblers. However, the underlying algorithmic motifs are not well suited to GPUs. This poses a challenge, since the majority of next-generation supercomputers will rely primarily on GPUs for computation. In this paper, we present a first-of-its-kind GPU-accelerated implementation of the local assembly approach that is an integral part of a widely used large-scale metagenome assembler, MetaHipMer. Local assembly uses algorithms that induce random memory accesses and non-deterministic workloads, which make GPU offloading a challenging task. Our GPU implementation outperforms the CPU version by about $7\times$ and boosts the performance of MetaHipMer by 42% when running on 64 Summit nodes.
