GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis

Nanopore sequencing has the potential to revolutionise genomics by realising portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these applications requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. For instance, comparing raw nanopore signals to a biological reference sequence is computationally demanding, even with the dynamic programming algorithm known as Adaptive Banded Event Alignment (ABEA), a commonly used approach for polishing sequencing data and identifying non-standard nucleotides, for example when measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to run efficiently on heterogeneous CPU-GPU architectures. By optimising memory use, computation and load balancing between CPU and GPU, we demonstrate that f5c can perform ~3-5× faster than the original implementation of ABEA in the Nanopolish software package. We also show that f5c enables on-the-fly DNA methylation detection using an embedded System on Chip (SoC) equipped with GPUs. Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c, along with the GPU-optimised ABEA, is available at https://github.com/hasindu2008/f5c.
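To illustrate the general idea of banded dynamic programming that ABEA builds on, the following is a minimal sketch, not the f5c or Nanopolish implementation. It assumes a fixed band centred on the scaled diagonal and a simple absolute-difference cost between event levels and reference levels, whereas ABEA moves its band adaptively and uses a different scoring model. The function name `banded_alignment_cost` and its parameters are hypothetical.

```python
import math


def banded_alignment_cost(events, ref_levels, bandwidth):
    """Best monotone alignment cost of events to reference levels,
    filling only cells within `bandwidth` of the scaled diagonal.

    Illustrative assumption: fixed band and absolute-difference cost;
    this is plain banded dynamic time warping, not adaptive banding.
    """
    n, m = len(events), len(ref_levels)
    # cost[i][j]: best cost of aligning events[:i] to ref_levels[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Centre of the band for this row, then clip to the matrix.
        centre = round(i * m / n)
        lo = max(1, centre - bandwidth)
        hi = min(m, centre + bandwidth)
        for j in range(lo, hi + 1):
            d = abs(events[i - 1] - ref_levels[j - 1])
            best = min(
                cost[i - 1][j - 1],  # advance both event and reference
                cost[i - 1][j],      # several events on one reference position
                cost[i][j - 1],      # one event spanning two reference positions
            )
            cost[i][j] = d + best
    return cost[n][m]


events = [1.0, 1.1, 2.0, 3.0]
ref_levels = [1.0, 2.0, 3.0]
narrow = banded_alignment_cost(events, ref_levels, bandwidth=0)
wide = banded_alignment_cost(events, ref_levels, bandwidth=1)
```

Cells outside the band are never evaluated, which is where the saving over a full dynamic programming matrix comes from. The trade-off is that a band that is too narrow can exclude the best path, so `narrow` can never be lower than `wide`.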
