LOGAN: High-Performance GPU-Based X-Drop

Pairwise sequence alignment is one of the most computationally intensive kernels in genomic data analysis, accounting for more than 90% of the runtime for key bioinformatics applications. This method is particularly expensive for thirdgeneration sequences due to the high computational cost of analyzing sequences of length between 1Kb and 1Mb. Given the quadratic overhead of exact pairwise algorithms for long alignments, the community primarily relies on approximate algorithms that search only for high-quality alignments and stop early when one is not found. In this work, we present the first GPU optimization of the popular X-drop alignment algorithm, that we named LOGAN. Results show that our highperformance multi-GPU implementation achieves up to 181.6 GCUPS and speed-ups up to 6.6× and 30.7× using 1 and 6 NVIDIA Tesla V100, respectively, over the state-of-the-art software running on two IBM Power9 processors using 168 CPU threads, with equivalent accuracy. We also demonstrate a 2.3× LOGAN speed-up versus ksw2, a state-of-art vectorized algorithm for sequence alignment implemented in minimap2, a long-read mapping software. To highlight the impact of our work on a real-world application, we couple LOGAN with a many-to-many long-read alignment software called BELLA, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6×. Finally, we adapt the Roofline model for LOGAN and demonstrate that our implementation is near optimal on the NVIDIA Tesla V100s.

[1]  Nan Ding,et al.  An Instruction Roofline Model for GPUs , 2019, 2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).

[2]  Qiong Luo,et al.  Accelerating Long Read Alignment on Three Processors , 2019, ICPP.

[3]  William J. Dally,et al.  Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly , 2018, USENIX Annual Technical Conference.

[4]  Katherine Yelick,et al.  BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper , 2018, bioRxiv.

[5]  Masahiro Kasahara,et al.  Introducing difference recurrence relations for faster semi-global alignment of long sequences , 2018, BMC Bioinformatics.

[6]  Heng Li,et al.  Minimap2: pairwise alignment for nucleotide sequences , 2017, Bioinform..

[7]  Marco D. Santambrogio,et al.  Architectural optimizations for high performance and energy efficient Smith-Waterman implementation on FPGAs using OpenCL , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

[8]  Amir Muhammadzadeh,et al.  MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data , 2014 .

[9]  Yongchao Liu,et al.  CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions , 2013, BMC Bioinformatics.

[10]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[11]  M. Frith,et al.  Adaptive seeds tame genomic sequence comparison. , 2011, Genome research.

[12]  Paul Horton,et al.  Parameters for accurate genome alignment , 2010, BMC Bioinformatics.

[13]  Samuel Williams,et al.  Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.

[14]  Christophe Dessimoz,et al.  SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and ×86/SSE2 , 2008, BMC Research Notes.

[15]  Knut Reinert,et al.  SeqAn An efficient, generic C++ library for sequence analysis , 2008, BMC Bioinformatics.

[16]  Kevin Truong,et al.  160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA) , 2007, BMC Bioinformatics.

[17]  Michael Farrar,et al.  Sequence analysis Striped Smith – Waterman speeds database searches six times over other SIMD implementations , 2007 .

[18]  D. Haussler,et al.  Human-mouse alignments with BLASTZ. , 2003, Genome research.

[19]  Lukas Wagner,et al.  A Greedy Algorithm for Aligning DNA Sequences , 2000, J. Comput. Biol..

[20]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[21]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[22]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[23]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .