SWhybrid: A Hybrid-Parallel Framework for Large-Scale Protein Sequence Database Search

Computer architectures continue to develop rapidly towards massively parallel and heterogeneous systems. Thus, easily extensible yet highly efficient parallelization approaches for a variety of platforms are urgently needed. In this paper, we present SWhybrid, a hybrid computing framework for large-scale biological sequence database search, based on the Smith-Waterman (SW) algorithm, on heterogeneous computing environments with multi-core or many-core processing units (PUs). To incorporate a diverse set of PUs, such as combinations of CPUs, GPUs and Xeon Phis, we abstract them as SIMD vector execution units with different numbers of lanes. We propose a machine model, associated with a unified programming interface implemented in C++, to abstract the underlying architectural differences. Performance evaluation reveals that SWhybrid (i) outperforms all other tested state-of-the-art tools on both homogeneous and heterogeneous computing platforms, (ii) achieves an efficiency of over 80% on all tested CPUs and GPUs and over 70% on Xeon Phis, and (iii) achieves utilization rates of over 80% on all tested heterogeneous platforms. Our results demonstrate that there is enough commonality between vector-like instructions across CPUs and GPUs that one can develop higher-level abstractions and still specialize with close-to-peak performance. SWhybrid is open-source software and freely available at https://github.com/turbo0628/swhybrid.
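The idea of treating every PU as a vector unit with some number of lanes can be illustrated with a minimal C++ sketch. It is not SWhybrid's actual interface: the names `Vec`, `vmax` and `sw_scores` are hypothetical, the lanes are emulated with a plain array rather than hardware SIMD, and the scoring is simplified to match/mismatch values with a linear gap penalty instead of a substitution matrix with affine gaps. The sketch only shows how one Smith-Waterman kernel, written once against a lane-count template parameter, can score one query against `Lanes` database sequences at a time.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a processing unit: a vector with `Lanes` integer lanes.
// A real backend would map these operations to SSE/AVX intrinsics or GPU threads.
template <std::size_t Lanes>
struct Vec {
    std::array<int, Lanes> v{};

    static Vec splat(int x) {
        Vec r;
        r.v.fill(x);
        return r;
    }
    friend Vec operator+(const Vec& a, const Vec& b) {
        Vec r;
        for (std::size_t l = 0; l < Lanes; ++l) r.v[l] = a.v[l] + b.v[l];
        return r;
    }
    friend Vec vmax(const Vec& a, const Vec& b) {
        Vec r;
        for (std::size_t l = 0; l < Lanes; ++l) r.v[l] = std::max(a.v[l], b.v[l]);
        return r;
    }
};

// Local alignment scores of one query against `Lanes` subjects, one subject per lane.
// Assumes mismatch < 0 and gap < 0, so that padding shorter subjects with
// mismatches can never raise a lane's best score.
template <std::size_t Lanes>
std::array<int, Lanes> sw_scores(const std::string& query,
                                 const std::array<std::string, Lanes>& subjects,
                                 int match = 2, int mismatch = -1, int gap = -2) {
    std::size_t maxLen = 0;
    for (const auto& s : subjects) maxLen = std::max(maxLen, s.size());

    const Vec<Lanes> zero = Vec<Lanes>::splat(0);
    const Vec<Lanes> g = Vec<Lanes>::splat(gap);
    std::vector<Vec<Lanes>> H(query.size() + 1, zero);  // one rolling DP column
    Vec<Lanes> best = zero;

    for (std::size_t j = 0; j < maxLen; ++j) {
        Vec<Lanes> diag = zero;  // H[i-1][j-1]; row 0 is always zero
        for (std::size_t i = 1; i <= query.size(); ++i) {
            // Per-lane substitution score for query[i-1] vs. each subject's j-th residue.
            Vec<Lanes> sub;
            for (std::size_t l = 0; l < Lanes; ++l) {
                const std::string& s = subjects[l];
                const bool hit = j < s.size() && s[j] == query[i - 1];
                sub.v[l] = hit ? match : mismatch;
            }
            const Vec<Lanes> left = H[i];    // H[i][j-1]
            const Vec<Lanes> up = H[i - 1];  // H[i-1][j], already updated
            const Vec<Lanes> h = vmax(zero, vmax(diag + sub, vmax(up + g, left + g)));
            diag = left;
            H[i] = h;
            best = vmax(best, h);
        }
    }
    return best.v;
}
```

The same kernel can be instantiated as `sw_scores<4>` or `sw_scores<16>`; in this sketch the lane count changes only how many subjects are processed per call, not the scores themselves.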
