SWIMM 2.0: Enhanced Smith–Waterman on Intel’s Multicore and Manycore Architectures Based on AVX-512 Vector Extensions

The well-known Smith–Waterman (SW) algorithm is the most commonly used method for local sequence alignment, but its adoption is limited by its computational requirements on large protein databases. Although the acceleration of SW has already been studied on many parallel platforms, hardly any studies take advantage of the latest Intel architectures based on AVX-512 vector extensions. This SIMD instruction set is currently supported by Intel’s Knights Landing (KNL) accelerator and Intel’s Skylake (SKL) general-purpose processors. In this paper, we present an SW version that is optimized for both architectures: SWIMM 2.0. The novelty of this vector instruction set requires revising previous programming and optimization techniques. SWIMM 2.0 is based on massive multi-threading and SIMD exploitation. It is competitive in performance with other state-of-the-art implementations, reaching 511 GCUPS on a single KNL node and 734 GCUPS on a server equipped with a dual SKL processor. Moreover, these performance rates make SWIMM 2.0 the most energy-efficient implementation in this study, achieving 2.94 GCUPS/Watt on the SKL processor.
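For readers unfamiliar with the quantities mentioned above, the following is a minimal scalar sketch of the SW local alignment score with affine gap penalties, together with the GCUPS (billions of cell updates per second) metric. It is not the SWIMM 2.0 implementation, which relies on multi-threading and AVX-512 vectorization. The function names, the simple match/mismatch scoring, and the gap penalty values are illustrative assumptions; protein searches normally use a substitution matrix such as BLOSUM62.

```python
NEG_INF = -10**9  # stands in for minus infinity while keeping scores integral


def sw_score(query, subject, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Return the optimal local alignment score of two sequences.

    Assumed gap model: a gap of length k costs gap_open + (k - 1) * gap_extend.
    Only two rows of the score matrix are kept, since only the score is needed.
    """
    n = len(subject)
    h_prev = [0] * (n + 1)      # H values of the previous row
    f = [NEG_INF] * (n + 1)     # best score ending in a vertical gap, per column
    best = 0
    for q in query:
        h_cur = [0] * (n + 1)
        e = NEG_INF             # best score ending in a horizontal gap
        for j in range(1, n + 1):
            e = max(e - gap_extend, h_cur[j - 1] - gap_open)
            f[j] = max(f[j] - gap_extend, h_prev[j] - gap_open)
            sub = match if q == subject[j - 1] else mismatch
            h = max(0, h_prev[j - 1] + sub, e, f[j])  # local alignment floors at 0
            h_cur[j] = h
            best = max(best, h)
        h_prev = h_cur
    return best


def gcups(query_len, db_residues, seconds):
    """Billions of score-matrix cells updated per second for one database search."""
    return (query_len * db_residues) / (seconds * 1e9)


if __name__ == "__main__":
    print(sw_score("ACGT", "ACGT"))
    print(gcups(1000, 1_000_000, 1.0))
```

Every cell of the score matrix depends on its left, upper, and diagonal neighbours, so the work grows with the product of the query length and the total database length. This is why throughput is reported in GCUPS, and why a figure such as GCUPS/Watt follows by dividing that rate by the measured power draw.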
