SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences

Background: The Smith-Waterman (SW) algorithm is the best choice for searching similar regions between two DNA or protein sequences. However, it may become impractical in some contexts due to its high computational demands. Consequently, the computer science community has focused on the use of modern parallel architectures such as Graphics Processing Units (GPUs), Xeon Phi accelerators and Field Programmable Gate Arrays (FPGAs) to speed up large-scale workloads.

Results: This paper presents and evaluates SWIFOLD: a Smith-Waterman parallel Implementation on FPGA with OpenCL for Long DNA sequences. First, we evaluate its performance and resource usage for different kernel configurations. Next, we carry out a performance comparison between our tool and other state-of-the-art implementations considering three different datasets. SWIFOLD offers the best average performance for the small and medium test sets, and its performance is independent of input size and sequence similarity. In addition, SWIFOLD provides competitive performance in comparison with GPU-based implementations on the latest GPU generation for the large dataset.

Conclusions: The results suggest that SWIFOLD can be a serious contender for accelerating the SW alignment of DNA sequences of unrestricted size in an affordable way, reaching 125 GCUPS on average and a peak of almost 270 GCUPS.
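
For readers unfamiliar with the terms used above, the following is a minimal sequential sketch of the SW score recurrence and of the GCUPS metric (billions of dynamic-programming cell updates per second). It is not SWIFOLD's kernel: the function names, the scoring values and the linear gap penalty are assumptions chosen for illustration, and the timing in the usage note is hypothetical.

```python
def sw_score(a, b, match=1, mismatch=-3, gap=-2):
    """Best local alignment score of a and b.

    Illustrative only: linear gap penalty and assumed scoring values.
    Keeps a single previous row, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            h = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(h)
            best = max(best, h)
        prev = curr
    return best


def gcups(len_a, len_b, seconds):
    """Giga cell updates per second for one len_a x len_b alignment matrix."""
    return len_a * len_b / seconds / 1e9
```

As a usage example, `sw_score("TTACGTTT", "GGACGTGG")` scores the shared `ACGT` region, and `gcups(10_000_000, 10_000_000, 800.0)` shows how a hypothetical 800-second run on two 10 Mbp sequences would correspond to 125 GCUPS.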
