Improving the performance of the Needleman-Wunsch algorithm using parallelization and vectorization techniques

The Needleman-Wunsch (NW) algorithm is a dynamic programming algorithm used for the pairwise global alignment of two biological sequences. In this paper, three sets of parallel implementations of the NW algorithm are presented, using a mixture of specialized software and hardware solutions: a POSIX Threads-based implementation, a SIMD Extensions-based implementation, and a GPU-based implementation. All three aim to improve the performance of the NW algorithm on large-scale input without affecting its accuracy. Our experiments show that the GPU-based implementation performs best, running 72.5X faster than the sequential implementation, whereas the best speedups achieved by the POSIX Threads and SIMD techniques are 2X and 18.2X over the sequential implementation, respectively.
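
As an illustration of the dynamic programming recurrence that such implementations parallelize, the following is a minimal sequential sketch and not the implementation evaluated in the paper. The function name `nw_score`, the linear gap penalty, and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for the example. The cells are visited by anti-diagonal: every cell on one anti-diagonal depends only on cells from the two previous anti-diagonals, which is the independence that thread-, SIMD-, and GPU-based approaches to this kind of recurrence typically rely on.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (Needleman-Wunsch recurrence).

    The scoring parameters are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]

    # Boundary conditions: aligning a prefix against the empty sequence costs only gaps.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap

    # Visit interior cells by anti-diagonal d = i + j. Cells that share the same d
    # do not depend on each other, so the inner loop could run in parallel.
    for d in range(2, rows + cols - 1):
        i_first = max(1, d - (cols - 1))
        i_last = min(rows - 1, d - 1)
        for i in range(i_first, i_last + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
    return H[rows - 1][cols - 1]
```

Calling `nw_score("GATTACA", "GATTACA")` scores seven matches under these assumed parameters. The sketch computes only the score; recovering the alignment itself would additionally need a traceback through `H`.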
