Parallel Tiled Codes Implementing the Smith-Waterman Alignment Algorithm for Two and Three Sequences

The Smith-Waterman (SW) algorithm explores all possible alignments between two or more sequences and returns the optimal local alignment. However, its computational cost is very high: the amount of computation grows exponentially with the number of aligned sequences, which makes SW impractical for searching for similarities in large sets of sequences. Fortunately, the dynamic programming kernel of SW consists of mathematical operations inside affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply polyhedral compilation techniques to optimize the SW dense-array code studied here. In this article, we present an approach to generating efficient SW implementations for two and three sequences that uses the transitive closure of a dependence graph together with loop skewing. The generated programs are parallel tiled loop nests that achieve significantly higher performance than programs produced by closely related compilers. Unlike well-known affine transformation techniques, the approach can tile all loops of the original loop nests. Furthermore, it can optimize the three-sequence alignment code; state-of-the-art automatic optimizing compilers cannot generate such optimized code. We demonstrate that an under-approximation of the transitive closure, rather than the exact transitive closure, suffices to generate valid parallel tiled code, which considerably reduces the computational complexity of the approach. The generated codes were run on the cores of a modern Intel multiprocessor, where they show high speedup and good scalability.
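The dependence structure that makes a skewed, tiled schedule legal can be illustrated with a small sketch. The Python code below is not the authors' generated code and does not compute any transitive closure; it assumes the original SW recurrence with a general gap penalty W_k, toy scores (+2 for a match, -1 for a mismatch, W_k = 1 + k), and hypothetical helper names (`score`, `gap`, `cell`, `sw_serial`, `sw_tiled_wavefront`). It tiles only the i and j loops into square tiles and skews the tile loops so that all tiles on one tile anti-diagonal I + J = t are executed concurrently; the result can be compared against the untransformed loop nest.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[int]]


def score(x: str, y: str) -> int:
    """Hypothetical substitution score: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


def gap(k: int) -> int:
    """Hypothetical gap penalty W_k for a gap of length k (here 1 + k)."""
    return 1 + k


def cell(H: Matrix, a: str, b: str, i: int, j: int) -> int:
    """SW recurrence with a general gap penalty W_k (assumed formulation)."""
    best = max(0, H[i - 1][j - 1] + score(a[i - 1], b[j - 1]))
    for k in range(1, i + 1):       # vertical gap: cells above (i, j) in column j
        best = max(best, H[i - k][j] - gap(k))
    for l in range(1, j + 1):       # horizontal gap: cells left of (i, j) in row i
        best = max(best, H[i][j - l] - gap(l))
    return best


def sw_serial(a: str, b: str) -> Matrix:
    """Reference: the untransformed row-major (i, j) loop nest."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = cell(H, a, b, i, j)
    return H


def sw_tiled_wavefront(a: str, b: str, tile: int = 4, workers: int = 4) -> Matrix:
    """Square tiles scheduled by tile anti-diagonal t = I + J (loop skewing).

    Tile (I, J) reads only tiles (I', J') with I' <= I and J' <= J, so two
    distinct tiles on the same anti-diagonal never depend on each other.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    ti, tj = -(-n // tile), -(-m // tile)      # ceil(n / tile), ceil(m / tile)

    def run_tile(I: int, J: int) -> None:
        # Inside a tile, the original row-major order is kept.
        for i in range(I * tile + 1, min((I + 1) * tile, n) + 1):
            for j in range(J * tile + 1, min((J + 1) * tile, m) + 1):
                H[i][j] = cell(H, a, b, i, j)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for t in range(ti + tj - 1):           # sequential loop over tile diagonals
            rows = range(max(0, t - tj + 1), min(ti - 1, t) + 1)
            # All tiles of one diagonal are submitted together; list() waits for
            # them (and re-raises any error) before the next diagonal starts.
            list(pool.map(run_tile, rows, [t - I for I in rows]))
    return H
```

For example, `max(map(max, sw_tiled_wavefront("ACGT", "ACGT")))` evaluates to 8 with these toy scores, and for any tile size the matrix should equal `sw_serial(a, b)`. The threads only exercise the legality of the schedule; CPython's global interpreter lock prevents a real speedup, and the inner gap loops over k stay untiled here, unlike in the approach described above. For three sequences, the analogous schedule would sweep tile hyperplanes I + J + K = t.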
