Parallelizing dynamic programming through rank convergence

This paper proposes an efficient parallel algorithm for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, subproblems that do not depend on each other, and thus can be computed in parallel, form stages or wavefronts. The algorithm presented in this paper provides additional parallelism, allowing multiple stages to be computed in parallel despite the dependences among them. The correctness and the performance of the algorithm rely on rank convergence properties of matrix multiplication in the tropical semiring, formed with plus as the multiplicative operation and max as the additive operation. This paper demonstrates the efficiency of the parallel algorithm by showing significant speedups on a variety of important dynamic programming problems. In particular, the parallel Viterbi decoder is up to 24x faster (with 64 processors) than a highly optimized commercial baseline.
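The role of the tropical semiring can be illustrated with a small sketch. This is not the paper's implementation; it assumes each DP stage can be written as a max-plus matrix-vector product, and the stage matrices, start vectors, and helper names below are all hypothetical. It shows the property the abstract refers to: once the product of stage matrices has tropical rank 1, a computation started from a wrong guess yields a vector that differs from the true one only by a constant offset, so it selects the same predecessors at every later stage.

```python
# Toy illustration of rank convergence in the (max, +) semiring.
# All matrices and vectors here are made-up examples, not data from the paper.

def tropical_matvec(A, v):
    """Max-plus product: result[i] = max_j (A[i][j] + v[j])."""
    return [max(a + x for a, x in zip(row, v)) for row in A]

def best_predecessors(A, v):
    """For each row i, the index j that attains max_j (A[i][j] + v[j])."""
    return [max(range(len(v)), key=lambda j: row[j] + v[j]) for row in A]

def is_parallel(u, v):
    """Tropically parallel: u and v differ by the same constant in every entry."""
    return len({x - y for x, y in zip(u, v)}) == 1

def run_stages(stages, start):
    """Apply the stage matrices in order; return the vector after each stage."""
    out, v = [], start
    for A in stages:
        v = tropical_matvec(A, v)
        out.append(v)
    return out

# The second stage has tropical rank 1: entry (i, j) equals c[i] + r[j]
# with c = [3, 1] and r = [0, 2].
STAGES = [
    [[0, 2], [1, 0]],
    [[3, 5], [1, 3]],
    [[1, 0], [0, 4]],
]

true_run = run_stages(STAGES, [0, 5])   # starts from the "real" solution
guess_run = run_stages(STAGES, [7, 0])  # starts from an arbitrary guess

for stage, (t, g) in enumerate(zip(true_run, guess_run), start=1):
    print(stage, t, g, "parallel" if is_parallel(t, g) else "not parallel")
```

In this example the two runs disagree after the first stage but become parallel after the rank-1 stage and stay parallel afterwards, because adding a constant to the input of a max-plus product adds the same constant to its output. A processor that began from a guess can therefore produce the correct predecessor choices for later stages without waiting for the true input.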
