On Optimizing the Longest Common Subsequence Problem by Loop Unrolling Along Wavefronts

Loop unrolling is a loop transformation where a few loop iterations are grouped as a super iteration for exploring more independent instructions and to decrease the total loop overhead. This paper characterizes loop unrolling by the unrolling factor, the number of iterations in a super iteration and the unrolling direction, the choice of iterations to be grouped to form the super iteration. We use loop unrolling for maximizing instruction-level parallelism in the longest common subsequence problem. To increase the number of independent instructions in the super iteration, we use a linear schedule to group iterations on the same wave front, a hyper plane in the loop iteration space. Then, the loop is unrolled along the wave front which guarantees all iterations in the same super iteration are independent. The selection of the optimal unrolling factor is based on the assumption that if all the pipelines are saturated, the performance should not be bad. Two necessary conditions and a sufficient condition for optimality are presented and used to find the optimal unrolling factor. The total execution time is expressed as a function of algorithm parameters, architecture parameters and the unrolling factor. A benchmark of the technique scores a 1.475 speed-up over traditional methods.

[1]  Mark Stephenson,et al.  Predicting unroll factors using supervised classification , 2005, International Symposium on Code Generation and Optimization.

[2]  Monica S. Lam,et al.  A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..

[3]  Litong Song,et al.  A technique for variable dependence driven loop peeling , 2002, Fifth International Conference on Algorithms and Architectures for Parallel Processing, 2002. Proceedings..

[4]  Zaher Mahjoub,et al.  Parallelization of the dynamic programming algorithm for solving the longest common subsequence problem , 2010, ACS/IEEE International Conference on Computer Systems and Applications - AICCSA 2010.

[5]  Vivek Sarkar Optimized Unrolling of Nested Loops , 2004, International Journal of Parallel Programming.

[6]  Jaewook Shin,et al.  Compiler-controlled caching in superword register files for multimedia extension architectures , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[7]  Michael E. Wolf,et al.  Combining Loop Transformations Considering Caches and Scheduling , 2004, International Journal of Parallel Programming.

[8]  Cédric Bastoul,et al.  Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[9]  Edwin Hsing-Mean Sha,et al.  Polynomial-time nested loop fusion with full parallelism , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.

[10]  Tarek S. Abdelrahman,et al.  Scheduling of wavefront parallelism on scalable shared-memory multiprocessors , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.

[11]  Weijia Shang,et al.  Time Optimal Linear Schedules for Algorithms with Uniform Dependencies , 1991, IEEE Trans. Computers.

[12]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[13]  Daniel Pierre Bovet,et al.  Understanding the Linux Kernel , 2000 .

[14]  J. Ramanujam,et al.  Parameterized tiling revisited , 2010, CGO '10.

[15]  Leslie Lamport,et al.  The parallel execution of DO loops , 1974, CACM.