Multi-dimensional Kernel Generation for Loop Nest Software Pipelining

Single-dimension Software Pipelining (SSP) has been proposed as an effective software pipelining technique for multi-dimensional loops [18]. This paper presents, for the first time, the scheduling methods that actually produce the SSP kernel code. Because the problem is multi-dimensional, this scheduling problem is more complex and challenging than traditional modulo scheduling: the scheduler must handle multiple subkernels and initiation rates under specific scheduling constraints, while producing a solution that minimizes the execution time of the final schedule. Three approaches are proposed: (1) the level-by-level method, which schedules operations in loop-level order, starting from the innermost level, and does not let other operations interfere with levels that are already scheduled; (2) the flat method, which schedules operations from all loop levels with the same priority; and (3) the hybrid method, which uses the level-by-level mechanism for the innermost level and the flat method for the other levels. The three methods subsume Huff's modulo scheduling [22] for single loops as a special case. We also lift a scheduling constraint introduced in earlier publications, which allows a more compact kernel. The proposed approaches were implemented in the Open64/ORC compiler and evaluated on loop nests from the Livermore, SPEC2000 and NAS benchmarks.
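The three approaches differ mainly in the priority they give to operations from different loop levels. The Python sketch below is a deliberately simplified toy, not the paper's scheduler: it ignores dependences, latencies, subkernels and multiple initiation rates, uses a single hypothetical initiation interval `ii`, and places each operation exactly once into a modulo reservation table without ever displacing it, so the policies differ only in the order in which operations claim slots. The names `Op`, `schedule_order` and `place`, and the convention that level 1 is the outermost loop, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    level: int      # hypothetical convention: 1 = outermost loop, larger = deeper
    resource: str   # hypothetical functional-unit class, e.g. "alu" or "mem"


def schedule_order(ops, policy):
    """Order in which this toy scheduler considers operations."""
    innermost = max(op.level for op in ops)
    if policy == "level_by_level":
        # Innermost level first, then each enclosing level in turn;
        # sorted() is stable, so source order is kept within a level.
        return sorted(ops, key=lambda op: -op.level)
    if policy == "flat":
        # All loop levels share one priority: plain source order.
        return list(ops)
    if policy == "hybrid":
        # Innermost level first, then all outer levels together (flat).
        inner = [op for op in ops if op.level == innermost]
        outer = [op for op in ops if op.level != innermost]
        return inner + outer
    raise ValueError(f"unknown policy: {policy}")


def place(ops, policy, ii, units):
    """Greedily fill a modulo reservation table with `ii` rows.

    `units` maps each resource class to its number of identical units.
    An operation, once placed, is never moved, so operations considered
    later cannot disturb earlier ones. Returns {operation name: row}.
    """
    used = {(row, res): 0 for row in range(ii) for res in units}
    rows = {}
    for op in schedule_order(ops, policy):
        for row in range(ii):
            if used[(row, op.resource)] < units[op.resource]:
                used[(row, op.resource)] += 1
                rows[op.name] = row
                break
        else:
            raise RuntimeError(f"no free {op.resource} slot for {op.name}")
    return rows


if __name__ == "__main__":
    nest = [Op("a", 1, "alu"), Op("b", 2, "alu"),
            Op("c", 3, "alu"), Op("d", 3, "alu")]
    for policy in ("level_by_level", "flat", "hybrid"):
        print(policy, place(nest, policy, ii=4, units={"alu": 1}))
```

With one ALU and four rows, the level-by-level and hybrid orders give the innermost operations `c` and `d` the first rows, whereas the flat order leaves them whatever remains after `a` and `b`. The hybrid order differs from level-by-level only in how the outer-level operations `a` and `b` are ordered among themselves.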

[1] Guang R. Gao, et al. Extending Software Pipelining Techniques for Scheduling Nested Loops, 1993, LCPC.

[2] Lori Pollock, et al. A compiler framework for loop nest software-pipelining, 2006.

[3] Javier Zalamea, et al. Register constrained modulo scheduling, 2004, IEEE Transactions on Parallel and Distributed Systems.

[4] Guang R. Gao, et al. A Framework for Resource-Constrained Rate-Optimal Software Pipelining, 1996, IEEE Transactions on Parallel and Distributed Systems.

[5] Hongbo Rong, et al. Single-dimension software pipelining for multi-dimensional loops, 2004.

[6] Graham Wood, et al. Global optimization of microprograms through modular control constructs, 1979, MICRO 12.

[7] Philip H. Sweany, et al. Improving software pipelining with unroll-and-jam, 1996, Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences.

[8] Guang R. Gao, et al. Code generation for single-dimension software pipelining of multi-dimensional loops, 2004, International Symposium on Code Generation and Optimization (CGO 2004).

[9] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.

[10] B. Ramakrishna Rau, et al. Instruction-level parallel processing: History, overview, and perspective, 2005, The Journal of Supercomputing.

[11] Guang R. Gao, et al. Optimal Modulo Scheduling Through Enumeration, 2004, International Journal of Parallel Programming.

[12] Randolph E. Harr, et al. Efficient pipelining of nested loops: unroll-and-squash, 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[13] Guang R. Gao, et al. Register Pressure in Software-Pipelined Loop Nests: Fast Computation and Impact on Architecture Design, 2005, LCPC.

[14] Guang R. Gao, et al. Register allocation for software pipelined multi-dimensional loops, 2005, PLDI '05.

[15] Frédéric Vivien, et al. Constructing and exploiting linear schedules with prescribed parallelism, 2002, TODE.

[16] Vicki H. Allan, et al. Software pipelining, 1995, CSUR.

[17] Soo-Mook Moon, et al. Parallelizing nonnumerical code with selective scheduling and software pipelining, 1997, TOPL.

[18] Guang R. Gao, et al. Single-dimension software pipelining for multi-dimensional loops, 2004, International Symposium on Code Generation and Optimization (CGO 2004).

[19] Guang R. Gao, et al. Register allocation for software pipelined multidimensional loops, 2008, TOPL.

[20] Josep Llosa, et al. Swing modulo scheduling: a lifetime-sensitive approach, 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.

[21] Jian Wang, et al. Pipelining-Dovetailing: A Transformation to Enhance Software Pipelining for Nested Loops, 1996, CC.

[22] Richard A. Huff, et al. Lifetime-sensitive modulo scheduling, 1993, PLDI '93.

[23] Guang R. Gao, et al. Software pipelining showdown: optimal vs. heuristic methods in a production compiler, 1996, PLDI '96.

[24] J. Ramanujam. Software Pipelining of Nested Loops, 1994.

[25] Michael E. Wolf, et al. Combining Loop Transformations Considering Caches and Scheduling, 2004, International Journal of Parallel Programming.