Just-In-Time Software Pipelining

Software pipelining exploits instruction-level parallelism in loops. In static compilers, it has been one of the most effective optimizations for wide-issue architectures. However, its compilation time is at least O(|V|^3), where V is the set of operations in a loop, and exponential in the worst case. This paper extends software pipelining to dynamic compilers. We present a novel and simple algorithm that runs in linear time O(|V| + |E|), where E is the set of edges in the loop's dependence graph. Preliminary experiments show that the method is lightweight and generates optimal or near-optimal schedules.
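To make the O(|V| + |E|) bound concrete, the following is a minimal sketch of one linear-time pass over a loop's dependence graph. It is not the algorithm proposed in the paper, whose details the abstract does not give. It only illustrates how a single topological traversal touches each operation and each edge once. The function name `earliest_start_times`, the edge format `(src, dst, latency, distance)`, and the sample loop body are all assumptions made for illustration.

```python
from collections import deque


def earliest_start_times(num_ops, edges):
    """Earliest start cycle of each operation within one iteration.

    edges: list of (src, dst, latency, distance) tuples (hypothetical format).
    Edges with distance > 0 are loop-carried and are ignored in this sketch,
    so the remaining intra-iteration edges are expected to form a DAG.
    """
    succs = [[] for _ in range(num_ops)]
    indegree = [0] * num_ops
    for src, dst, latency, distance in edges:  # O(|E|)
        if distance == 0:
            succs[src].append((dst, latency))
            indegree[dst] += 1

    start = [0] * num_ops
    ready = deque(op for op in range(num_ops) if indegree[op] == 0)
    visited = 0
    while ready:  # each operation is dequeued once, each edge relaxed once
        op = ready.popleft()
        visited += 1
        for dst, latency in succs[op]:
            start[dst] = max(start[dst], start[op] + latency)
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)

    if visited != num_ops:
        raise ValueError("intra-iteration dependences contain a cycle")
    return start


# Hypothetical loop body: load(0) -> mul(1) -> add(2) -> store(3),
# plus one loop-carried edge from add back to mul (distance 1).
edges = [(0, 1, 2, 0), (1, 2, 3, 0), (2, 3, 1, 0), (2, 1, 1, 1)]
print(earliest_start_times(4, edges))  # prints [0, 2, 5, 6]
```

Because every operation is enqueued at most once and every intra-iteration edge is examined once when the graph is built and once when it is relaxed, the total work is proportional to |V| + |E|, which is the cost class the abstract claims for the proposed scheduler.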
