Demystifying on-the-fly spill code

Modulo scheduling is an effective code generation technique that exploits the parallelism in program loops by overlapping iterations. One drawback of this optimization is that register requirements increase significantly, because values from different loop iterations can be live concurrently. One way to reduce register pressure is to insert spill code, which stores a value to memory between its producer and consumer instructions and so releases the register in the meantime. Spilling heuristics fall into two classes: 1) a posteriori approaches, where spill code is inserted after the loop has been scheduled, and 2) on-the-fly approaches, where spill code is inserted during loop scheduling. Recent studies have reported better results for on-the-fly spilling. In this work, we study both approaches and propose two new techniques, one for each approach. Our new algorithms address the drawbacks observed in previous proposals. We show that the new algorithms outperform previous techniques and, at the same time, reduce compilation time. We also show that, much to our surprise, a posteriori spilling can in fact be slightly more effective than on-the-fly spilling.
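
To make the register-pressure argument concrete, the following minimal sketch counts how many values are live at once in a modulo-scheduled loop and shows how spilling one long-lived value lowers that count. It is an illustration under simplifying assumptions, not one of the algorithms proposed in this work: the names `max_live` and `spill` are hypothetical, lifetimes are given as half-open cycle intervals within one iteration, and a spilled value is assumed to occupy a register for exactly one cycle after its definition (until the store) and one cycle before its use (after the load).

```python
def max_live(lifetimes, ii):
    """Peak number of simultaneously live values in the steady state.

    lifetimes: list of (start, end) cycles for one iteration, end exclusive.
    ii: initiation interval, i.e. cycles between successive iterations.
    Because a new iteration starts every ii cycles, a value live at
    cycle c occupies a register in row c % ii of the kernel.
    """
    pressure = [0] * ii
    for start, end in lifetimes:
        for cycle in range(start, end):
            pressure[cycle % ii] += 1
    return max(pressure)


def spill(lifetime):
    """Split one lifetime into a short def-to-store piece and a short
    load-to-use piece (assumed one cycle each; memory latencies ignored)."""
    start, end = lifetime
    if end - start <= 2:
        return [lifetime]  # too short for spilling to free a register
    return [(start, start + 1), (end - 1, end)]


# Three values in a loop scheduled with ii = 2; the first is long-lived.
before = [(0, 6), (1, 3), (2, 4)]
after = spill(before[0]) + before[1:]

print(max_live(before, 2))  # registers needed without spilling
print(max_live(after, 2))   # registers needed after spilling the long value
```

In this toy loop the long lifetime spans three initiation intervals, so it alone holds three registers in every kernel row; splitting it leaves only the two one-cycle pieces. A real spilling heuristic must also account for the extra memory operations, which consume issue slots and may force a larger initiation interval.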
