Optimal Modulo Scheduling Through Enumeration

Resource-constrained software pipelining has played an increasingly significant role in exploiting instruction-level parallelism and has drawn intensive academic and industrial interest. The challenge is to find an optimal schedule: given the data dependence graph (DDG) for a loop, find the fastest possible schedule under given resource constraints while keeping register usage minimal. This paper proposes a novel enumeration-based modulo scheduling approach to solve this problem. The proposed approach does not require any awkward reworking of constraints into linear form and employs a realistic register model. The set of schedules enumerated also allows us to characterize the schedule space and address questions such as whether schedules using a small number of registers tend to require a large number of function units. The proposed approach has been implemented under the MOST testbed at McGill University. Experimental results on more than 1000 loops from popular benchmark programs show that enumeration is generally faster at obtaining optimal schedules than integer linear programming approaches. Compared to Huff's Slack Scheduling, enumeration found a faster schedule for almost 15% of loops, with a mean improvement of 18%. Of the remaining loops, 10% required fewer registers under enumeration, with a mean reduction of 16%.
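To make the problem statement concrete, the following is a minimal brute-force sketch of the general idea, not the algorithm of the paper. It tries increasing initiation intervals (II), enumerates start times within a bounded window, keeps schedules that satisfy the dependence and resource constraints, and among those picks one with the fewest live values. All names (`enumerate_schedules`, `feasible`, `max_live`), the search window, the fully pipelined function-unit model, and the simple lifetime model are assumptions of this illustration, as is the small example loop.

```python
from itertools import product


def max_live(t, edges, ii):
    """Register estimate under an assumed simple model: a value is live from
    the issue of its producer until the issue of its last consumer."""
    last_use = {}
    for src, dst, dist in edges:
        use = t[dst] + ii * dist  # consumer may belong to a later iteration
        last_use[src] = max(last_use.get(src, use), use)
    live = [0] * ii
    for src, end in last_use.items():
        for cycle in range(t[src], end):
            live[cycle % ii] += 1  # fold the lifetime onto the II slots
    return max(live)


def feasible(t, ops, edges, units, ii):
    """Check dependence and (modulo) resource constraints for start times t."""
    for src, dst, dist in edges:
        # Consumer, shifted by `dist` iterations, must wait for the producer.
        if t[dst] + ii * dist < t[src] + ops[src][1]:
            return False
    used = {}
    for name, (fu, _latency) in ops.items():
        key = (fu, t[name] % ii)  # assumed: a unit is busy only at issue
        used[key] = used.get(key, 0) + 1
        if used[key] > units[fu]:
            return False
    return True


def enumerate_schedules(ops, edges, units, max_ii=8):
    """Return (ii, start_times, registers) for the smallest feasible II found,
    searching only start times inside an assumed bounded window."""
    names = list(ops)
    total_latency = sum(latency for _fu, latency in ops.values())
    for ii in range(1, max_ii + 1):
        window = total_latency + ii  # hypothetical bound on start times
        best = None
        for times in product(range(window), repeat=len(names)):
            t = dict(zip(names, times))
            if feasible(t, ops, edges, units, ii):
                regs = max_live(t, edges, ii)
                if best is None or regs < best[1]:
                    best = (t, regs)
        if best is not None:
            return ii, best[0], best[1]
    return None


# Hypothetical loop body: op -> (function unit type, latency).
ops = {"load": ("mem", 2), "mul": ("alu", 2), "add": ("alu", 1), "store": ("mem", 1)}
# DDG edges: (producer, consumer, iteration distance).
edges = [("load", "mul", 0), ("mul", "add", 0), ("add", "store", 0), ("add", "add", 1)]
units = {"mem": 1, "alu": 1}

ii, schedule, regs = enumerate_schedules(ops, edges, units)
print(ii, schedule, regs)
```

Exhaustive enumeration of this kind grows exponentially with the number of operations, so it is only usable on very small loops; it is meant to show what "fastest schedule, then fewest registers" means operationally rather than how to search the space efficiently.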
