ILP-based modulo scheduling for high-level synthesis

In high-level synthesis, loop pipelining is a technique to improve the throughput and utilisation of hardware datapaths by starting new loop iterations after a fixed amount of time, called the initiation interval (II), so that subsequent iterations overlap. The problem is to find the smallest II and a corresponding operation schedule that fulfils all data dependencies and resource constraints; both are usually found by modulo scheduling. We propose Moovac, a novel integer linear program (ILP) formulation of the modulo scheduling problem that uses overlap variables to model exact resource constraints. Given enough time, Moovac will find a minimum-II solution. This is in contrast to Canis' state-of-the-art Modulo SDC approach, which requires heuristic simplifications of the resource constraints. Moovac can thus be used as a reference to evaluate heuristics, or in a time-limited mode as a heuristic itself that provides a best-so-far solution. We schedule kernels from the CHStone and MachSuite benchmarks for loop pipelining with Moovac, Modulo SDC and a prior exact formulation by Eichenberger. Moovac has competitive performance in its time-limited mode, and for some loops it delivers better results faster than the Modulo SDC scheduler. Its structure often leads to quicker solution times than Eichenberger's formulation. Using the Moovac-computed optimal solutions as a reference, we can confirm that the Modulo SDC heuristic is indeed capable of finding optimal or near-optimal solutions for the majority of small- to medium-sized loops. For larger loops, however, the two algorithms begin to diverge, with Moovac often being significantly faster to prove that a candidate II is infeasible. This can be exploited by running both schedulers synergistically, which leads to quicker convergence to the final II.
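To make the problem statement concrete, the sketch below finds the smallest II for a tiny loop by plain exhaustive search. It is not the Moovac ILP and not Modulo SDC; it only illustrates the two constraint families both must respect. A dependence from operation i to operation j with iteration distance d requires t_j >= t_i + lat_i - d * II, and a resource type with k units may host at most k operations in each slot t mod II. The function names (`res_mii`, `feasible`, `min_ii`), the example kernel, the assumption that every unit is fully pipelined and busy for exactly one cycle per operation, and the fixed start-time `horizon` are all hypothetical choices made for illustration.

```python
from math import ceil


def res_mii(ops, limits):
    """Resource lower bound on II: max over types of ceil(#ops / #units)."""
    counts = {}
    for rtype in ops.values():
        if rtype is not None:  # None marks an operation with no resource limit
            counts[rtype] = counts.get(rtype, 0) + 1
    return max([1] + [ceil(n / limits[r]) for r, n in counts.items()])


def feasible(ops, lat, edges, limits, ii, horizon):
    """Depth-first search for start times in [0, horizon) satisfying every
    dependence and the modulo resource constraint at this II.
    Returns {op: start_time}, or None if nothing fits within the horizon."""
    names = list(ops)
    times = {}

    def consistent(op):
        # Dependence (src, dst, dist): iteration k+dist starts dist*II cycles
        # after iteration k, so t_dst + dist*II >= t_src + lat_src.
        for src, dst, dist in edges:
            if src in times and dst in times:
                if times[dst] < times[src] + lat[src] - dist * ii:
                    return False
        # Modulo reservation check (assumes one busy cycle per operation):
        # operations of one type may not exceed the unit count in a slot t mod II.
        rtype = ops[op]
        if rtype is not None:
            slot = times[op] % ii
            used = sum(1 for o in times if ops[o] == rtype and times[o] % ii == slot)
            if used > limits[rtype]:
                return False
        return True

    def search(k):
        if k == len(names):
            return dict(times)
        op = names[k]
        for t in range(horizon):
            times[op] = t
            if consistent(op):
                found = search(k + 1)
                if found is not None:
                    return found
            del times[op]
        return None

    return search(0)


def min_ii(ops, lat, edges, limits, horizon, max_ii):
    """Try candidate IIs in increasing order, starting from the resource bound."""
    for ii in range(res_mii(ops, limits), max_ii + 1):
        sched = feasible(ops, lat, edges, limits, ii, horizon)
        if sched is not None:
            return ii, sched
    return None


# Hypothetical loop body: load a -> add b -> mul c -> store d, plus a
# loop-carried dependence from c back to b of the next iteration (distance 1).
OPS = {"a": "mem", "b": "alu", "c": "alu", "d": "mem"}
LAT = {"a": 2, "b": 1, "c": 1, "d": 1}
EDGES = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("c", "b", 1)]
LIMITS = {"mem": 1, "alu": 1}

print(min_ii(OPS, LAT, EDGES, LIMITS, horizon=8, max_ii=6))
# prints (2, {'a': 0, 'b': 2, 'c': 3, 'd': 5})
```

Because candidate IIs are tried in increasing order, the first feasible one is the minimum II within the chosen horizon. Rejecting a candidate (here II = 1, where the two memory operations always collide) requires exhausting the entire search space, which is the kind of infeasibility proof the comparison above refers to; an exact formulation such as an ILP replaces this brute-force enumeration with a solver.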

[1] Jason Helge Anderson, et al. Modulo SDC scheduling with recurrence minimization in high-level synthesis, 2014, 2014 24th International Conference on Field Programmable Logic and Applications (FPL).

[2] Andreas Koch, et al. Hardware/software co-compilation with the Nymble system, 2013, 2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC).

[3] Zhiru Zhang, et al. SDC-based modulo scheduling for pipeline synthesis, 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[4] Shail Aditya, et al. Cycle-time aware architecture synthesis of custom hardware accelerators, 2002, CASES '02.

[5] Yu-Chin Hsu, et al. A formal approach to the scheduling problem in high level synthesis, 1991, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.

[6] Josep Llosa, et al. A comparative study of modulo scheduling techniques, 2002, ICS '02.

[7] Alexandre E. Eichenberger, et al. Efficient formulation for optimal modulo schedulers, 1997, PLDI '97.

[8] Dupont de Dinechin. Time-Indexed Formulations and a Large Neighborhood Search for the Resource-Constrained Modulo Scheduling Problem, 2007.

[9] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization (CGO 2004).

[10] Gu-Yeon Wei, et al. MachSuite: Benchmarks for accelerator design and customized architectures, 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).

[11] B. Ramakrishna Rau, et al. Iterative Modulo Scheduling, 1996, International Journal of Parallel Programming.

[12] Josep Llosa, et al. Swing modulo scheduling: a lifetime-sensitive approach, 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.

[13] Richard A. Huff, et al. Lifetime-sensitive modulo scheduling, 1993, PLDI '93.

[14] Dilip K. Banerji, et al. An Integrated and Accelerated ILP Solution for Scheduling, Module Allocation, and Binding in Datapath Synthesis, 1993, The Sixth International Conference on VLSI Design.

[15] Hiroyuki Tomiyama, et al. Proposal and Quantitative Analysis of the CHStone Benchmark Program Suite for Practical C-based High-level Synthesis, 2009, J. Inf. Process.

[16] Alberto Sangiovanni-Vincentelli, et al. Classification, Customization, and Characterization: Using MILP for Task Allocation and Scheduling, 2006.

[17] Oliver Sinnen, et al. ILP Formulations for Optimal Task Scheduling with Communication Delays on Parallel Systems, 2015, IEEE Transactions on Parallel and Distributed Systems.

[18] Jason Cong, et al. An efficient and versatile scheduling algorithm based on SDC formulation, 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[19] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.

[20] B. Ramakrishna Rau, et al. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing, 1981, MICRO 14.