Exact and Practical Modulo Scheduling for High-Level Synthesis

Loop pipelining is an essential technique in high-level synthesis for increasing the throughput and resource utilisation of field-programmable gate array (FPGA)-based accelerators. It relies on modulo schedulers to compute an operator schedule that allows successive loop iterations to overlap partially during execution while still honouring all precedence and resource constraints. Modulo schedulers face a bi-criteria problem: minimise the initiation interval (II; i.e., the number of time steps after which a new iteration is started) and minimise the schedule length. We present Moovac, a novel exact formulation that models all aspects of the modulo scheduling problem, including the II minimisation, as a single integer linear program, and we discuss simple measures to prevent excessive runtimes, challenging the old preconception that exact modulo scheduling is impractical. We substantiate this claim with an experimental study covering 188 loops from two established high-level synthesis benchmark suites, four different time limits, and three bounds for the schedule length, in which we compare our approach against a highly tuned exact formulation and a state-of-the-art heuristic algorithm. In the fastest configuration, an accumulated runtime of under 16 minutes is spent on scheduling all loops, and proven optimal IIs are found for 179 test instances.
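
To make the two constraint classes concrete, the following Python sketch checks whether a given candidate schedule is a valid modulo schedule for a given II. It is an illustration only and not the Moovac formulation: the function name, the data layout, the toy loop, and the assumption that every resource-limited operation occupies its unit for exactly one time step are all hypothetical choices made for this example.

```python
from collections import Counter

def is_valid_modulo_schedule(ii, start, latency, edges, op_type, limits):
    """Check a candidate schedule against precedence and resource constraints.

    Assumption (not from the paper): each resource-limited operation blocks
    one unit of its type for a single time step.
    """
    # Precedence: an edge (i, j, dist) means that j, executed dist iterations
    # later, must not start before i has finished.
    for i, j, dist in edges:
        if start[i] + latency[i] > start[j] + dist * ii:
            return False
    # Resources: operations whose start times are congruent modulo II
    # compete for the same units once iterations overlap.
    usage = Counter()
    for op, t in start.items():
        rtype = op_type[op]
        if rtype in limits:
            usage[(rtype, t % ii)] += 1
    return all(n <= limits[rtype] for (rtype, _slot), n in usage.items())

# Hypothetical toy loop: two loads and one store share a single memory port.
latency = {"load_a": 1, "load_b": 1, "add": 1, "store": 1}
op_type = {"load_a": "mem", "load_b": "mem", "add": "alu", "store": "mem"}
limits = {"mem": 1}  # the adder is treated as unlimited
edges = [
    ("load_a", "add", 0),
    ("load_b", "add", 0),
    ("add", "store", 0),
    ("add", "add", 1),  # loop-carried dependence with distance 1
]
schedule = {"load_a": 0, "load_b": 1, "add": 2, "store": 5}

print(is_valid_modulo_schedule(3, schedule, latency, edges, op_type, limits))
print(is_valid_modulo_schedule(2, schedule, latency, edges, op_type, limits))
```

In this toy instance, three memory operations compete for one port, so no II below 3 can satisfy the resource constraint. An exact scheduler searches over both the II and the start times, whereas this sketch only verifies one fixed candidate.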
