Dependence Graph Preprocessing for Faster Exact Modulo Scheduling in High-Level Synthesis

Modulo scheduling is a key throughput optimisation when compiling for VLIW architectures, and it has also been applied successfully to the high-level synthesis (HLS) of hardware accelerators. However, problem instances in the HLS context usually have larger and denser dependence graphs, and may contain many simple operations that are not subject to resource constraints, which causes long runtimes with VLIW-centric modulo schedulers. We propose a complexity-reduction approach for existing exact modulo schedulers that retains their ability to compute provably optimal schedules but shortens their runtime on typical HLS instances. The basic idea is to simplify a problem instance's dependence graph by collapsing entire subgraphs of non-critical operations into single edges, and then to schedule the reduced problem, which comprises only the critical operations. A solution to the reduced problem can easily be completed to a solution to the original problem. Applied to the well-known, originally VLIW-centric, exact ILP formulation by Eichenberger and Davidson, the approach yields a mean speedup of 4.37x on 21 large instances, which makes that formulation competitive again with the recently proposed, HLS-tailored Moovac formulation. As the two formulations show different problem-dependent strengths and weaknesses, these insights are a first step towards an oracle that selects the most promising scheduler for a given problem instance.
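The reduce-then-complete idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm, only one plausible reading of the abstract under several assumptions: "critical" operations are taken to be the resource-constrained ones, the candidate initiation interval `ii` is fixed, each dependence edge `(src, dst, latency, distance)` imposes `t[dst] >= t[src] + latency - ii * distance`, and non-critical operations are placed as soon as possible. All names (`SOURCE`, `longest_paths`, `reduce_graph`, `complete_schedule`) are hypothetical. The exact scheduler that would assign start times to the critical operations is out of scope, so the example supplies those times by hand.

```python
from collections import defaultdict

SOURCE = "__source__"  # hypothetical virtual start node, pinned to time 0


def longest_paths(src, succ, anchors, rounds):
    """Longest path weights from src, expanding only src and non-anchor nodes."""
    inner = {src: 0}  # src itself plus the non-critical nodes reached so far
    arrive = {}       # anchors reached; paths are not extended beyond them
    for _ in range(rounds):
        changed = False
        for u, d in list(inner.items()):
            for v, w in succ[u]:
                best = arrive if v in anchors else inner
                if d + w > best.get(v, float("-inf")):
                    best[v] = d + w
                    changed = True
        if not changed:
            return inner, arrive
    raise ValueError("positive-weight cycle: candidate II is infeasible")


def reduce_graph(nodes, edges, critical, ii):
    """Replace every path through non-critical nodes by one weighted edge."""
    succ = defaultdict(list)
    for u, v, latency, distance in edges:
        succ[u].append((v, latency - ii * distance))
    for v in nodes:  # the virtual source precedes every operation
        succ[SOURCE].append((v, 0))
    anchors = set(critical) | {SOURCE}
    reduced, inner_paths = {}, {}
    for c in anchors:
        inner, arrive = longest_paths(c, succ, anchors, len(nodes) + 1)
        inner_paths[c] = inner
        for v, w in arrive.items():
            reduced[(c, v)] = w  # constraint: t[v] >= t[c] + w
    return reduced, inner_paths


def complete_schedule(critical_start, inner_paths):
    """Place each non-critical operation as early as its anchors allow."""
    fixed = {SOURCE: 0, **critical_start}
    start = dict(fixed)
    for c, inner in inner_paths.items():
        for v, w in inner.items():
            if v not in fixed:
                start[v] = max(start.get(v, float("-inf")), fixed[c] + w)
    start.pop(SOURCE)
    return start


# Toy loop body: a load and a store (critical) around two simple operations.
nodes = ["load", "a", "b", "store"]
edges = [("load", "a", 2, 0), ("a", "b", 1, 0), ("load", "b", 2, 0),
         ("b", "store", 1, 0), ("store", "load", 1, 1)]
reduced, inner_paths = reduce_graph(nodes, edges, {"load", "store"}, ii=5)
# Start times of the critical operations, as an exact scheduler might return.
full = complete_schedule({"load": 0, "store": 4}, inner_paths)
```

In this toy instance, the reduced problem contains only `load`, `store`, and the virtual source. The two simple operations are summarised by the edge `("load", "store")` with weight 4, and completion places `a` at time 2 and `b` at time 3. A reduction that stays valid across several candidate initiation intervals would need to keep latency and distance separate instead of folding them into one weight.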

[1] Jason Cong, et al. An efficient and versatile scheduling algorithm based on SDC formulation, 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[2] Richard A. Huff, et al. Lifetime-sensitive modulo scheduling, 1993, PLDI '93.

[3] Alexandre E. Eichenberger, et al. Efficient formulation for optimal modulo schedulers, 1997, PLDI '97.

[4] Dupont de Dinechin. Time-Indexed Formulations and a Large Neighborhood Search for the Resource-Constrained Modulo Scheduling Problem, 2007.

[5] John O. McClain, et al. Mathematical Programming Approaches to Capacity-Constrained MRP Systems: Review, Formulation and Problem Reduction, 1983.

[6] Josep Llosa, et al. Lifetime-Sensitive Modulo Scheduling in a Production Environment, 2001, IEEE Trans. Computers.

[7] Ole Tange, et al. GNU Parallel: The Command-Line Power Tool, 2011, login Usenix Mag.

[8] Stephen J. Garland, et al. Algorithm 97: Shortest path, 1962, Commun. ACM.

[9] B. Ramakrishna Rau, et al. Iterative Modulo Scheduling, 1996, International Journal of Parallel Programming.

[10] C. Artigues, et al. On integer linear programming formulations for the resource-constrained modulo scheduling problem, 2010.

[11] Hiroyuki Tomiyama, et al. Proposal and Quantitative Analysis of the CHStone Benchmark Program Suite for Practical C-based High-level Synthesis, 2009, J. Inf. Process.

[12] Oliver Sinnen, et al. ILP-based modulo scheduling for high-level synthesis, 2016, 2016 International Conference on Compilers, Architectures, and Synthesis of Embedded Systems (CASES).

[13] Andrea Lodi, et al. Performance Variability in Mixed-Integer Programming, 2013.

[14] Jason Helge Anderson, et al. Modulo SDC scheduling with recurrence minimization in high-level synthesis, 2014, 2014 24th International Conference on Field Programmable Logic and Applications (FPL).

[15] Christian Artigues, et al. The resource-constrained modulo scheduling problem: an experimental study, 2013, Comput. Optim. Appl.

[16] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.

[17] Yu-Chin Hsu, et al. A formal approach to the scheduling problem in high level synthesis, 1991, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.

[18] Josep Llosa, et al. A comparative study of modulo scheduling techniques, 2002, ICS '02.

[19] Alexandre E. Eichenberger, et al. Author retrospective for optimum modulo schedules for minimum register requirements, 2014, ICS 25th Anniversary.

[20] Gu-Yeon Wei, et al. MachSuite: Benchmarks for accelerator design and customized architectures, 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).

[21] Fabrizio Ferrandi, et al. Bambu: A modular framework for the high level synthesis of memory-intensive applications, 2013, 2013 23rd International Conference on Field Programmable Logic and Applications.

[22] Andreas Koch, et al. Hardware/software co-compilation with the Nymble system, 2013, 2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC).

[23] Luca Benini, et al. CROSS cyclic resource-constrained scheduling solver, 2014, Artif. Intell.