Run fast when you can: Loop pipelining with uncertain and non-uniform memory dependencies

Loop pipelining is a key optimisation in high-level synthesis (HLS), and its performance depends on the static scheduling algorithm. When a loop contains non-trivial memory dependencies, current HLS tools must apply a conservative pipeline schedule, which leads to nearly sequential execution. In this paper, we demonstrate the use of a parametric polyhedral model to mathematically capture uncertain (i.e., parameterised by an undetermined variable) and/or non-uniform (i.e., varying between loop iterations) memory dependence patterns. Assuming the loop is always executed with an aggressive (fast) pipeline schedule, this static analysis generates the parameter conditions under which that execution is safe, as well as the parametric break points at which the execution encounters memory conflicts. We then apply this information in an automated source-to-source code transformation that implements parametric loop pipelining and loop splitting. The transformed loop is synthesised by Vivado HLS, and its execution speed can be adjusted at runtime to avoid memory conflicts. Experiments on a set of benchmark loops show that our optimisation can significantly improve runtime pipeline performance with a reasonable overhead in hardware resources.
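To make the idea of an uncertain dependence concrete, the following minimal Python sketch models a loop of the form `A[i + m] = A[i] + 1`, where `m` is only known at runtime. It is an illustration under stated assumptions, not the paper's polyhedral analysis or its generated HLS code: the function names, the single-loop pattern, the fixed read-to-write `latency`, and the rule that a write becomes visible to a read issued in the same cycle are all hypothetical choices made for this example.

```python
from math import ceil


def min_safe_ii(m, latency):
    """Smallest initiation interval (II) that is safe for A[i+m] = A[i] + 1.

    Assumption: iteration i reads at cycle i*II and writes at i*II + latency.
    A read-after-write dependence exists only when m > 0, with distance m,
    so the schedule is safe when II * m >= latency.
    """
    if m <= 0:
        return 1  # no read-after-write dependence: run at full speed
    return max(1, ceil(latency / m))


def run_sequential(a, n, m, base):
    """Reference result: execute the loop one iteration at a time."""
    a = list(a)
    for i in range(n):
        a[base + i + m] = a[base + i] + 1
    return a


def run_pipelined(a, n, m, base, ii, latency):
    """Replay reads and writes in the order a pipeline with the given II issues them."""
    a = list(a)
    events = []
    for i in range(n):
        events.append((i * ii, 1, i))            # kind 1: read of iteration i
        events.append((i * ii + latency, 0, i))  # kind 0: write of iteration i
    events.sort()  # on a tie in time, writes (kind 0) come before reads (kind 1)
    loaded = {}
    for _, kind, i in events:
        if kind == 1:
            loaded[i] = a[base + i]
        else:
            a[base + i + m] = loaded[i] + 1
    return a


def run_parametric(a, n, m, base, latency):
    """Pick the II at runtime from the parameter m, then run the pipeline."""
    ii = min_safe_ii(m, latency)
    return run_pipelined(a, n, m, base, ii, latency), ii
```

In this model, a conservative tool would always use `II = latency`, whereas the runtime choice runs at `II = 1` whenever `m <= 0` or `m >= latency` and slows down only for the remaining values of `m`. Non-uniform dependences, where the distance changes across iterations, would additionally need the loop to be split at break points, which this sketch does not cover.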
