Multi-cycling is a well-known strategy for improving performance in digital design, wherein the required time for selected combinational paths is lengthened from one clock cycle to multiple cycles. The approach can be applied to paths associated with computations whose results are not needed immediately: such paths are allowed multiple clock cycles to “complete”, making them less likely to form the critical path of the circuit. In this paper, we consider multi-cycling in the context of high-level synthesis (HLS) and use software profiling to guide multi-cycling optimizations. Specifically, prior to HLS, we execute the program in software with typical datasets to record how many times each code segment executes. During HLS, we then extend the schedule for infrequently executed code segments and apply multi-cycling to the dilated schedules, which offer greater opportunities for multi-cycling. In essence, our approach ensures that infrequently executed code segments do not form the critical path of the HLS-generated circuit. In an experimental study targeting the Altera Stratix IV FPGA, we evaluate the impact on speed and area of both traditional multi-cycling and the proposed profiling-driven multi-cycling, and show that profiling-driven multi-cycling yields an average speedup of over 10% across 13 benchmark circuits, with some circuits sped up by more than 30%. Circuit area is reduced by 11%, yielding a mean 20% improvement in area-delay product.
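The idea of dilating the schedule only for rarely executed code can be illustrated with a toy model. The sketch below is not the algorithm used in this work; it only assumes that each code segment has a profiled execution count and a single longest combinational path delay, and that giving a path N cycles relaxes its per-cycle timing requirement to delay/N. The names `Block`, `assign_cycles`, `min_clock_period`, and `total_cycles`, the `cold_fraction` threshold, and the `dilation` factor are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    exec_count: int        # times the segment ran during software profiling
    path_delay_ns: float   # longest combinational path delay in the segment


def assign_cycles(blocks, cold_fraction=0.01, dilation=2):
    """Give 'cold' segments (small share of all executions) extra cycles.

    cold_fraction and dilation are illustrative knobs, not values from the paper.
    """
    total = sum(b.exec_count for b in blocks)
    cycles = {}
    for b in blocks:
        share = b.exec_count / total if total else 0.0
        cycles[b.name] = dilation if share < cold_fraction else 1
    return cycles


def min_clock_period(blocks, cycles):
    """A path allotted N cycles only needs delay / N to fit in one clock period."""
    return max(b.path_delay_ns / cycles[b.name] for b in blocks)


def total_cycles(blocks, cycles):
    """Cycle count of the whole run under this simplified one-path-per-block model."""
    return sum(b.exec_count * cycles[b.name] for b in blocks)


blocks = [
    Block("loop_body", 100_000, 4.0),   # hot: stays single-cycle
    Block("init", 1, 7.0),              # cold: multi-cycled
    Block("error_path", 0, 9.0),        # cold: multi-cycled
]

single = {b.name: 1 for b in blocks}
multi = assign_cycles(blocks)

period_single = min_clock_period(blocks, single)
period_multi = min_clock_period(blocks, multi)
cycles_single = total_cycles(blocks, single)
cycles_multi = total_cycles(blocks, multi)
```

With this made-up data the cold segments no longer set the clock period (it drops from 9.0 ns to 4.5 ns), while the total cycle count grows by only one cycle because those segments almost never run. A real HLS flow would reason about individual operations and paths in the schedule, not one delay per segment.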