Rapid circuit-specific inlining tuning for FPGA high-level synthesis
The behavior of compiler optimizations is typically dictated by assumptions about the underlying architecture of the target hardware. Nevertheless, modern high-level synthesis (HLS) tools that target field-programmable gate arrays (FPGAs) still use the same optimization passes that were developed and tuned for general-purpose processors. This paper examines the effect of the inlining pass on HLS-generated hardware, focusing on two metrics: circuit area and clock cycles. An iterative search method that creates a custom inliner tailored to each benchmark and each metric is proposed and evaluated. The quality of the generated results is analyzed, and the effect of the coefficients used to make the inlining decisions is investigated separately. Furthermore, a novel compiler cache is proposed that enables rapid evaluation of new inlining logic. Results show that, compared with LLVM's default approach, a circuit-specific inliner can generate circuits with either 6% fewer logic elements (LEs), 15% fewer clock cycles, or an 11% smaller product of LEs and clock cycles. Moreover, our inliner achieved a 23x speedup over LLVM performing the same task without the compiler cache.
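The abstract does not specify the inliner's cost function, the search strategy, or how the compiler cache is keyed, so the following is only a minimal sketch under stated assumptions. It assumes a hypothetical linear inline heuristic with three coefficients (size weight, call weight, threshold), a greedy hill-climbing search over those coefficients for one chosen metric (LEs, clock cycles, or their product), and a cache keyed on the resulting set of inline decisions, so that coefficient vectors producing identical decisions reuse a stored result instead of triggering another synthesis run. The names `decide`, `toy_hls`, `DecisionCache`, `tune`, and the call-site data are all hypothetical; `toy_hls` is a made-up stand-in for a full HLS compile and simulation, and its numbers carry no meaning.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

CallSite = Tuple[str, str, int, int]   # hypothetical: (caller, callee, callee_size, call_count)
Coeffs = Tuple[float, float, float]    # hypothetical: (size_weight, call_weight, threshold)
Decisions = FrozenSet[Tuple[str, str]]

SITES: List[CallSite] = [("main", "fir", 200, 64), ("main", "crc", 50, 4), ("fir", "mac", 30, 512)]


def decide(sites: List[CallSite], c: Coeffs) -> Decisions:
    """Hypothetical linear heuristic: inline when weighted benefit exceeds weighted cost."""
    size_w, call_w, threshold = c
    return frozenset((a, b) for a, b, size, calls in sites
                     if call_w * calls - size_w * size > threshold)


def toy_hls(sites: List[CallSite], d: Decisions) -> Tuple[int, int]:
    """Made-up stand-in for an HLS run; returns (LEs, clock cycles)."""
    les, cycles = 1000, 5000
    for a, b, size, calls in sites:
        if (a, b) in d:
            les += size          # callee logic is duplicated into the caller
            cycles -= 2 * calls  # assumed call/return handshake cycles removed per call
    return les, cycles


class DecisionCache:
    """Assumed cache design: memoize results by the set of inline decisions."""

    def __init__(self, sites: List[CallSite]):
        self.sites = sites
        self.results: Dict[Decisions, Tuple[int, int]] = {}
        self.hls_runs = 0  # number of (expensive) synthesis runs actually performed

    def evaluate(self, d: Decisions) -> Tuple[int, int]:
        if d not in self.results:
            self.hls_runs += 1
            self.results[d] = toy_hls(self.sites, d)
        return self.results[d]


METRICS: Dict[str, Callable[[int, int], float]] = {
    "area": lambda les, cyc: les,
    "cycles": lambda les, cyc: cyc,
    "area_x_cycles": lambda les, cyc: les * cyc,
}


def tune(sites: List[CallSite], start: Coeffs, metric: str,
         step: float = 1.0, max_iters: int = 100) -> Tuple[Coeffs, float, int]:
    """Greedy hill climbing over the coefficients, minimizing one metric."""
    cache = DecisionCache(sites)

    def score(c: Coeffs) -> float:
        return METRICS[metric](*cache.evaluate(decide(sites, c)))

    best, best_score = start, score(start)
    for _ in range(max_iters):
        improved = False
        for i in range(len(best)):
            for delta in (-step, step):
                # Perturb one coefficient at a time around the current best point.
                cand = tuple(v + delta if j == i else v for j, v in enumerate(best))
                s = score(cand)
                if s < best_score:
                    best, best_score, improved = cand, s, True
        if not improved:
            break  # local optimum: no single-coefficient step helps
    return best, best_score, cache.hls_runs


if __name__ == "__main__":
    for metric in METRICS:
        coeffs, value, runs = tune(SITES, (1.0, 1.0, 0.0), metric)
        print(f"{metric}: coeffs={coeffs} value={value} hls_runs={runs}")
```

Keying the cache on decision sets rather than on coefficient values is what lets many neighboring coefficient vectors share a single synthesis run during the search. A real compiler cache would likely need a finer-grained key (for example, per-function results) to also reuse work across decision sets that differ only partially.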