Rapid circuit-specific inlining tuning for FPGA high-level synthesis

The behavior of compiler optimizations is typically dictated by assumptions about the underlying architecture of the target hardware. Nevertheless, modern high-level synthesis (HLS) tools that target field-programmable gate arrays (FPGAs) still use the same optimization passes that were developed and tuned for general-purpose processors. This paper examines the effect of the inlining pass on HLS-generated hardware, focusing on two metrics: circuit area and clock cycles. An iterative search method that creates a custom inliner tailored to each benchmark and each metric is proposed and evaluated. The quality of the generated results is analyzed, and the effect of the coefficients used to make the inlining decisions is investigated separately. Furthermore, a novel compiler cache is proposed that enables rapid evaluation of new inlining logic. Results show that a circuit-specific inliner is able to generate circuits with either 6% fewer logic elements (LEs), 15% fewer clock cycles, or an 11% smaller LEs × clock cycles product when compared to LLVM's default approach. Moreover, our inliner achieved a 23x speedup over LLVM performing the same task without the compiler cache.