Hi-ClockFlow: Multi-Clock Dataflow Automation and Throughput Optimization in High-Level Synthesis

High-level synthesis (HLS) tools are developed to make FPGAs more accessible by allowing designers to describe hardware designs in high-level languages such as C/C++. However, the source code of general applications is not structured as a canonical dataflow. Furthermore, clock frequencies are powerful parameters for improving dataflow throughput, but current commercial HLS tools limit themselves to a single clock domain. Consequently, to benefit from multi-clock dataflow design, designers must still manually analyze the application, partition the source code into modules, optimize them with appropriate parameters and resource allocation, and finally interconnect them. We analyze the impact of multiple clock domains on HLS designs and present Hi-ClockFlow, an automatic HLS framework. Hi-ClockFlow analyzes the source code using Light-HLS, our lightweight HLS evaluation framework, explores the large design space, and optimizes parameters such as clock frequencies and HLS directives in the dataflow. By properly partitioning the source code of an application into parts assigned to different clock domains, Hi-ClockFlow can optimize dataflows with imbalanced modules and improve performance under a given resource constraint.
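To illustrate why per-module clock domains can help a dataflow with imbalanced modules, the following is a minimal sketch of a toy throughput model. It is not Hi-ClockFlow's or Light-HLS's actual model: the `Module` fields, the three example stages, and their numbers are hypothetical, and the model assumes a linear pipeline whose throughput is set by its slowest stage, ignoring the cost of clock-domain-crossing FIFOs and any resource constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    cycles_per_item: int  # hypothetical: clock cycles a stage needs per data item
    fmax_mhz: float       # hypothetical: highest clock at which the stage meets timing


def throughput(modules, clocks_mhz):
    """Items per microsecond of a linear dataflow, limited by the slowest stage."""
    return min(clocks_mhz[m.name] / m.cycles_per_item for m in modules)


def single_clock(modules):
    """One shared clock: every stage runs at the lowest fmax in the design."""
    f = min(m.fmax_mhz for m in modules)
    return {m.name: f for m in modules}


def multi_clock(modules):
    """One clock domain per stage: each stage runs at its own fmax."""
    return {m.name: m.fmax_mhz for m in modules}


# Hypothetical imbalanced pipeline: the compute stage needs 4 cycles per item
# but can be clocked faster than the memory-facing stages.
modules = [
    Module("load", cycles_per_item=1, fmax_mhz=100.0),
    Module("compute", cycles_per_item=4, fmax_mhz=300.0),
    Module("store", cycles_per_item=1, fmax_mhz=150.0),
]

single = throughput(modules, single_clock(modules))
multi = throughput(modules, multi_clock(modules))
print(f"single clock: {single} items/us")  # 25.0
print(f"multi clock:  {multi} items/us")   # 75.0
```

In this toy case the shared 100 MHz clock leaves the 4-cycle compute stage as the bottleneck, while giving that stage its own faster clock triples the modeled throughput. A real flow would also have to weigh HLS directives and resource usage, which this sketch leaves out.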
