Paper information - Exploring a Layer-based Pre-implemented Flow for Mapping CNN on FPGA

Convolutional Neural Networks (CNNs) are compute-intensive learning models that have proven effective at solving complex learning problems. However, developing a high-performance FPGA accelerator for a CNN often demands advanced programming skills, hardware verification, precise distribution localization, and long development cycles. Moreover, CNNs grow deeper through the reuse and replication of multiple layers. This paper proposes a programming flow for CNNs on FPGAs that generates high-performance accelerators by assembling pre-implemented CNN components like a puzzle, guided by the graph topology. Using pre-implemented components allows us to use only the resources necessary, predict performance, and improve productivity, since no HDL code needs to be synthesized. Furthermore, components can be reused across a range of applications. Through prototyping, we demonstrate the viability and relevance of our approach. Experiments show a productivity improvement of up to 69% compared to a traditional FPGA implementation, while achieving over 1.75× higher Fmax with lower resource usage and power consumption.

Christophe Bobda | Joel Mandebi Mbongue | Danielle Tchuinkou Kwadjo
