论文信息 - HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation

HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation

To speedup Deep Neural Networks (DNN) accelerator design and enable effective implementation, we propose HybridDNN, a framework for building high-performance hybrid DNN accelerators and delivering FPGA-based hardware implementations. Novel techniques include a highly flexible and scalable architecture with a hybrid Spatial/Winograd convolution (CONV) Processing Engine (PE), a comprehensive design space exploration tool, and a complete design flow to fully support accelerator design and implementation. Experimental results show that the accelerators generated by HybridDNN can deliver 3375.7 and 83.3 GOPS on a high-end FPGA (VU9P) and an embedded FPGA (PYNQ-Z1), respectively, which achieve a 1.8x higher performance improvement compared to the state-of-art accelerator designs. This demonstrates that HybridDNN is flexible and scalable and can target both cloud and embedded hardware platforms with vastly different resource constraints.

[1] Viktor K. Prasanna,et al. A Framework for Generating High Throughput CNN Implementations on FPGAs , 2018, FPGA.

[2] Shawki Areibi,et al. Caffeinated FPGAs: FPGA framework For Convolutional Neural Networks , 2016, 2016 International Conference on Field-Programmable Technology (FPT).

[3] Peng Zhang,et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[4] Jinjun Xiong,et al. FPGA/DNN Co-Design: An Efficient Design Methodology for 1oT Intelligence on the Edge , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).

[5] Jason Cong,et al. xPilot: A Platform-Based Behavioral Synthesis System , 2005 .

[6] Deming Chen,et al. High-performance video content recognition with long-term recurrent convolutional network for FPGA , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).

[7] Peng Zhang,et al. TGPA: Tile-Grained Pipeline Architecture for Low Latency CNN Inference , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[8] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[9] Andrew Lavin,et al. Fast Algorithms for Convolutional Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Yun Liang,et al. High level synthesis of stereo matching: Productivity, performance, and software constraints , 2011, 2011 International Conference on Field-Programmable Technology.

[11] Qiuwen Lou,et al. Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[12] Jason Cong,et al. LOPASS: A Low-Power Architectural Synthesis System for FPGAs With Interconnect Estimation and Optimization , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[13] Yao Chen,et al. Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs , 2019, FPGA.

[14] Jinjun Xiong,et al. Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs , 2018, ACM Great Lakes Symposium on VLSI.

[15] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[16] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.

[17] Shengen Yan,et al. Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[18] Jason Cong,et al. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[19] Xi Chen,et al. FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[20] Jie Xu,et al. DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[21] Jing Li,et al. Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network , 2017, FPGA.

[22] Yue Wang,et al. AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs , 2020, FPGA.

[23] Jinjun Xiong,et al. SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems , 2020, MLSys.

[24] Jinjun Xiong,et al. DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[25] Lei Yang,et al. Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).