HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation

To speedup Deep Neural Networks (DNN) accelerator design and enable effective implementation, we propose HybridDNN, a framework for building high-performance hybrid DNN accelerators and delivering FPGA-based hardware implementations. Novel techniques include a highly flexible and scalable architecture with a hybrid Spatial/Winograd convolution (CONV) Processing Engine (PE), a comprehensive design space exploration tool, and a complete design flow to fully support accelerator design and implementation. Experimental results show that the accelerators generated by HybridDNN can deliver 3375.7 and 83.3 GOPS on a high-end FPGA (VU9P) and an embedded FPGA (PYNQ-Z1), respectively, which achieve a 1.8x higher performance improvement compared to the state-of-art accelerator designs. This demonstrates that HybridDNN is flexible and scalable and can target both cloud and embedded hardware platforms with vastly different resource constraints.

[1]  Viktor K. Prasanna,et al.  A Framework for Generating High Throughput CNN Implementations on FPGAs , 2018, FPGA.

[2]  Shawki Areibi,et al.  Caffeinated FPGAs: FPGA framework For Convolutional Neural Networks , 2016, 2016 International Conference on Field-Programmable Technology (FPT).

[3]  Peng Zhang,et al.  Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[4]  Jinjun Xiong,et al.  FPGA/DNN Co-Design: An Efficient Design Methodology for 1oT Intelligence on the Edge , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).

[5]  Jason Cong,et al.  xPilot: A Platform-Based Behavioral Synthesis System , 2005 .

[6]  Deming Chen,et al.  High-performance video content recognition with long-term recurrent convolutional network for FPGA , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).

[7]  Peng Zhang,et al.  TGPA: Tile-Grained Pipeline Architecture for Low Latency CNN Inference , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[8]  Yu Wang,et al.  Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[9]  Andrew Lavin,et al.  Fast Algorithms for Convolutional Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Yun Liang,et al.  High level synthesis of stereo matching: Productivity, performance, and software constraints , 2011, 2011 International Conference on Field-Programmable Technology.

[11]  Qiuwen Lou,et al.  Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[12]  Jason Cong,et al.  LOPASS: A Low-Power Architectural Synthesis System for FPGAs With Interconnect Estimation and Optimization , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[13]  Yao Chen,et al.  Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs , 2019, FPGA.

[14]  Jinjun Xiong,et al.  Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs , 2018, ACM Great Lakes Symposium on VLSI.

[15]  Jason Cong,et al.  Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[16]  Ninghui Sun,et al.  DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.

[17]  Shengen Yan,et al.  Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[18]  Jason Cong,et al.  Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[19]  Xi Chen,et al.  FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[20]  Jie Xu,et al.  DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[21]  Jing Li,et al.  Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network , 2017, FPGA.

[22]  Yue Wang,et al.  AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs , 2020, FPGA.

[23]  Jinjun Xiong,et al.  SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems , 2020, MLSys.

[24]  Jinjun Xiong,et al.  DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[25]  Lei Yang,et al.  Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).