论文信息 - FPGA/DNN Co-Design: An Efficient Design Methodology for 1oT Intelligence on the Edge

FPGA/DNN Co-Design: An Efficient Design Methodology for 1oT Intelligence on the Edge

While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment. In this paper, we propose a simultaneous FPGA/DNN co-design methodology with both bottom-up and top-down approaches: a bottom-up hardware-oriented DNN model search for high accuracy, and a top-down FPGA accelerator design considering DNN-specific characteristics. We also build an automatic co-design flow, including an Auto-DNN engine to perform hardware-oriented DNN model search, as well as an Auto-HLS engine to generate synthesizable C code of the FPGA accelerator for explored DNNs. We demonstrate our co-design approach on an object detection task using PYNQ-ZI FPGA. Results show that our proposed DNN model and accelerator outperform the state-of-the-art FPGA designs in all aspects including Intersection-over-Union (IoU) (6.2% higher), frames per second (FPS) (2.48\times higher), power consumption (40% lower), and energy efficiency (2.5\times higher). Compared to GPU-based solutions, our designs deliver similar accuracy but consume far less energy.

[1] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Qiuwen Lou,et al. Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[4] Qin Li,et al. Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS , 2019, ASP-DAC.

[5] Vijay Vasudevan,et al. Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6] Song Han,et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.

[7] Jason Cong,et al. LOPASS: A Low-Power Architectural Synthesis System for FPGAs With Interconnect Estimation and Optimization , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[8] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[9] Di He,et al. Machine learning on FPGAs to face the IoT revolution , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[10] Alok Aggarwal,et al. Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[11] Soheil Ghiasi,et al. Design space exploration of FPGA-based Deep Convolutional Neural Networks , 2016, 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC).

[12] Song Han,et al. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.

[13] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[14] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.

[15] Jinjun Xiong,et al. DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[16] Deming Chen,et al. High-performance video content recognition with long-term recurrent convolutional network for FPGA , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).

[17] Bo Chen,et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Yiyu Shi,et al. DAC-SDC Low Power Object Detection Challenge for UAV Applications , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19] Yun Liang,et al. High level synthesis of stereo matching: Productivity, performance, and software constraints , 2011, 2011 International Conference on Field-Programmable Technology.

[20] Yun Liang,et al. Design Space exploration of FPGA-based accelerators with multi-level parallelism , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.