FPGA/DNN Co-Design: An Efficient Design Methodology for 1oT Intelligence on the Edge

While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment. In this paper, we propose a simultaneous FPGA/DNN co-design methodology with both bottom-up and top-down approaches: a bottom-up hardware-oriented DNN model search for high accuracy, and a top-down FPGA accelerator design considering DNN-specific characteristics. We also build an automatic co-design flow, including an Auto-DNN engine to perform hardware-oriented DNN model search, as well as an Auto-HLS engine to generate synthesizable C code of the FPGA accelerator for explored DNNs. We demonstrate our co-design approach on an object detection task using PYNQ-ZI FPGA. Results show that our proposed DNN model and accelerator outperform the state-of-the-art FPGA designs in all aspects including Intersection-over-Union (IoU) (6.2% higher), frames per second (FPS) (2.48\times higher), power consumption (40% lower), and energy efficiency (2.5\times higher). Compared to GPU-based solutions, our designs deliver similar accuracy but consume far less energy.

[1]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Qiuwen Lou,et al.  Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[4]  Qin Li,et al.  Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS , 2019, ASP-DAC.

[5]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Song Han,et al.  ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.

[7]  Jason Cong,et al.  LOPASS: A Low-Power Architectural Synthesis System for FPGAs With Interconnect Estimation and Optimization , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[8]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[9]  Di He,et al.  Machine learning on FPGAs to face the IoT revolution , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[10]  Alok Aggarwal,et al.  Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[11]  Soheil Ghiasi,et al.  Design space exploration of FPGA-based Deep Convolutional Neural Networks , 2016, 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC).

[12]  Song Han,et al.  ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.

[13]  Yu Wang,et al.  Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[14]  Jason Cong,et al.  Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.

[15]  Jinjun Xiong,et al.  DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[16]  Deming Chen,et al.  High-performance video content recognition with long-term recurrent convolutional network for FPGA , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).

[17]  Bo Chen,et al.  MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Yiyu Shi,et al.  DAC-SDC Low Power Object Detection Challenge for UAV Applications , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Yun Liang,et al.  High level synthesis of stereo matching: Productivity, performance, and software constraints , 2011, 2011 International Conference on Field-Programmable Technology.

[20]  Yun Liang,et al.  Design Space exploration of FPGA-based accelerators with multi-level parallelism , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.