A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices

Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption. Existing approaches typically separate the DNN model development step from its deployment on IoT devices, resulting in suboptimal solutions. In this paper, we first introduce a few interesting but counterintuitive observations about such a separate design approach, and empirically show why it may lead to suboptimal designs. Motivated by these observations, we then propose a novel and practical bi-directional co-design approach: a bottom-up DNN model design strategy together with a top-down flow for DNN accelerator design. It enables a joint optimization of both DNN models and their deployment configurations on IoT devices as represented as FPGAs. We demonstrate the effectiveness of the proposed co-design approach on a real-life object detection application using Pynq-Z1 embedded FPGA. Our method obtains the state-of-the-art results on both QoR with high accuracy (IoU) and QoS with high throughput (FPS) and high energy efficiency.

[1]  Nam Sung Kim,et al.  Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training , 2018, NeurIPS.

[2]  Viktor K. Prasanna,et al.  A Framework for Generating High Throughput CNN Implementations on FPGAs , 2018, FPGA.

[3]  David A. Patterson,et al.  In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[4]  Qiuwen Lou,et al.  Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[5]  Jinjun Xiong,et al.  DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Song Han,et al.  ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Andrew C. Ling,et al.  An OpenCL™ Deep Learning Accelerator on Arria 10 , 2017, FPGA.

[10]  Xuegong Zhou,et al.  A high performance FPGA-based accelerator for large-scale convolutional neural networks , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).