Optimizing FPGA-Based CNN Accelerator Using Differentiable Neural Architecture Search

Neural architecture search (NAS) aims to find the optimal neural network automatically for different scenarios. Among various NAS methods, the differentiable NAS (DNAS) approach has demonstrated its effectiveness in terms of searching cost and final accuracy. However, most of previous efforts focus on applying DNAS to GPU or CPU platforms, and its potential is less exploited on the FPGA. In this paper, we first propose a novel FPGA-based CNN accelerator. An accurate performance model of the proposed hardware design is also introduced. To improve accuracy as well as hardware performance, we then apply DNAS and encapsulate the proposed performance model into the objective function. Based on our FPGA design and NAS method, the experiments demonstrate that the network generated by NAS achieves nearly 95% accuracy on CIFAR-10, while decreasing latency by nearly 12 times compared with existing work.

[1]  Ahmed Baruwa,et al.  Leveraging End-to-End Speech Recognition with Neural Architecture Search , 2019, ArXiv.

[2]  Yuandong Tian,et al.  FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Ameet Talwalkar,et al.  Random Search and Reproducibility for Neural Architecture Search , 2019, UAI.

[4]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[5]  Yiyu Shi,et al.  Hardware/Software Co-Exploration of Neural Architectures , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[6]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[7]  Wayne Luk,et al.  A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA , 2018, 2018 International Conference on Field-Programmable Technology (FPT).

[8]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[9]  Yiming Yang,et al.  DARTS: Differentiable Architecture Search , 2018, ICLR.

[10]  Kaiming He,et al.  Designing Network Design Spaces , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Wayne Luk,et al.  Reconfigurable Acceleration of 3D-CNNs for Human Action Recognition with Block Floating-Point Representation , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[12]  Quoc V. Le,et al.  Efficient Neural Architecture Search via Parameter Sharing , 2018, ICML.

[13]  Hongxiang Fan,et al.  Static Block Floating-Point Quantization for Convolutional Neural Networks on FPGA , 2019, 2019 International Conference on Field-Programmable Technology (ICFPT).

[14]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[15]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Wayne Luk,et al.  F-E3D: FPGA-based Acceleration of an Efficient 3D Convolutional Neural Network for Human Action Recognition , 2019, 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP).

[17]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.