On Neural Architecture Search for Resource-Constrained Hardware Platforms

The recent success of Neural Architecture Search (NAS) has enabled researchers to explore broad design spaces with learning-based methods. Beyond finding better neural network architectures, the idea of automation has also inspired efforts to improve their hardware implementations. Although some practices of hardware machine-learning automation have achieved remarkable performance, they still follow the traditional design concept: a network architecture is first designed for high test accuracy, and then compressed and optimized to fit a target platform. Such a design flow easily leads to inferior, locally optimal solutions. To address this problem, we propose a new framework that jointly explores the spaces of neural architecture, hardware implementation, and quantization. Our objective is to find the most accurate quantized architecture that is implementable under given hardware specifications. We implement and test our designs on FPGAs with a limited number of look-up tables (LUTs) and a required throughput. Compared with methods that design and search separately, our framework performs much better under strict specifications, producing designs that are 18% to 68% more accurate on the task of classifying CIFAR-10 images. With 30,000 LUTs, it finds a light-weight design that achieves 82.98% accuracy and a throughput of 1,293 images/second, whereas under the same constraints the traditional method fails to find any valid solution.
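To make the joint-exploration idea concrete, the sketch below shows a minimal random-search loop over a toy (architecture, quantization) space that discards any candidate violating the FPGA specification and keeps the most accurate feasible one. All helper functions, cost models, and numbers here are illustrative assumptions, not the paper's actual search algorithm or cost model.

```python
import random

# Illustrative constants loosely matching the setting described above.
LUT_BUDGET = 30_000          # available look-up tables on the target FPGA
MIN_THROUGHPUT = 1_000       # required throughput, images/second

def sample_candidate(rng):
    """Sample one point from a toy joint architecture/quantization space."""
    return {
        "layers": rng.randint(4, 12),
        "channels": rng.choice([16, 32, 64]),
        "bits": rng.choice([4, 8, 16]),   # quantization bit-width
    }

def estimate_hardware(c):
    """Toy cost model: LUT usage grows with network size and precision,
    throughput shrinks with both (hypothetical formulas)."""
    luts = c["layers"] * c["channels"] * c["bits"] * 8
    throughput = 4_000_000 / (c["layers"] * c["channels"] * c["bits"])
    return luts, throughput

def estimate_accuracy(c):
    """Toy accuracy proxy: larger, higher-precision networks score higher."""
    return min(0.99, 0.5 + 0.02 * c["layers"]
               + 0.001 * c["channels"] + 0.01 * c["bits"])

def search(n_trials=1000, seed=0):
    """Jointly search architecture and quantization under hardware limits."""
    rng = random.Random(seed)
    best, best_acc = None, -1.0
    for _ in range(n_trials):
        c = sample_candidate(rng)
        luts, tput = estimate_hardware(c)
        if luts > LUT_BUDGET or tput < MIN_THROUGHPUT:
            continue  # infeasible on the target FPGA: discard immediately
        acc = estimate_accuracy(c)
        if acc > best_acc:
            best, best_acc = c, acc
    return best, best_acc

best, acc = search()
```

The key contrast with the traditional flow is that infeasible candidates are rejected during the search itself, rather than training for accuracy first and compressing afterward; the paper's framework applies this joint treatment with learned search rather than random sampling.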
