On Neural Architecture Search for Resource-Constrained Hardware Platforms

The recent success of Neural Architecture Search (NAS) has enabled researchers to explore broad design spaces with learning-based methods. Beyond finding better neural network architectures, the idea of automation has also inspired efforts to improve their hardware implementations. Although some practices of hardware machine-learning automation have achieved remarkable performance, they still follow the traditional design concept: a network architecture is first designed for high test accuracy, and then compressed and optimized to fit a target platform. Such a design flow easily leads to inferior, locally optimal solutions. To address this problem, we propose a new framework that jointly explores the spaces of neural architecture, hardware implementation, and quantization. Our objective is to find the most accurate quantized architecture that is implementable under given hardware specifications. We implement and test our designs on FPGAs with a limited number of look-up tables (LUTs) and a required throughput. Compared with methods that design and search separately, our framework performs much better under strict specifications, producing designs that are 18% to 68% more accurate on the task of classifying CIFAR-10 images. With 30,000 LUTs, it finds a light-weight design that achieves 82.98% accuracy and a throughput of 1,293 images/second, whereas under the same constraints the traditional method fails to find any valid solution.
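To make the joint-exploration idea concrete, the sketch below shows a minimal random-search loop over a toy (architecture, quantization) space that discards any candidate violating the FPGA specification and keeps the most accurate feasible one. All helper functions, cost models, and numbers here are illustrative assumptions, not the paper's actual search algorithm or cost model.

```python
import random

# Illustrative constants loosely matching the setting described above.
LUT_BUDGET = 30_000          # available look-up tables on the target FPGA
MIN_THROUGHPUT = 1_000       # required throughput, images/second

def sample_candidate(rng):
    """Sample one point from a toy joint architecture/quantization space."""
    return {
        "layers": rng.randint(4, 12),
        "channels": rng.choice([16, 32, 64]),
        "bits": rng.choice([4, 8, 16]),   # quantization bit-width
    }

def estimate_hardware(c):
    """Toy cost model: LUT usage grows with network size and precision,
    throughput shrinks with both (hypothetical formulas)."""
    luts = c["layers"] * c["channels"] * c["bits"] * 8
    throughput = 4_000_000 / (c["layers"] * c["channels"] * c["bits"])
    return luts, throughput

def estimate_accuracy(c):
    """Toy accuracy proxy: larger, higher-precision networks score higher."""
    return min(0.99, 0.5 + 0.02 * c["layers"]
               + 0.001 * c["channels"] + 0.01 * c["bits"])

def search(n_trials=1000, seed=0):
    """Jointly search architecture and quantization under hardware limits."""
    rng = random.Random(seed)
    best, best_acc = None, -1.0
    for _ in range(n_trials):
        c = sample_candidate(rng)
        luts, tput = estimate_hardware(c)
        if luts > LUT_BUDGET or tput < MIN_THROUGHPUT:
            continue  # infeasible on the target FPGA: discard immediately
        acc = estimate_accuracy(c)
        if acc > best_acc:
            best, best_acc = c, acc
    return best, best_acc

best, acc = search()
```

The key contrast with the traditional flow is that infeasible candidates are rejected during the search itself, rather than training for accuracy first and compressing afterward; the paper's framework applies this joint treatment with learned search rather than random sampling.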
