Enabling hard constraints in differentiable neural network and accelerator co-exploration

Co-exploration of an optimal neural architecture and its hardware accelerator is an approach of rising interest that addresses the computational cost problem, especially in low-profile systems. The large co-exploration space is often handled by adopting the paradigm of differentiable neural architecture search. However, despite its superior search efficiency, differentiable co-exploration faces a critical challenge: it cannot systematically satisfy hard constraints such as frame rate. To handle this hard constraint problem, we propose HDX, which searches for hard-constrained solutions without compromising the global design objectives. By manipulating the gradients in the interest of the given hard constraint, HDX obtains high-quality solutions that satisfy the constraint.
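To make the gradient-manipulation idea concrete, below is a minimal, hypothetical sketch of one common way to steer a differentiable search toward a feasible region: when the hard constraint is violated, the component of the objective gradient that conflicts with reducing the violation is projected out, and a descent direction on the violation itself is added. All names (`constrained_step`, `task_loss`, `constraint_violation`) are illustrative assumptions, and the projection rule shown is a generic heuristic, not necessarily the exact update used by HDX.

```python
import torch

def constrained_step(params, task_loss, constraint_violation, optimizer):
    """One update biasing the search toward satisfying a hard constraint.

    params: differentiable search parameters (architecture + accelerator).
    task_loss: scalar tensor for the global design objective (e.g., accuracy/energy).
    constraint_violation: scalar tensor, > 0 when the hard constraint
        (e.g., a frame-rate target) is violated, 0 otherwise.
    """
    params = list(params)
    # Gradients of the objective and of the constraint violation.
    g_task = torch.autograd.grad(task_loss, params, retain_graph=True, allow_unused=True)
    g_con = torch.autograd.grad(constraint_violation, params, allow_unused=True)

    optimizer.zero_grad()
    for p, gt, gc in zip(params, g_task, g_con):
        gt = torch.zeros_like(p) if gt is None else gt
        gc = torch.zeros_like(p) if gc is None else gc
        if constraint_violation.item() > 0 and gc.norm() > 0:
            dot = torch.dot(gt.flatten(), gc.flatten())
            if dot < 0:
                # Remove the component of the task gradient that would push
                # the solution further into the infeasible region.
                gt = gt - (dot / (gc.norm() ** 2 + 1e-12)) * gc
            # Also descend on the violation itself (optimizer steps along -grad).
            p.grad = gt + gc
        else:
            p.grad = gt
    optimizer.step()
```

In this sketch the constraint only influences the update while it is violated; once the solution is feasible, the search reverts to optimizing the global objective alone, which matches the stated goal of satisfying the hard constraint without compromising the overall design objectives.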
