RHNAS: Realizable Hardware and Neural Architecture Search

The rapid evolution of Artificial Intelligence necessitates automated approaches that co-design neural network architectures and neural accelerators to maximize system efficiency and address productivity challenges. To enable joint optimization over this vast design space, there has been growing interest in differentiable NN-HW co-design. Fully differentiable co-design reduces the resources required to discover optimized NN-HW configurations, but it fails to adapt to general hardware accelerator search spaces because many such spaces contain nonsynthesizable (invalid) designs. To enable efficient and realizable co-design of configurable hardware accelerators with arbitrary neural network search spaces, we introduce RHNAS, a method that combines reinforcement learning for hardware optimization with differentiable neural architecture search. RHNAS discovers realizable NN-HW designs with 1.84× lower latency and 1.86× lower energy-delay product (EDP) on ImageNet, and 2.81× lower latency and 3.30× lower EDP on CIFAR-10, relative to the default hardware accelerator design.
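
The combination of the two optimizers can be illustrated with a minimal sketch: a reinforcement-learning controller samples discrete hardware parameters, nonsynthesizable configurations receive a fixed penalty reward, and realizable configurations are scored with a latency proxy while the differentiable NAS architecture parameters are updated by gradient descent. The search space, the `is_synthesizable` check, the `estimate_latency` proxy, and all numeric constants below are hypothetical placeholders for illustration, not the RHNAS implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical hardware search space: two discrete accelerator knobs.
HW_CHOICES = {"pe_rows": [8, 16, 32], "buffer_kb": [32, 64, 128]}

class HWController(nn.Module):
    """RL policy with one categorical distribution per hardware knob."""
    def __init__(self):
        super().__init__()
        self.logits = nn.ParameterDict(
            {k: nn.Parameter(torch.zeros(len(v))) for k, v in HW_CHOICES.items()})

    def sample(self):
        cfg, log_prob = {}, torch.tensor(0.0)
        for k, opts in HW_CHOICES.items():
            dist = torch.distributions.Categorical(logits=self.logits[k])
            idx = dist.sample()
            cfg[k] = opts[int(idx)]
            log_prob = log_prob + dist.log_prob(idx)
        return cfg, log_prob

def is_synthesizable(cfg):
    # Placeholder realizability check: pretend one knob combination is invalid.
    return not (cfg["pe_rows"] == 32 and cfg["buffer_kb"] == 32)

def estimate_latency(cfg, arch_weights):
    # Toy differentiable latency proxy: heavier ops cost more, more PEs cost less.
    op_cost = torch.tensor([1.0, 2.0, 4.0])
    return (arch_weights * op_cost).sum() * 64.0 / cfg["pe_rows"]

# Differentiable NAS side: architecture weights over three candidate ops.
arch_logits = nn.Parameter(torch.zeros(3))
controller = HWController()
ctrl_opt = torch.optim.Adam(controller.parameters(), lr=0.05)
arch_opt = torch.optim.Adam([arch_logits], lr=0.05)

for step in range(200):
    cfg, log_prob = controller.sample()
    if not is_synthesizable(cfg):
        reward = torch.tensor(-1.0)            # penalize invalid designs
    else:
        arch_weights = F.softmax(arch_logits, dim=0)
        latency = estimate_latency(cfg, arch_weights)
        # Stand-in task loss: in practice this is the supernet's training loss.
        task_loss = (arch_weights * torch.tensor([0.2, 0.1, 0.05])).sum()
        arch_opt.zero_grad()
        (task_loss + 0.01 * latency).backward()
        arch_opt.step()
        reward = -latency.detach()             # RL reward: lower latency is better
    # REINFORCE update for the hardware controller.
    ctrl_opt.zero_grad()
    (-reward * log_prob).backward()
    ctrl_opt.step()
```

The key design point this sketch reflects is that the realizability check sits on the RL side, where invalid samples simply yield a penalty, while only the differentiable quantities (the task loss and the latency proxy for a valid design) flow gradients into the architecture parameters.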
