Poster: Scaling Up Deep Neural Network Optimization for Edge Inference

Deep neural networks (DNNs) are increasingly deployed on and integrated with edge devices, such as mobile phones, drones, robots, and wearables. To run DNN inference directly on edge devices (a.k.a. edge inference) with satisfactory performance, optimizing the DNN design (e.g., the network architecture and quantization policy) is crucial. While state-of-the-art DNN designs leverage performance predictors to speed up the optimization process, these predictors are device-specific (i.e., each predictor targets only one device) and hence scale poorly to the extremely diverse population of edge devices. Moreover, even with performance predictors, the optimizer (e.g., search-based optimization) can still be time-consuming when DNNs must be optimized for many different devices. In this work, we propose two approaches to scaling up DNN optimization. In the first approach, we reuse performance predictors built on a proxy device and leverage performance monotonicity across devices to scale up DNN optimization without rebuilding predictors for each new target device. In the second approach, we build scalable performance predictors that estimate the resulting performance (e.g., inference accuracy, latency, and energy) for any given DNN-device pair, and use a neural network-based automated optimizer that takes both device features and optimization parameters as input and directly outputs the optimal DNN design, without going through a lengthy optimization process for each individual device.
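To make the first approach more concrete, the following is a minimal sketch of how latency monotonicity could be exploited to reuse a proxy-device predictor. It is an illustrative sketch under stated assumptions, not the poster's actual implementation: the helpers `proxy_latency`, `accuracy`, `measure_on_target`, and the latency budget `target_budget_ms` are hypothetical placeholders.

```python
# Sketch: reuse a proxy-device latency predictor and exploit latency
# monotonicity across devices. All helper names below are hypothetical.

def select_dnn_via_monotonicity(candidates, proxy_latency, accuracy,
                                measure_on_target, target_budget_ms):
    """Pick the most accurate candidate whose target-device latency meets the
    budget, using the proxy predictor plus only a few real measurements on
    the target device.

    candidates           : list of DNN design identifiers
    proxy_latency(d)     : predicted latency of design d on the proxy device
    accuracy(d)          : (predicted) accuracy of design d
    measure_on_target(d) : measured latency of design d on the target device
    """
    # Monotonicity assumption: if design a has lower predicted latency than
    # design b on the proxy device, it also runs no slower than b on the
    # target device, so the proxy ranking carries over.
    ranked = sorted(candidates, key=proxy_latency)

    # Under monotonicity, feasibility (meeting the latency budget) is a
    # prefix property of the ranked list, so a binary search with a handful
    # of target-device measurements locates the cutoff in O(log n) steps.
    lo, hi = 0, len(ranked) - 1
    cutoff = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure_on_target(ranked[mid]) <= target_budget_ms:
            cutoff = mid
            lo = mid + 1
        else:
            hi = mid - 1

    feasible = ranked[:cutoff + 1]
    if not feasible:
        return None  # no candidate meets the latency budget on the target
    # Among the designs feasible on the target device, return the most accurate.
    return max(feasible, key=accuracy)
```

Because the ranking transfers from the proxy device, only a logarithmic number of real measurements on each new target device is needed to calibrate the latency cutoff, rather than re-profiling candidates or rebuilding a predictor per device.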
