Hardware-Aware Machine Learning: Modeling and Optimization

Recent breakthroughs in Machine Learning (ML) applications, and especially in Deep Learning (DL), have made DL models a key component of almost every modern computing system. The increased popularity of DL applications deployed on a wide spectrum of platforms, from mobile devices to datacenters, has resulted in a plethora of design challenges related to the constraints introduced by the hardware itself. “What is the latency or energy cost of an inference made by a Deep Neural Network (DNN)?” “Is it possible to predict this latency or energy consumption before a model is even trained?” “If so, how can ML practitioners leverage such predictions to design the hardware-optimal DNN for deployment?” From lengthening the battery life of mobile devices to reducing the runtime requirements of DL models executing in the cloud, the answers to these questions have drawn significant attention. One cannot optimize what is not properly modeled. It is therefore important to characterize the hardware efficiency of a DL model at inference (serving) time, ideally before the model is ever trained. This key observation has motivated the use of predictive models that capture the hardware performance or energy efficiency of ML applications. Furthermore, ML practitioners are currently challenged with designing the DNN model, i.e., tuning the hyper-parameters of the DNN architecture, while optimizing for both the accuracy of the DL model and its hardware efficiency. Consequently, state-of-the-art methodologies have proposed hardware-aware hyper-parameter optimization techniques. In this paper, we provide a comprehensive assessment of state-of-the-art work and selected results on hardware-aware modeling and optimization for ML applications. We also highlight several open questions that are poised to give rise to novel hardware-aware designs in the next few years, as DL applications continue to significantly impact the underlying hardware systems and platforms.
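
Concretely, a predictive model of this kind can be as simple as a regression fit over per-layer configuration parameters. The sketch below is a minimal illustration under stated assumptions: the layer features, the synthetic “measurements”, and the polynomial-plus-Lasso model class are placeholders chosen for the example, not the formulation of any specific prior work.

    # A minimal sketch of a layer-level latency predictor, assuming a
    # polynomial-regression formulation. The layer features, synthetic
    # "measurements", and model class are illustrative assumptions.
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)

    # Hypothetical per-layer configuration: (input width, kernel size,
    # input channels, output channels). On a real platform, each row
    # would be paired with a latency measured by profiling that layer.
    X = np.column_stack([
        rng.integers(8, 65, 500),    # input width
        rng.choice([1, 3, 5], 500),  # kernel size
        rng.integers(16, 257, 500),  # input channels
        rng.integers(16, 257, 500),  # output channels
    ]).astype(float)

    # Synthetic ground truth standing in for profiled latency (ms).
    y = 1e-4 * X[:, 0] * X[:, 1] + 5e-5 * X[:, 2] * X[:, 3]
    y += rng.normal(scale=0.05, size=500)

    # Sparse polynomial regression over the layer hyper-parameters.
    model = make_pipeline(PolynomialFeatures(degree=2), Lasso(alpha=0.1))
    model.fit(X, y)

    # The latency of an unseen layer can now be predicted before the
    # network is trained; a whole-network estimate sums over its layers.
    layer = np.array([[32.0, 3.0, 64.0, 128.0]])
    print(f"predicted layer latency: {model.predict(layer)[0]:.3f} ms")

The point worth noting is that the predictor consumes only architectural hyper-parameters, so a latency (or, analogously, energy) estimate is available before any training run is launched.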

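Once such a predictor is available, it can gate the hyper-parameter search itself. The following sketch shows the simplest hardware-aware variant, constrained random search, in which configurations whose predicted latency exceeds a budget are rejected before any training cost is paid; the search space, the latency model, and the accuracy proxy are all hypothetical placeholders.

    # A minimal sketch of hardware-aware hyper-parameter search via
    # constrained random search. The search space, latency model, and
    # accuracy proxy below are hypothetical, not the paper's method.
    import random

    random.seed(0)
    LATENCY_BUDGET_MS = 25.0

    def predict_latency_ms(cfg):
        # Stand-in for a learned predictor such as the one sketched above.
        return 0.002 * cfg["width"] * cfg["depth"]

    def validation_accuracy(cfg):
        # Stand-in for the expensive train-then-evaluate step; here,
        # larger models simply score higher, with diminishing returns.
        return 1.0 - 1.0 / (1.0 + 1e-3 * cfg["width"] * cfg["depth"])

    best = None
    for _ in range(1000):
        cfg = {"width": random.randint(16, 512),
               "depth": random.randint(2, 50)}
        if predict_latency_ms(cfg) > LATENCY_BUDGET_MS:
            continue  # infeasible on the target platform: skip training
        acc = validation_accuracy(cfg)
        if best is None or acc > best[0]:
            best = (acc, cfg)

    print(f"best feasible configuration: {best[1]}, accuracy {best[0]:.3f}")

In practice, the random-sampling loop is replaced by more sample-efficient strategies, such as constrained Bayesian optimization or platform-aware neural architecture search, which are exactly the class of hardware-aware methods this paper surveys.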