Optimising Resource Management for Embedded Machine Learning

Machine learning inference is increasingly executed locally on mobile and embedded platforms because of clear advantages in latency, privacy and connectivity. In this paper, we present approaches for online resource management in heterogeneous multi-core systems and show how they can be applied to optimise the performance of machine learning workloads. Performance can be defined using platform-dependent (e.g. speed, energy) and platform-independent (e.g. accuracy, confidence) metrics. In particular, we show how a Deep Neural Network (DNN) can be dynamically scaled to trade off these performance metrics. Achieving consistent performance across different platforms is necessary yet challenging: platforms differ in the resources they provide and in the capability of those resources, and resource availability varies over time when the DNN executes alongside other workloads. Managing the interface between available hardware resources (often numerous and heterogeneous in nature), software requirements, and user experience is increasingly complex.
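The trade-off described above can be illustrated with a minimal sketch. It assumes a dynamically scalable DNN has been profiled offline at a few operating points and that a runtime manager picks the most accurate point fitting the current latency and energy budgets. The names `OperatingPoint` and `select_operating_point` and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """One profiled configuration of a scalable DNN (hypothetical fields)."""
    width: float       # fraction of the full network that is active
    latency_ms: float  # platform-dependent metric: time per inference
    energy_mj: float   # platform-dependent metric: energy per inference
    accuracy: float    # platform-independent metric: top-1 accuracy


def select_operating_point(
    points: Sequence[OperatingPoint],
    latency_budget_ms: float,
    energy_budget_mj: float,
) -> Optional[OperatingPoint]:
    """Return the most accurate point within both budgets, or None."""
    feasible = [
        p for p in points
        if p.latency_ms <= latency_budget_ms and p.energy_mj <= energy_budget_mj
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.accuracy)


# Illustrative, made-up profile; real values would come from measurement
# on the target platform.
POINTS = [
    OperatingPoint(width=0.25, latency_ms=20.0, energy_mj=30.0, accuracy=0.60),
    OperatingPoint(width=0.50, latency_ms=40.0, energy_mj=60.0, accuracy=0.68),
    OperatingPoint(width=0.75, latency_ms=70.0, energy_mj=100.0, accuracy=0.72),
    OperatingPoint(width=1.00, latency_ms=110.0, energy_mj=160.0, accuracy=0.75),
]

if __name__ == "__main__":
    # Budgets shrink when other workloads compete for the same resources.
    print(select_operating_point(POINTS, latency_budget_ms=50.0, energy_budget_mj=100.0))
```

A runtime manager could call such a function again whenever the budgets change, for example when another application starts and fewer cores are available to the DNN.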
