Adaptive deep learning model selection on embedded systems

The recent ground-breaking advances in deep learning networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-limited embedded devices. Offloading the computation into the cloud is often infeasible due to privacy concerns, high latency, or the lack of connectivity. As such, there is a critical need to find a way to effectively execute the DNN models locally on the devices. This paper presents an adaptive scheme to determine which DNN model to use for a given input, by considering the desired accuracy and inference time. Our approach employs machine learning to develop a predictive model to quickly select a pre-trained DNN to use for a given input and the optimization constraint. We achieve this by first training off-line a predictive model, and then use the learnt model to select a DNN model to use for new, unseen inputs. We apply our approach to the image classification task and evaluate it on a Jetson TX2 embedded deep learning platform using the ImageNet ILSVRC 2012 validation dataset. We consider a range of influential DNN models. Experimental results show that our approach achieves a 7.52% improvement in inference accuracy, and a 1.8x reduction in inference time over the most-capable single DNN model.

[1]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[2]  Tian Guo Towards Efficient Deep Inference for Mobile Applications , 2017 .

[3]  Aaron Klein,et al.  Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets , 2016, AISTATS.

[4]  P. Sadayappan,et al.  Using machine learning to improve automatic vectorization , 2012, TACO.

[5]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[6]  Michael F. P. O'Boyle,et al.  Portable mapping of data parallel programs to OpenCL for heterogeneous systems , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[7]  Michael F. P. O'Boyle,et al.  Smart, adaptive mapping of parallelism in the presence of external workload , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[8]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[9]  Eugenio Culurciello,et al.  Flattened Convolutional Neural Networks for Feedforward Acceleration , 2014, ICLR.

[10]  Soheil Ghiasi,et al.  CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android , 2015, ACM Multimedia.

[11]  Michael F. P. O'Boyle,et al.  OpenCL Task Partitioning in the Presence of GPU Contention , 2013, LCPC.

[12]  Chong Wang,et al.  Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[13]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[14]  Song Han,et al.  Deep compression and EIE: Efficient inference engine on compressed deep neural network , 2016, 2016 IEEE Hot Chips 28 Symposium (HCS).

[15]  B. S. Manjunath,et al.  Are Very Deep Neural Networks Feasible on Mobile Devices , 2016 .

[16]  Eugenio Culurciello,et al.  An Analysis of Deep Neural Network Models for Practical Applications , 2016, ArXiv.

[17]  Soheil Ghiasi,et al.  Machine Intelligence on Resource-Constrained IoT Devices , 2017, ACM Trans. Embed. Comput. Syst..

[18]  Yang Hu,et al.  Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[19]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[20]  Zheng Wang,et al.  Adaptive Optimization of Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures , 2018, 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).

[21]  Gordon S. Blair,et al.  Daleel: Simplifying cloud instance selection using machine learning , 2016, NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium.

[22]  Pavlos Petoumenos,et al.  Minimizing the cost of iterative compilation with active learning , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[23]  Song Han,et al.  EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[24]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[25]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[26]  Christopher D. Manning,et al.  Bilingual Word Embeddings for Phrase-Based Machine Translation , 2013, EMNLP.

[27]  Sujith Ravi,et al.  ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections , 2017, ArXiv.

[28]  Zheng Wang,et al.  Machine Learning in Compiler Optimization , 2018, Proceedings of the IEEE.

[29]  Cecilia Mascolo,et al.  Low-resource Multi-task Audio Sensing for Mobile and Embedded Devices via Shared Deep Neural Network Representations , 2017, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol..

[30]  Nicholas D. Lane,et al.  Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables , 2016, SenSys.

[31]  Zheng Wang,et al.  Adaptive optimization for OpenCL programs on embedded heterogeneous systems , 2017, LCTES.

[32]  Michael F. P. O'Boyle,et al.  Integrating profile-driven parallelism detection and machine-learning-based mapping , 2014, TACO.

[33]  Hammam A. Alshazly,et al.  Image Features Detection, Description and Matching , 2016 .

[34]  Honglak Lee,et al.  Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[35]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[36]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[37]  Christina Delimitrou,et al.  Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.

[38]  Yixin Chen,et al.  Compressing Neural Networks with the Hashing Trick , 2015, ICML.

[39]  Michael F. P. O'Boyle,et al.  A workload-aware mapping approach for data-parallel programs , 2011, HiPEAC.

[40]  Ling Gao,et al.  Optimise web browsing on heterogeneous mobile platforms: A machine learning based approach , 2017, IEEE INFOCOM 2017 - IEEE Conference on Computer Communications.

[41]  Peng Zhang,et al.  Auto-tuning Streamed Applications on Intel Xeon Phi , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[42]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[43]  Michael F. P. O'Boyle,et al.  Partitioning streaming parallelism for multi-cores: A machine learning based approach , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[44]  Hamed Haddadi,et al.  A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics , 2017, IEEE Internet of Things Journal.

[45]  Chris Cummins,et al.  End-to-End Deep Learning of Optimization Heuristics , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[46]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[47]  Michael F. P. O'Boyle,et al.  Mapping parallelism to multi-cores: a machine learning based approach , 2009, PPoPP '09.

[48]  Nicholas D. Lane,et al.  DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices , 2016, 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN).

[49]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[50]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[51]  Trevor N. Mudge,et al.  Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge , 2017, ASPLOS.

[52]  Michael F. P. O'Boyle,et al.  Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems , 2014, ACM Trans. Archit. Code Optim..

[53]  Rajesh Krishna Balan,et al.  DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications , 2017, MobiSys.

[54]  Sandra Servia Rodríguez,et al.  Personal Model Training under Privacy Constraints , 2017, ArXiv.

[55]  H. T. Kung,et al.  Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[56]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[57]  Michael F. P. O'Boyle,et al.  Integrating algorithmic parameters into benchmarking and design space exploration in 3D scene understanding , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).

[58]  Michael F. P. O'Boyle,et al.  Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping , 2009, PLDI '09.

[59]  Tajana Simunic,et al.  ACAM: Approximate Computing Based on Adaptive Associative Memory with Online Learning , 2016, ISLPED.

[60]  Michael F. P. O'Boyle,et al.  Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments , 2015, PLDI.

[61]  Michael F. P. O'Boyle,et al.  Using machine learning to partition streaming programs , 2013, ACM Trans. Archit. Code Optim..

[62]  Barry Porter,et al.  Improving Spark Application Throughput Via Memory Aware Task Co-location: A Mixture of Experts Approach , 2017 .

[63]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Zheng Wang,et al.  Fast Automatic Heuristic Construction Using Active Learning , 2014, LCPC.