论文信息 - Adaptive deep learning model selection on embedded systems

Adaptive deep learning model selection on embedded systems

The recent ground-breaking advances in deep learning networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-limited embedded devices. Offloading the computation into the cloud is often infeasible due to privacy concerns, high latency, or the lack of connectivity. As such, there is a critical need to find a way to effectively execute the DNN models locally on the devices. This paper presents an adaptive scheme to determine which DNN model to use for a given input, by considering the desired accuracy and inference time. Our approach employs machine learning to develop a predictive model to quickly select a pre-trained DNN to use for a given input and the optimization constraint. We achieve this by first training off-line a predictive model, and then use the learnt model to select a DNN model to use for new, unseen inputs. We apply our approach to the image classification task and evaluate it on a Jetson TX2 embedded deep learning platform using the ImageNet ILSVRC 2012 validation dataset. We consider a range of influential DNN models. Experimental results show that our approach achieves a 7.52% improvement in inference accuracy, and a 1.8x reduction in inference time over the most-capable single DNN model.

[1] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[2] Tian Guo. Towards Efficient Deep Inference for Mobile Applications , 2017 .

[3] Aaron Klein,et al. Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets , 2016, AISTATS.

[4] P. Sadayappan,et al. Using machine learning to improve automatic vectorization , 2012, TACO.

[5] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[6] Michael F. P. O'Boyle,et al. Portable mapping of data parallel programs to OpenCL for heterogeneous systems , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[7] Michael F. P. O'Boyle,et al. Smart, adaptive mapping of parallelism in the presence of external workload , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[8] Thomas Brox,et al. Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[9] Eugenio Culurciello,et al. Flattened Convolutional Neural Networks for Feedforward Acceleration , 2014, ICLR.

[10] Soheil Ghiasi,et al. CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android , 2015, ACM Multimedia.

[11] Michael F. P. O'Boyle,et al. OpenCL Task Partitioning in the Presence of GPU Contention , 2013, LCPC.

[12] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[13] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[14] Song Han,et al. Deep compression and EIE: Efficient inference engine on compressed deep neural network , 2016, 2016 IEEE Hot Chips 28 Symposium (HCS).

[15] B. S. Manjunath,et al. Are Very Deep Neural Networks Feasible on Mobile Devices , 2016 .

[16] Eugenio Culurciello,et al. An Analysis of Deep Neural Network Models for Practical Applications , 2016, ArXiv.

[17] Soheil Ghiasi,et al. Machine Intelligence on Resource-Constrained IoT Devices , 2017, ACM Trans. Embed. Comput. Syst..

[18] Yang Hu,et al. Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[19] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[20] Zheng Wang,et al. Adaptive Optimization of Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures , 2018, 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).

[21] Gordon S. Blair,et al. Daleel: Simplifying cloud instance selection using machine learning , 2016, NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium.

[22] Pavlos Petoumenos,et al. Minimizing the cost of iterative compilation with active learning , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[23] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[24] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[25] Xiaogang Wang,et al. Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[26] Christopher D. Manning,et al. Bilingual Word Embeddings for Phrase-Based Machine Translation , 2013, EMNLP.

[27] Sujith Ravi,et al. ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections , 2017, ArXiv.

[28] Zheng Wang,et al. Machine Learning in Compiler Optimization , 2018, Proceedings of the IEEE.

[29] Cecilia Mascolo,et al. Low-resource Multi-task Audio Sensing for Mobile and Embedded Devices via Shared Deep Neural Network Representations , 2017, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol..

[30] Nicholas D. Lane,et al. Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables , 2016, SenSys.

[31] Zheng Wang,et al. Adaptive optimization for OpenCL programs on embedded heterogeneous systems , 2017, LCTES.

[32] Michael F. P. O'Boyle,et al. Integrating profile-driven parallelism detection and machine-learning-based mapping , 2014, TACO.

[33] Hammam A. Alshazly,et al. Image Features Detection, Description and Matching , 2016 .

[34] Honglak Lee,et al. Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[35] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[36] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[37] Christina Delimitrou,et al. Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.

[38] Yixin Chen,et al. Compressing Neural Networks with the Hashing Trick , 2015, ICML.

[39] Michael F. P. O'Boyle,et al. A workload-aware mapping approach for data-parallel programs , 2011, HiPEAC.

[40] Ling Gao,et al. Optimise web browsing on heterogeneous mobile platforms: A machine learning based approach , 2017, IEEE INFOCOM 2017 - IEEE Conference on Computer Communications.

[41] Peng Zhang,et al. Auto-tuning Streamed Applications on Intel Xeon Phi , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[42] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[43] Michael F. P. O'Boyle,et al. Partitioning streaming parallelism for multi-cores: A machine learning based approach , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[44] Hamed Haddadi,et al. A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics , 2017, IEEE Internet of Things Journal.

[45] Chris Cummins,et al. End-to-End Deep Learning of Optimization Heuristics , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[46] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[47] Michael F. P. O'Boyle,et al. Mapping parallelism to multi-cores: a machine learning based approach , 2009, PPoPP '09.

[48] Nicholas D. Lane,et al. DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices , 2016, 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN).

[49] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.

[50] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.

[51] Trevor N. Mudge,et al. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge , 2017, ASPLOS.

[52] Michael F. P. O'Boyle,et al. Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems , 2014, ACM Trans. Archit. Code Optim..

[53] Rajesh Krishna Balan,et al. DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications , 2017, MobiSys.

[54] Sandra Servia Rodríguez,et al. Personal Model Training under Privacy Constraints , 2017, ArXiv.

[55] H. T. Kung,et al. Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[56] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[57] Michael F. P. O'Boyle,et al. Integrating algorithmic parameters into benchmarking and design space exploration in 3D scene understanding , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).

[58] Michael F. P. O'Boyle,et al. Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping , 2009, PLDI '09.

[59] Tajana Simunic,et al. ACAM: Approximate Computing Based on Adaptive Associative Memory with Online Learning , 2016, ISLPED.

[60] Michael F. P. O'Boyle,et al. Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments , 2015, PLDI.

[61] Michael F. P. O'Boyle,et al. Using machine learning to partition streaming programs , 2013, ACM Trans. Archit. Code Optim..

[62] Barry Porter,et al. Improving Spark Application Throughput Via Memory Aware Task Co-location: A Mixture of Experts Approach , 2017 .

[63] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64] Zheng Wang,et al. Fast Automatic Heuristic Construction Using Active Learning , 2014, LCPC.