Performance Analysis and Characterization of Training Deep Learning Models on Mobile Devices

Training deep learning models on mobile devices has recently become feasible, driven by the growing computational power of mobile hardware and the user-experience benefits of on-device training. However, most existing work on machine learning for mobile devices focuses on the inference of deep learning models, not their training, and the performance characteristics of training deep learning models on mobile devices remain largely unexplored, even though understanding them is critical for designing and implementing such models. In this paper, we perform a variety of experiments on a representative mobile device (the NVIDIA TX2) to study the performance of training deep learning models. We introduce a benchmark suite and a tool to study the performance of training deep learning models on mobile devices from the perspectives of memory consumption, hardware utilization, and power consumption. The tool correlates performance results with fine-grained operations in deep learning models, making it possible to capture performance variance and problems at a fine granularity. We reveal interesting performance problems and opportunities, including under-utilization of the heterogeneous hardware, large energy consumption by memory, and highly predictable workload characteristics. Based on this performance analysis, we suggest promising research directions.

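To make the measurement methodology concrete, the following is a minimal Python sketch (not the authors' actual tool) of how power readings from the TX2's on-board INA3221 power monitors could be sampled in the background and attributed to individually timed training operations. The sysfs path, sampling period, and operation names are assumptions for illustration; INA3221 rail paths vary across JetPack versions.

```python
# Minimal sketch: correlate sampled TX2 power with fine-grained training ops.
# POWER_NODE is a hypothetical INA3221 rail path (reports milliwatts);
# the actual path depends on the JetPack version and the rail of interest.
import threading
import time

POWER_NODE = "/sys/bus/i2c/drivers/ina3221x/0-0041/iio_device/in_power0_input"

samples = []               # (timestamp, milliwatts)
events = []                # (op_name, start_time, end_time)
stop = threading.Event()

def sample_power(period_s=0.01):
    """Poll the power rail (~100 Hz here) and record timestamped readings."""
    while not stop.is_set():
        with open(POWER_NODE) as f:
            samples.append((time.monotonic(), int(f.read())))
        time.sleep(period_s)

def timed(op_name, fn, *args, **kwargs):
    """Run one operation (e.g., a layer's forward pass) and log its interval."""
    t0 = time.monotonic()
    out = fn(*args, **kwargs)
    events.append((op_name, t0, time.monotonic()))
    return out

def energy_of(op_name):
    """Estimate an operation's energy (joules) as mean sampled power
    over its interval times its duration."""
    _, t0, t1 = next(e for e in events if e[0] == op_name)
    inside = [p for (t, p) in samples if t0 <= t <= t1]
    if not inside:
        return 0.0
    return (sum(inside) / len(inside)) * (t1 - t0) / 1000.0  # mW*s -> J

# Usage sketch: start the sampler, wrap each training operation with
# `timed`, then stop sampling and attribute energy per operation.
# threading.Thread(target=sample_power, daemon=True).start()
# out = timed("conv1_fwd", model.conv1, batch)   # hypothetical model/op
# stop.set()
# print(energy_of("conv1_fwd"))
```

With per-operation intervals and a single shared sample log, the same readings can be re-aggregated at layer, phase (forward/backward), or iteration granularity, which is what enables the fine-grained correlation described above.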