Characterizing the Deep Neural Networks Inference Performance of Mobile Applications

Today's mobile applications increasingly leverage deep neural networks to provide novel features, such as image and speech recognition. To use a pre-trained deep neural network, mobile developers can either host it in a cloud server, referred to as cloud-based inference, or ship it with their mobile application, referred to as on-device inference. In this work, we investigate the inference performance of these two common approaches on both mobile devices and public clouds, using popular convolutional neural networks (CNNs). Our measurement study suggests the need for both on-device and cloud-based inference to support mobile applications. In particular, newer mobile devices are able to run mobile-optimized CNN models in reasonable time. However, for older mobile devices or more complex CNN models, mobile applications should opt for cloud-based inference. We further demonstrate that variable network conditions can lead to poor end-to-end times for cloud-based inference. To support efficient cloud-based inference, we propose a CNN model selection algorithm, CNNSelect, that dynamically selects the most appropriate CNN model for each inference request and adapts its selection to the different SLAs and execution time budgets that arise from variable mobile environments. The key idea of CNNSelect is to trade off inference speed and accuracy at runtime using a set of CNN models. We demonstrate that CNNSelect smoothly improves inference accuracy while maintaining SLA attainment in 88.5% more cases than a greedy baseline.
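To make the runtime trade-off concrete, the following is a minimal, hypothetical sketch of SLA-aware model selection: given offline-profiled accuracy and latency for a set of candidate CNN models and an estimate of the network transfer time, pick the most accurate model whose tail latency fits the remaining time budget. The names (CandidateModel, select_model) and the percentile-based latency estimate are illustrative assumptions, not the exact CNNSelect algorithm described in the paper.

```python
# Illustrative sketch of SLA-aware CNN model selection (assumed design,
# not the paper's exact algorithm).
from dataclasses import dataclass
from typing import List, Optional
import numpy as np


@dataclass
class CandidateModel:
    name: str
    accuracy: float               # top-1 accuracy from offline profiling
    latency_samples: List[float]  # measured inference times in seconds


def select_model(models: List[CandidateModel],
                 sla: float,
                 network_time_estimate: float,
                 percentile: float = 95.0) -> Optional[CandidateModel]:
    """Pick the most accurate model whose tail latency fits the
    remaining budget (SLA minus estimated network transfer time)."""
    budget = sla - network_time_estimate
    feasible = [
        m for m in models
        if np.percentile(m.latency_samples, percentile) <= budget
    ]
    if not feasible:
        # No candidate fits the budget; fall back to the fastest model.
        return min(models,
                   key=lambda m: np.percentile(m.latency_samples, percentile))
    return max(feasible, key=lambda m: m.accuracy)
```

Under this sketch, a tighter budget (e.g., a congested network eating into the SLA) naturally pushes the selection toward smaller, faster models, while a generous budget allows larger, more accurate ones.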
