Smart at what cost? Characterising mobile deep neural networks in the wild

With smartphones omnipresent in people's pockets, Machine Learning (ML) on mobile is gaining traction as devices become more powerful. With applications ranging from visual filters to voice assistants, intelligence on mobile comes in many forms and facets. However, Deep Neural Network (DNN) inference remains a compute-intensive workload, and devices often support intelligence only at the cost of responsiveness. On the one hand, there is significant research on reducing model runtime requirements and supporting deployment on embedded devices. On the other hand, the drive to maximise task accuracy leads to deeper and wider neural networks, making mobile deployment of state-of-the-art DNNs a moving target. In this paper, we perform the first holistic study of DNN usage in the wild, tracking deployed models and measuring how they run on widely deployed devices. To this end, we analyse over 16k of the most popular apps in the Google Play Store to characterise their DNN usage and performance on devices of different capabilities, across both tiers and generations. Simultaneously, we measure the models' energy footprint, a core cost dimension of any mobile deployment. To streamline the process, we have developed gaugeNN, a tool that automates the deployment, measurement and analysis of DNNs on devices, with support for different frameworks and platforms. Results from our empirical study paint the landscape of deep learning deployments on smartphones and indicate their popularity among app developers. Furthermore, our study shows the gap between bespoke techniques and real-world deployments, and the need for optimised deployment of deep learning models in a highly dynamic and heterogeneous ecosystem.
