Elf: accelerate high-resolution mobile deep vision with content-aware parallel offloading

As mobile devices continuously generate streams of images and videos, a new class of mobile deep vision applications is rapidly emerging; these applications usually run deep neural networks on the multimedia data in real time. To support such applications, having mobile devices offload the computation, especially neural network inference, to edge clouds has proved effective. Existing solutions often assume that a dedicated, powerful server exists to which the entire inference can be offloaded. In reality, however, such a server may not be available, and we must make do with less powerful ones. To address these more practical situations, we propose to partition each video frame and offload the partial inference tasks to multiple servers for parallel processing. This paper presents the design of Elf, a framework that accelerates mobile deep vision applications under any server provisioning through parallel offloading. Elf employs a recurrent region proposal prediction algorithm, region-proposal-centric frame partitioning, and a resource-aware multi-offloading scheme. We implement and evaluate Elf on Linux and Android platforms using four commercial mobile devices and three deep vision applications with ten state-of-the-art models. Comprehensive experiments show that Elf can speed up the applications by 4.85× while reducing bandwidth usage by 52.6%, with less than 1% loss in application accuracy.
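The abstract names region-proposal-centric partitioning and resource-aware multi-offloading without giving details. As a minimal sketch, assuming each server's relative compute capacity is known and that pixel area is a reasonable proxy for inference cost, the hypothetical `partition_frame` helper below assigns predicted region proposals to servers greedily (largest proposal first, to the server with the lowest load relative to its capacity) and returns one crop per server. It is an illustrative stand-in for the idea of capacity-aware parallel offloading, not Elf's actual algorithm.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def area(box: Box) -> int:
    """Pixel area of a box; used here as an assumed proxy for inference cost."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def union(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box covering all given boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def partition_frame(proposals: List[Box], capacities: List[float]) -> List[Optional[Box]]:
    """Hypothetical helper: map predicted region proposals to servers, one crop each.

    Greedy load balancing: visit proposals from largest to smallest and give each
    to the server whose capacity-normalized load would be smallest afterwards.
    Returns None for a server that receives no proposals.
    """
    if not capacities or any(c <= 0 for c in capacities):
        raise ValueError("capacities must be a non-empty list of positive numbers")
    loads = [0.0] * len(capacities)
    assigned: List[List[Box]] = [[] for _ in capacities]
    for box in sorted(proposals, key=area, reverse=True):
        cost = area(box)
        k = min(range(len(capacities)), key=lambda i: (loads[i] + cost) / capacities[i])
        loads[k] += cost
        assigned[k].append(box)
    # Each server receives the bounding region covering all of its proposals.
    return [union(boxes) if boxes else None for boxes in assigned]


if __name__ == "__main__":
    proposals = [(0, 0, 100, 100), (300, 300, 380, 380), (200, 0, 250, 50)]
    print(partition_frame(proposals, capacities=[2.0, 1.0]))
```

With a server twice as capable as the other, the two smaller-to-largest assignments leave the stronger server with the crop `(0, 0, 250, 100)` and the weaker one with `(300, 300, 380, 380)`. Cropping to the union of each server's proposals keeps only content-bearing pixels in each upload, which is one plausible way partitioning could save bandwidth; a real system would also need to handle overlapping crops and merge the partial results.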
