SEE: Scheduling Early Exit for Mobile DNN Inference during Service Outage

In recent years, the rapid development of edge computing has made it possible to run a wide variety of intelligent applications at the edge, such as real-time video analytics. However, edge computing can suffer from service outages caused by fluctuating wireless connections or congested computing resources. During a service outage, the only choice is to run deep neural network (DNN) inference on the local mobile device. The obstacle is that, because of the device's limited resources, it may not be possible to complete inference tasks on time. Inspired by the recently developed early exit of DNNs, where inference can exit the DNN at an earlier layer to shorten the inference delay at the cost of an acceptable loss in accuracy, we propose to adopt this mechanism to process inference tasks during a service outage. The challenge is how to obtain the optimal schedule given the diverse early exit choices. To this end, we formulate an optimal scheduling problem whose objective is to maximize a general overall utility. However, the problem takes the form of integer programming, which cannot be solved by a standard approach. We therefore prove the Ordered Scheduling structure, which indicates that a frame that arrives earlier must be scheduled earlier. This structure greatly reduces the search space for an optimal solution. We then propose the Scheduling Early Exit (SEE) algorithm, based on dynamic programming, which solves the problem optimally with polynomial computational complexity. Finally, we conduct trace-driven simulations and compare SEE with two benchmarks. The results show that SEE can outperform the benchmarks by 50.9%.
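
To make the scheduling idea concrete, the following is a minimal toy sketch and not the SEE algorithm itself, whose details are not given in this abstract. It assumes a single non-preemptive device, integer times, an absolute deadline per frame, frames handled in arrival order (in the spirit of the Ordered Scheduling structure), and zero utility for a dropped frame. The function names, parameters, and the additive utility model are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

State = Tuple[float, List[int]]  # (total utility, exit chosen per frame)


def _keep_best(table: Dict[int, State], free_at: int,
               util: float, choices: List[int]) -> None:
    """Keep only the highest-utility state for each processor-free time."""
    if free_at not in table or util > table[free_at][0]:
        table[free_at] = (util, choices)


def schedule_early_exit(arrivals: Sequence[int],
                        deadlines: Sequence[int],
                        exit_times: Sequence[int],
                        exit_utils: Sequence[float]) -> State:
    """Toy dynamic program over frames in arrival order (assumed model).

    exit_times[k] / exit_utils[k] are the hypothetical delay and utility
    of leaving the DNN at exit k. A choice of -1 means the frame is dropped.
    """
    # DP state: time at which the device becomes free -> best state so far.
    states: Dict[int, State] = {0: (0.0, [])}
    for arrive, deadline in zip(arrivals, deadlines):
        nxt: Dict[int, State] = {}
        for free_at, (util, choices) in states.items():
            # Option 1: drop the frame (no time spent, no utility gained).
            _keep_best(nxt, free_at, util, choices + [-1])
            # Option 2: process the frame at one of the exits.
            start = max(free_at, arrive)
            for k, (t, u) in enumerate(zip(exit_times, exit_utils)):
                finish = start + t
                if finish <= deadline:
                    _keep_best(nxt, finish, util + u, choices + [k])
        states = nxt
    return max(states.values(), key=lambda s: s[0])


if __name__ == "__main__":
    # Three frames, one arriving per time unit, each due 4 units after arrival.
    best_util, exits = schedule_early_exit(
        arrivals=[0, 1, 2], deadlines=[4, 5, 6],
        exit_times=[1, 2, 4], exit_utils=[0.5, 0.8, 1.0])
    print(best_util, exits)
```

In this toy instance, running only the full network (exit time 4) lets just one frame meet its deadline, whereas allowing the intermediate exit lets all three frames finish on time at a higher total utility. Because the state is indexed by integer time, this sketch is pseudo-polynomial; it does not reproduce the complexity result claimed for SEE.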
