Enable Pipeline Processing of DNN Co-inference Tasks In the Mobile-Edge Cloud

Deep Neural Network (DNN) based artificial intelligence has driven the rapid development of the mobile Internet. However, the hardware of a mobile device may not be sufficient to meet the computational requirements of a DNN inference task. Fortunately, offloading computation to the network edge can relieve part of the computational pressure on mobile devices, so DNN computation on a mobile device can be accelerated through an edge-assisted collaborative inference scheme. Since co-inference tasks with multiple processing stages may arrive at mobile devices continuously, accelerating only a single DNN-based task is not practical. To address this challenge, we formulate the acceleration of multiple co-inference tasks as a pipeline execution model. Based on this model, we design a fine-grained optimizer that integrates model partition, model early-exit, and intermediate data compression to achieve a tradeoff between accuracy and latency. Considering the computational characteristics of a pipeline, the optimizer is designed to guarantee both the pipeline system's inference rate and the execution performance of each individual task. We implement a system prototype and run benchmarks on a real-life testbed; the results demonstrate the effectiveness of the optimizer.
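The following is a minimal, self-contained sketch (not the paper's implementation) of how the three techniques the optimizer combines might fit together on the device side of such a pipeline: the class name DeviceSidePipeline, the fixed partition_idx, the confidence-threshold exit rule, and the 8-bit quantization standing in for intermediate data compression are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class DeviceSidePipeline(nn.Module):
    """Device-side stage of a partitioned DNN: run the head layers, try an
    early exit, and compress the intermediate feature before offloading."""
    def __init__(self, backbone_layers, exit_branch, partition_idx, exit_threshold=0.9):
        super().__init__()
        self.head = nn.Sequential(*backbone_layers[:partition_idx])  # layers kept on the device
        self.exit_branch = exit_branch        # lightweight classifier attached at the partition point
        self.exit_threshold = exit_threshold  # confidence needed to stop early

    def forward(self, x):
        feat = self.head(x)
        probs = torch.softmax(self.exit_branch(feat), dim=1)
        conf, pred = probs.max(dim=1)
        if conf.item() >= self.exit_threshold:  # confident enough: skip the edge entirely
            return "exit", pred
        # Not confident: 8-bit quantization stands in for intermediate data compression
        scale = feat.abs().max() / 127.0 + 1e-8
        q = torch.clamp((feat / scale).round(), -128, 127).to(torch.int8)
        return "offload", (q, scale)            # (q, scale) would be shipped to the edge server

# Example: partition a small VGG-style head after its 4th layer (batch size 1).
layers = [nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
          nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()]
exit_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))
device_model = DeviceSidePipeline(layers, exit_head, partition_idx=4)
status, payload = device_model(torch.randn(1, 3, 32, 32))
```

In a pipelined deployment, several such device-side stages would run back to back on successive task arrivals, with the partition point and exit threshold chosen by the optimizer to balance per-task latency against the pipeline's sustained inference rate.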
