DeepSlicing: Collaborative and Adaptive CNN Inference With Low Latency

The rapid growth of Convolutional Neural Networks (CNNs) has enabled many computer-vision applications. Because CNNs demand substantial computing resources, much research has examined how to optimize their deployment and execution on resource-constrained devices. However, previous works have several weaknesses, including limited support for diverse CNN structures, fixed scheduling strategies, overlapped computations, and high synchronization overheads. In this article, we present DeepSlicing, a collaborative and adaptive inference system that adapts to various CNNs and supports customized, flexible, fine-grained scheduling. DeepSlicing natively supports typical CNNs such as GoogLeNet and ResNet. By partitioning both the model and the data, we also design an efficient scheduler, the Proportional Synchronized Scheduler (PSS), which balances the trade-off between computation and synchronization. We implemented DeepSlicing on PyTorch and deployed it on a testbed with real-world edge settings consisting of 8 heterogeneous Raspberry Pis. The results show that DeepSlicing with PSS dramatically outperforms existing systems, reducing inference latency by up to 5.79× and memory footprint by up to 14.72×.
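The abstract names the idea of partitioning data across heterogeneous devices and trading computation against synchronization, but it does not spell out PSS's actual rules. As an illustration only, the sketch below assumes that the rows of a feature map are split across devices in proportion to a per-device capacity score, and then uses standard receptive-field arithmetic to show why neighbouring slices must overlap. The names `proportional_split`, `input_rows_for`, and `rows_needed`, the capacity scores, and the layer tuples are all hypothetical; this is not DeepSlicing's code.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # half-open row interval [start, end)


def proportional_split(total_rows: int, capacities: Sequence[float]) -> List[Range]:
    """Split rows [0, total_rows) into contiguous slices sized in proportion to
    each device's (hypothetical) capacity score. Largest-remainder rounding
    ensures every row is assigned to exactly one device."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if not capacities or any(c <= 0 for c in capacities):
        raise ValueError("capacities must be a non-empty list of positive numbers")
    total_cap = sum(capacities)
    exact = [total_rows * c / total_cap for c in capacities]
    counts = [int(x) for x in exact]  # floor, since every value is >= 0
    leftover = total_rows - sum(counts)
    # Hand the rows lost to flooring to the devices with the largest fractional parts.
    by_fraction = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    slices, start = [], 0
    for n in counts:
        slices.append((start, start + n))
        start += n
    return slices


def input_rows_for(out: Range, kernel: int, stride: int, padding: int, in_height: int) -> Range:
    """Input rows a convolution or pooling layer (assumed: no dilation) must read
    to produce output rows `out`. Rows that fall in the zero padding are clipped."""
    start, end = out
    if end <= start:
        return (0, 0)
    lo = start * stride - padding
    hi = (end - 1) * stride - padding + kernel
    return (max(lo, 0), min(hi, in_height))


def rows_needed(out: Range, layers: Sequence[Tuple[int, int, int, int]]) -> Range:
    """Walk a stack of layers backwards. Each layer is a hypothetical tuple
    (kernel, stride, padding, in_height), listed from first to last. Returns the
    rows of the first layer's input needed to compute `out` at the last layer."""
    for kernel, stride, padding, in_height in reversed(layers):
        out = input_rows_for(out, kernel, stride, padding, in_height)
    return out


if __name__ == "__main__":
    conv3x3 = (3, 1, 1, 8)  # 3x3 kernel, stride 1, padding 1, 8 input rows
    slices = proportional_split(8, [1.0, 1.0])  # two equally capable devices
    for depth in (1, 2):
        needed = [rows_needed(s, [conv3x3] * depth) for s in slices]
        duplicated = sum(e - s for s, e in needed) - 8  # rows read by both devices
        print(depth, needed, duplicated)
```

With two equal devices and 3×3 stride-1 layers on 8 rows, one layer needs 2 duplicated input rows, while two stacked layers need 4. Under these assumptions, this is one way to see the trade-off the abstract mentions: running more layers between synchronization points reduces communication but increases overlapped computation.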
