Towards memory-efficient inference in edge video analytics

Video analytics pipelines incorporate on-premise edge servers to lower analysis latency, preserve privacy, and reduce bandwidth requirements. Compared to the cloud, however, edge servers typically have less processing power and GPU memory, which limits the number of video streams they can manage and analyze. Existing memory management solutions, such as swapping models in and out of GPU memory, sharing a common model stem, or reducing model size through compression and quantization, incur high overheads and often provide limited benefits. In this paper, we propose model merging as an approach to memory management at the edge. The proposal is based on our observation that models at the edge share common layers, and that merging these common layers across models can yield significant memory savings. Our preliminary evaluation indicates that such an approach could reduce memory requirements by up to 75%. We conclude by discussing several challenges involved in realizing the model merging vision.
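
As a rough illustration of why merging common layers saves memory, the following sketch compares the footprint of loading each model separately with the footprint when layers that have the same signature are kept only once. It is not the paper's method: the layer signatures, the parameter counts, and the helper names (`memory_without_merging`, `memory_with_merging`) are hypothetical, and the sketch assumes that layers with identical signatures can share a single copy of their weights without any accuracy consideration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A layer is described by a hypothetical signature: (name, parameter count).
Layer = Tuple[str, int]


def memory_without_merging(models: Dict[str, List[Layer]]) -> int:
    """Total parameters when every model keeps its own copy of every layer."""
    return sum(params for layers in models.values() for _, params in layers)


def memory_with_merging(models: Dict[str, List[Layer]]) -> int:
    """Total parameters when identical layers are stored once across models.

    A signature that occurs k times inside a single model still needs k
    copies, so for each signature we keep the maximum count seen in any
    one model.
    """
    needed: Counter = Counter()
    for layers in models.values():
        for signature, count in Counter(layers).items():
            needed[signature] = max(needed[signature], count)
    return sum(params * count for (_, params), count in needed.items())


# Toy numbers chosen for illustration only; they are not from the paper.
models = {
    "detector_a": [("conv1", 100), ("conv2", 200), ("head_a", 50)],
    "detector_b": [("conv1", 100), ("conv2", 200), ("head_b", 50)],
    "detector_c": [("conv1", 100), ("conv2", 200), ("head_c", 50)],
}

separate = memory_without_merging(models)  # 1050
merged = memory_with_merging(models)       # 450
savings = 1 - merged / separate
print(f"separate={separate} merged={merged} savings={savings:.1%}")
```

In this toy setup the three models share two of their three layers, so merging removes a little over half of the footprint. The savings grow with the number of models and with the fraction of parameters that sit in shared layers.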
