Bridging the Edge-Cloud Barrier for Real-time Advanced Vision Analytics

Advanced vision analytics plays a key role in a plethora of real-world applications. Unfortunately, many of these applications fail to leverage the abundant compute resource in cloud services, because they require high computing resources and high-quality video input, but the network connections between visual sensors (cameras) and the cloud servers do not always provide sufficient and stable bandwidth to stream high-fidelity video data in real time. This paper presents CloudSeg, an edge-to-cloud framework for advanced vision analytics that co-designs the cloud-side inference with real-time video streaming, to achieve both low latency and high inference accuracy. The core idea is to send the video stream in low resolution, but recover the high-resolution frames from the low-resolution stream via a super-resolution procedure tailored for the actual analytics tasks. In essence, CloudSeg trades additional cloud-side computation (super-resolution) for significantly reduced network bandwidth. Our initial evaluation shows that compared to previous work, CloudSeg can reduce bandwidth consumption by ∼6.8× with negligible drop in accuracy.

[1]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Kyung-Ah Sohn,et al.  Fast, Accurate, and, Lightweight Super-Resolution with Cascading Residual Network , 2018, ECCV.

[3]  Lingjia Tang,et al.  The Architectural Implications of Autonomous Driving: Constraints and Acceleration , 2018, ASPLOS.

[4]  Luc Van Gool,et al.  Is image super-resolution helpful for other vision tasks? , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[5]  Dahua Lin,et al.  Low-Latency Video Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Roberto Cipolla,et al.  MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving , 2016, 2018 IEEE Intelligent Vehicles Symposium (IV).

[7]  Franz Franchetti,et al.  Fast and accurate object detection in high resolution 4K and 8K video using GPUs , 2018, 2018 IEEE High Performance extreme Computing Conference (HPEC).

[8]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Aakanksha Chowdhery,et al.  Reinventing Video Streaming for Distributed Vision Analytics , 2018, HotCloud.

[10]  Matei Zaharia,et al.  NoScope: Optimizing Deep CNN-Based Queries over Video Streams at Scale , 2017, Proc. VLDB Endow..

[11]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[12]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[13]  Julien Mairal,et al.  BlitzNet: A Real-Time Deep Network for Scene Understanding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Jinwoo Shin,et al.  Neural Adaptive Content-aware Internet Video Delivery , 2018, OSDI.

[15]  Edward A. Lee,et al.  AWStream: adaptive wide-area streaming analytics , 2018, SIGCOMM.

[16]  Paramvir Bahl,et al.  Live Video Analytics at Scale with Approximation and Delay-Tolerance , 2017, NSDI.

[17]  Yichen Wei,et al.  Deep Feature Flow for Video Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Thomas S. Huang,et al.  Wide-activated Deep Residual Networks based Restoration for BPG-compressed Images , 2018, CVPR Workshops.

[19]  Paramvir Bahl,et al.  Focus: Querying Large Video Datasets with Low Latency and Low Cost , 2018, OSDI.

[20]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[21]  Feng Liu,et al.  Context-Aware Synthesis for Video Frame Interpolation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Trevor Darrell,et al.  Clockwork Convnets for Video Semantic Segmentation , 2016, ECCV Workshops.

[23]  Rick Salay,et al.  Towards a Framework to Manage Perceptual Uncertainty for Safe Automated Driving , 2018, SAFECOMP Workshops.

[24]  Paramvir Bahl,et al.  GLIMPSE: Continuous, Real-Time Object Recognition on Mobile Devices , 2016, GetMobile Mob. Comput. Commun..

[25]  Hyeontaek Lim,et al.  Scaling Video Analytics on Constrained Edge Nodes , 2019, MLSys.

[26]  Kaiming He,et al.  Panoptic Feature Pyramid Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Markus H. Gross,et al.  PhaseNet for Video Frame Interpolation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Larry S. Davis,et al.  SNIPER: Efficient Multi-Scale Training , 2018, NeurIPS.

[29]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.