ReXCam: Resource-Efficient, Cross-Camera Video Analytics at Scale

Enterprises are increasingly deploying large camera networks for video analytics. Many target applications share a common problem template: searching for and tracking an object or activity of interest (e.g., a speeding vehicle, a break-in) through a large camera network in live video. Such cross-camera analytics is compute- and data-intensive, with cost growing with both the number of cameras and the duration of the search. To address this cost challenge, we present ReXCam, a new system for efficient cross-camera video analytics. ReXCam exploits spatial and temporal locality in the dynamics of real camera networks to guide its inference-time search for a query identity. In an offline profiling phase, ReXCam builds a cross-camera correlation model that encodes the locality observed in historical traffic patterns. At inference time, ReXCam applies this model to filter out frames that are not spatially and temporally correlated with the query identity's current position. In the occasional case of a missed detection, ReXCam performs a fast-replay search on recently filtered video frames, enabling graceful recovery. Together, these techniques allow ReXCam to reduce compute workload by 8.3x on an 8-camera dataset, and by 23x to 38x on a simulated 130-camera dataset. ReXCam has been implemented and deployed on a testbed of 5 AWS DeepLens cameras.
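
The filtering idea described above can be illustrated with a minimal sketch. It is not ReXCam's implementation: the function names (`build_model`, `should_search`, `split_frames`), the use of a per-pair transition fraction as the spatial correlation, the min/max travel-time window as the temporal correlation, and the `min_prob` threshold are all assumptions made for illustration.

```python
from collections import defaultdict

def build_model(transitions):
    """Offline profiling (hypothetical form).

    transitions: iterable of (src_cam, dst_cam, travel_seconds) taken from
    historical tracks. Returns (spatial, windows), where spatial[(src, dst)]
    is the fraction of traffic leaving src that next appears at dst, and
    windows[(src, dst)] is the (min, max) observed travel time.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    windows = {}
    for src, dst, dt in transitions:
        counts[(src, dst)] += 1
        totals[src] += 1
        lo, hi = windows.get((src, dst), (dt, dt))
        windows[(src, dst)] = (min(lo, dt), max(hi, dt))
    spatial = {pair: n / totals[pair[0]] for pair, n in counts.items()}
    return spatial, windows

def should_search(model, src, dst, elapsed, min_prob=0.05):
    """True if a frame on dst, `elapsed` seconds after the query was last
    seen on src, is both spatially and temporally correlated (assumed rule)."""
    spatial, windows = model
    if spatial.get((src, dst), 0.0) < min_prob:
        return False  # spatial filter: dst rarely follows src
    lo, hi = windows[(src, dst)]
    return lo <= elapsed <= hi  # temporal filter: outside observed window

def split_frames(model, last_cam, last_time, frames, min_prob=0.05):
    """frames: iterable of (cam, timestamp). Returns (to_search, skipped)."""
    to_search, skipped = [], []
    for cam, ts in frames:
        if should_search(model, last_cam, cam, ts - last_time, min_prob):
            to_search.append((cam, ts))
        else:
            skipped.append((cam, ts))
    return to_search, skipped
```

In this sketch the `skipped` list is kept rather than discarded, since it is the set of recently filtered frames that a replay pass could re-scan if the query identity is not found among the frames in `to_search`.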
