Caesar: cross-camera complex activity recognition

Detecting activities in video from a single camera is an active research area in ML-based machine vision. In this paper, we examine the next research frontier: near real-time detection of complex activities spanning multiple (possibly wireless) cameras, a capability applicable to surveillance tasks. We argue that a system for such complex activity detection must employ a hybrid design, one in which rule-based activity detection complements neural-network-based detection. Moreover, to be practical, such a system must scale well to multiple cameras and have low end-to-end latency. Caesar, our edge-computing-based system for complex activity detection, provides an extensible vocabulary of activities that lets users specify complex actions in terms of spatial and temporal relationships between actors, objects, and activities. Caesar converts these specifications to graphs, efficiently monitors camera feeds, partitions processing between cameras and the edge cluster, retrieves minimal information from cameras, carefully schedules neural network invocation, and efficiently matches specification graphs to the underlying data in order to detect complex activities. Our evaluations show that Caesar can reduce wireless bandwidth, on-board camera memory, and detection latency by an order of magnitude while achieving good precision and recall for all complex activities on a public multi-camera dataset.
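
To make the idea of matching a specification graph against observed data concrete, the following is a minimal sketch and not Caesar's actual implementation. It assumes a hypothetical `Detection` record for atomic actions, a hypothetical three-step specification whose edges mean "must finish before", and a brute-force search in place of the efficient matching the abstract describes. The action labels, track IDs, and camera names are all invented for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Detection:
    """One atomic observation (hypothetical schema, not from the paper)."""
    actor: str    # track ID, assumed consistent across cameras
    action: str   # atomic action label
    camera: str   # camera that produced the observation
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical specification graph: nodes are atomic actions,
# and an edge (a, b) means "a must finish before b starts".
SPEC_NODES = ["carry_bag", "enter_car", "exit_car"]
SPEC_EDGES = [("carry_bag", "enter_car"), ("enter_car", "exit_car")]


def match(nodes, edges, detections):
    """Return every assignment of detections to nodes that satisfies the graph.

    Brute force over all candidate combinations; this is only meant to
    show the constraints, not to be efficient.
    """
    candidates = [[d for d in detections if d.action == n] for n in nodes]
    matches = []
    for combo in product(*candidates):
        assigned = dict(zip(nodes, combo))
        # All steps must be performed by the same actor.
        same_actor = len({d.actor for d in combo}) == 1
        # Every temporal edge must hold.
        ordered = all(assigned[a].end <= assigned[b].start for a, b in edges)
        if same_actor and ordered:
            matches.append(assigned)
    return matches


detections = [
    Detection("p1", "carry_bag", "cam1", 0.0, 4.0),
    Detection("p1", "enter_car", "cam2", 6.0, 7.0),
    Detection("p2", "enter_car", "cam2", 1.0, 2.0),
    Detection("p1", "exit_car", "cam3", 30.0, 31.0),
]
result = match(SPEC_NODES, SPEC_EDGES, detections)
for m in result:
    print([(n, m[n].actor, m[n].camera) for n in SPEC_NODES])
```

In this toy input only actor `p1` completes all three steps in order, across three different cameras, so a single match is reported. A real system would need to avoid enumerating every combination, which is where efficient graph matching and careful scheduling of neural network invocation become important.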
