Evaluating Temporal Queries Over Video Feeds

Recent advances in computer vision and deep learning have made it possible to efficiently extract a schema from frames of streaming video. A stream of objects and their associated classes, along with unique object identifiers derived via object tracking, can thus be generated, identifying unique objects as they are captured across frames. In this paper we initiate a study of temporal queries involving objects and their co-occurrences in video feeds. For example, queries that identify video segments during which the same two red cars and the same two humans appear jointly for five minutes are of interest to many applications, ranging from law enforcement to security and safety. We take the first step and define such queries so that they incorporate certain physical aspects of video capture, such as object occlusion. We present an architecture consisting of three layers: object detection/tracking, intermediate data generation, and query evaluation. We propose two techniques, MFS and SSG, to organize all detected objects in the intermediate data generation layer; given the queries, they effectively minimize the number of objects and frames that have to be considered during query evaluation. We also introduce an algorithm called State Traversal (ST) that processes incoming frames against the SSG and efficiently prunes objects and frames unrelated to query evaluation, while maintaining all states required for succinct query evaluation. We present the results of a thorough experimental evaluation on both real and synthetic data, establishing the trade-offs between MFS and SSG. We stress various parameters of interest in our evaluation and demonstrate that the proposed query evaluation methodology, coupled with the proposed algorithms, evaluates temporal queries over video feeds efficiently, achieving orders-of-magnitude performance benefits.
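To make the kind of query concrete, the following is a minimal sketch of a naive baseline, not the MFS, SSG, or ST techniques proposed in the paper. It assumes each frame has already been reduced to a set of (object id, class label) pairs by a detector and tracker, and it ignores occlusion entirely. The names `Frame`, `cooccurrence_windows`, and `example_frames` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# One frame: the (tracker-assigned object id, class label) pairs visible in it.
Frame = Set[Tuple[int, str]]


def cooccurrence_windows(
    frames: List[Frame], required: Dict[str, int], window: int
) -> List[int]:
    """Return the start index of every run of `window` consecutive frames
    in which the same objects satisfy the per-class counts in `required`."""
    hits = []
    for start in range(len(frames) - window + 1):
        # Objects present in every frame of the window (naive intersection).
        common = set(frames[start])
        for frame in frames[start + 1:start + window]:
            common &= frame
            if not common:
                break
        counts = Counter(label for _, label in common)
        if all(counts[label] >= n for label, n in required.items()):
            hits.append(start)
    return hits


# Toy feed: cars 1 and 2 and humans 7 and 8 are jointly visible in frames 1-3.
example_frames: List[Frame] = [
    {(1, "car"), (2, "car"), (7, "human")},
    {(1, "car"), (2, "car"), (7, "human"), (8, "human")},
    {(1, "car"), (2, "car"), (7, "human"), (8, "human")},
    {(1, "car"), (2, "car"), (7, "human"), (8, "human")},
    {(1, "car"), (3, "car"), (7, "human"), (8, "human")},
]

print(cooccurrence_windows(example_frames, {"car": 2, "human": 2}, window=3))
```

This baseline recomputes an intersection for every window, so its cost grows with both the window length and the number of objects per frame. That is the kind of redundant work that organizing detected objects ahead of query evaluation is meant to avoid.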
