VidCEP: Complex Event Processing Framework to Detect Spatiotemporal Patterns in Video Streams

Video data is highly expressive and has traditionally been difficult for machines to interpret. Querying event patterns from video streams is challenging due to their unstructured representation. Middleware systems such as Complex Event Processing (CEP) engines mine patterns from data streams and notify users in a timely fashion. Current CEP systems have inherent limitations in querying video streams due to their unstructured data model and the lack of an expressive query language. In this work, we focus on a CEP framework where users can define high-level expressive queries over videos to detect a range of spatiotemporal event patterns. In this context, we propose: i) VidCEP, an in-memory, on-the-fly, near-real-time complex event matching framework for video streams; the system uses a graph-based event representation for video streams, which enables the detection of high-level semantic concepts from video using cascades of Deep Neural Network models; ii) a Video Event Query Language (VEQL) to express high-level user queries for video streams in CEP; iii) a complex event matcher to detect spatiotemporal video event patterns by matching expressive user queries over video data. The proposed approach detects spatiotemporal video event patterns with an F-score ranging from 0.66 to 0.89. VidCEP maintains near-real-time performance with an average throughput of 70 frames per second over 5 parallel videos with sub-second matching latency.
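The graph-based event representation described above can be illustrated with a minimal sketch: each object detected in a frame (e.g. by a DNN detector) becomes a graph node, edges encode qualitative spatial relations between object pairs, and a query pattern is matched against the resulting edge set. All names and the relation logic here are illustrative assumptions, not the paper's actual VEKG or VEQL implementation.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class Detection:
    """A single DNN detection: class label plus bounding box (illustrative)."""
    label: str
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

def spatial_relation(a: Detection, b: Detection) -> str:
    """Coarse 2-D qualitative relation, decided by bounding-box centres."""
    ax = (a.box[0] + a.box[2]) / 2
    bx = (b.box[0] + b.box[2]) / 2
    return "left_of" if ax < bx else "right_of"

def frame_graph(dets: List[Detection]) -> Set[Tuple[str, str, str]]:
    """Edge set (subject, relation, object) over every ordered object pair."""
    return {(a.label, spatial_relation(a, b), b.label)
            for a in dets for b in dets if a is not b}

def match(graph: Set[Tuple[str, str, str]],
          pattern: Tuple[str, str, str]) -> bool:
    """A VEQL-style query is reduced here to a single required edge."""
    return pattern in graph

# One frame with two detections: a person to the left of a car.
frame = [Detection("person", (10, 20, 50, 120)),
         Detection("car", (200, 40, 400, 160))]
g = frame_graph(frame)
print(match(g, ("person", "left_of", "car")))  # True
```

A temporal pattern (e.g. "person left of car, then person right of car") would extend this by matching such edge sets across a window of consecutive frame graphs, which is where the CEP matcher's windowing and ordering semantics come in.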
