VEKG: Video Event Knowledge Graph to Represent Video Streams for Complex Event Pattern Matching

Complex Event Processing (CEP) is a paradigm for detecting event patterns over streaming data in a timely manner. Present CEP systems have inherent limitations in detecting event patterns over video streams due to the complexity of video data and the lack of a structured data model. Modelling complex events in unstructured data such as video requires detecting not only objects but also the spatiotemporal relationships among them. This work introduces a novel video representation technique in which an input video stream is converted into a stream of graphs. We propose the Video Event Knowledge Graph (VEKG), a knowledge-graph-driven representation of video data. VEKG models video objects as nodes and their relationships and interactions as edges over time and space. It creates a semantic knowledge representation of video data derived from the detection of high-level semantic concepts using an ensemble of deep learning models. To optimize run-time system performance, we introduce a graph aggregation method, VEKG-TAG, which provides an aggregated view of VEKG over a given time length. We define a set of operators based on event rules that can be expressed as queries and applied over VEKG graphs to discover complex video patterns. The system achieves an F-Score between 0.75 and 0.86 for different patterns when queried over VEKG. In the given experiments, pattern search over VEKG-TAG was 2.3X faster than the baseline.
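A minimal sketch of the idea described above, assuming per-frame object detections are already available from an off-the-shelf detector: each frame becomes a small graph (objects as nodes, pairwise spatial relations as edges), and a time-aggregated view merges the per-frame graphs over a window. The node identifiers, the `near` relation, and the aggregation routine are illustrative assumptions, not the paper's exact definitions.

```python
# Illustrative sketch only: builds per-frame VEKG-like graphs and a
# time-aggregated view; relation names and aggregation are assumptions.
from dataclasses import dataclass
from typing import List
import networkx as nx


@dataclass
class Detection:
    obj_id: str   # tracked object identity, e.g. "person_1" (hypothetical id scheme)
    label: str    # semantic class from the detector, e.g. "person"
    bbox: tuple   # (x, y, w, h) in pixels


def frame_to_vekg(frame_no: int, detections: List[Detection]) -> nx.MultiDiGraph:
    """Build one VEKG snapshot: objects as nodes, pairwise spatial relations as edges."""
    g = nx.MultiDiGraph(frame=frame_no)
    for d in detections:
        g.add_node(d.obj_id, label=d.label, bbox=d.bbox)
    for a in detections:
        for b in detections:
            if a.obj_id != b.obj_id and _near(a.bbox, b.bbox):
                g.add_edge(a.obj_id, b.obj_id, relation="near", frame=frame_no)
    return g


def _near(b1, b2, thresh=50.0):
    """Toy spatial predicate: bounding-box centroids closer than `thresh` pixels."""
    c1 = (b1[0] + b1[2] / 2, b1[1] + b1[3] / 2)
    c2 = (b2[0] + b2[2] / 2, b2[1] + b2[3] / 2)
    return ((c1[0] - c2[0]) ** 2 + (c1[1] - c2[1]) ** 2) ** 0.5 < thresh


def aggregate(snapshots: List[nx.MultiDiGraph]) -> nx.DiGraph:
    """VEKG-TAG-style view: one graph per window, edges keep the frames in which they held."""
    tag = nx.DiGraph()
    for g in snapshots:
        tag.add_nodes_from(g.nodes(data=True))
        for u, v, data in g.edges(data=True):
            if tag.has_edge(u, v):
                tag[u][v]["frames"].append(data["frame"])
            else:
                tag.add_edge(u, v, relation=data["relation"], frames=[data["frame"]])
    return tag
```

A pattern query would then run over the aggregated graph once per window instead of over every per-frame snapshot, which is where the reported speed-up over the baseline comes from.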
