Paper Information - Knowledge Graph Driven Approach to Represent Video Streams for Spatiotemporal Event Pattern Matching in Complex Event Processing

Complex Event Processing (CEP) is an event processing paradigm for performing real-time analytics over streaming data and matching high-level event patterns. Presently, CEP is limited to processing structured data streams. Video streams have an unstructured data model, which makes it difficult for CEP systems to perform matching over them. This work introduces a graph-based structure for continuously evolving video streams, which enables a CEP system to query complex video event patterns. We propose the Video Event Knowledge Graph (VEKG), a graph-driven representation of video data. VEKG models video objects as nodes and their relationships and interactions as edges over time and space. It creates a semantic knowledge representation of video data, derived from high-level semantic concepts detected in the video by an ensemble of deep learning models. A CEP-based state optimization, the VEKG-Time Aggregated Graph (VEKG-TAG), is proposed over the VEKG representation for faster event detection. VEKG-TAG is a spatiotemporal graph aggregation method that provides a summarized view of the VEKG graph over a given time length. We defined a set of nine event pattern rules for two domains (Activity Recognition and Traffic Management), which act as queries and are applied over VEKG graphs to discover complex event patterns. To show the efficacy of our approach, we performed extensive experiments over 801 video clips across 10 datasets. The proposed VEKG approach was compared with other state-of-the-art methods and was able to detect complex event patterns over videos with F-Scores ranging from 0.44 to 0.90. In the given experiments, the optimized VEKG-TAG reduced VEKG nodes and edges by 99% and 93%, respectively, with 5.19X faster search time, achieving sub-second median latency of 4-20 milliseconds.
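The idea of summarizing per-frame graphs over a time window can be illustrated with a minimal Python sketch. This is not the authors' implementation: the frame layout, the object and relation labels (`car_1`, `left_of`, `overlaps`), and the helper names `time_aggregate` and `followed_by` are all hypothetical, chosen only to show how repeated nodes and edges might collapse into single entries that carry a time series, and how a simple sequence pattern could then be checked against that summary.

```python
from collections import defaultdict

# Hypothetical per-frame graphs: each frame lists detected objects (nodes)
# and spatial relations between them (edges). All labels are illustrative.
frames = [
    {"nodes": {"car_1", "person_1"}, "edges": {("car_1", "left_of", "person_1")}},
    {"nodes": {"car_1", "person_1"}, "edges": {("car_1", "left_of", "person_1")}},
    {"nodes": {"car_1", "person_1"}, "edges": {("car_1", "overlaps", "person_1")}},
]


def time_aggregate(frames):
    """Collapse a window of frame graphs into one summary in which every
    node and edge appears once, annotated with the frame indices where
    it was observed."""
    node_times = defaultdict(list)
    edge_times = defaultdict(list)
    for t, frame in enumerate(frames):
        for node in frame["nodes"]:
            node_times[node].append(t)
        for edge in frame["edges"]:
            edge_times[edge].append(t)
    return dict(node_times), dict(edge_times)


def followed_by(edge_times, first, second):
    """Return True if `first` is observed at some frame strictly before
    a frame at which `second` is observed."""
    if first not in edge_times or second not in edge_times:
        return False
    return min(edge_times[first]) < max(edge_times[second])


node_times, edge_times = time_aggregate(frames)

# Size of the unaggregated representation (one copy per frame).
raw_nodes = sum(len(f["nodes"]) for f in frames)
raw_edges = sum(len(f["edges"]) for f in frames)

print(f"nodes: {raw_nodes} -> {len(node_times)}")
print(f"edges: {raw_edges} -> {len(edge_times)}")
print(followed_by(
    edge_times,
    ("car_1", "left_of", "person_1"),
    ("car_1", "overlaps", "person_1"),
))
```

In this toy window the six per-frame node entries collapse to two aggregated nodes, and a pattern check needs to inspect only the aggregated edges rather than every frame, which is the kind of saving a time-aggregated summary is meant to provide.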
