Determining the best suited semantic events for cognitive surveillance

State-of-the-art cognitive surveillance systems identify and describe complex events in selected domains, giving end-users tools to easily access the contents of massive amounts of video footage. Nevertheless, as events grow in semantic complexity and the types of indoor and outdoor scenarios diversify, it becomes difficult to assess which events best describe a scene and how to model them at the pixel level to fulfill natural language requests. We present an ontology-based methodology that guides the identification, step-by-step modeling, and generalization of the events most relevant to a specific domain. Our approach comprises three steps: (1) end-users provide textual evidence from surveilled video sequences; (2) transcriptions are analyzed top-down to build the knowledge bases for event description; and (3) the resulting models are used to generalize event detection to different image sequences from the surveillance domain. This framework produces user-oriented knowledge that improves on existing advanced interfaces for video indexing and retrieval by determining the best-suited events for video understanding according to end-users. We have conducted experiments with outdoor and indoor scenes showing thefts, chases, and vandalism, demonstrating the feasibility of the proposal and its ability to generalize.