Exploiting Multimodal Interaction Techniques for Video-Surveillance

In this paper we present an example of a video surveillance application that exploits Multimodal Interactive (MI) technologies. The main objective of the so-called VID-Hum prototype was to develop a cognitive artificial system for both the detection and description of a particular set of human behaviours arising from real-world events. The main procedure of the prototype entails: (i) adaptation, since the system adapts itself to the most common behaviours (qualitative data) inferred from tracking (quantitative data), and is thus able to recognize abnormal behaviours; (ii) feedback, since an advanced interface based on Natural Language understanding allows end-users to communicate with the prototype by means of conceptual sentences; and (iii) multimodality, since a virtual avatar has been designed to describe what is happening in the scene, based on the textual interpretations generated by the prototype. Thus, the MI methodology has provided an adequate framework for all these cooperating processes.
