Augmenting video surveillance footage with virtual agents for incremental event evaluation

The fields of segmentation, tracking, and behavior analysis demand challenging video resources to test complex scenarios, such as crowded environments or semantically rich scenes, in a scalable manner. However, existing public databases offer no way to scale the number of agents appearing in a scene, which would be useful for studying long-term occlusions and crowds. Moreover, creating such resources is expensive, and the results are often too tailored to specific needs. We propose an augmented reality framework that increases the complexity of image sequences in terms of occlusions and crowds, in a scalable and controllable manner. Existing datasets can be extended with augmented sequences containing virtual agents. Such sequences are automatically annotated, which facilitates the evaluation of segmentation, tracking, and behavior recognition. To make the desired contents easy to specify, we propose a natural language interface that converts input sentences into virtual agent behaviors. Experimental tests and validation in indoor, street, and soccer environments show the feasibility of the proposed approach in terms of robustness, scalability, and semantics.
