Representation spaces in a visual-based human action recognition system

Visual tracking consists of locating or determining the configuration of a known object at each frame of a video sequence. Usually, the description of the whole scene involves multiple targets, their movements and interactions, and the particular features of the scenario. This paper presents a visual tracking system framework oriented to providing a "near natural language" description of the actions of the targets involved in the scene. Our prototype focuses on the detection, tracking and feature extraction of a dynamic number of targets in a scenario over time. The design of any visual tracking system usually requires the injection of human knowledge at each level of description transformation in order to produce a linguistic scene summary from raw videos. The main aim of this work was to make explicit the knowledge injection needed to link the low-level representations (associated with signals) to the high-level semantics (related to knowledge) in the visual tracking problem. As a result, the emerging semantics necessary at the two transformation levels are analysed and presented. We have concentrated on the representation spaces for the memetic algorithm particle filter applied to multiple object tracking in annotated scenarios, oriented to video-based surveillance applications. Finally, some example applications on different surveillance scenarios are presented and discussed.
