Understanding dynamic scenes based on human sequence evaluation

This paper presents a Cognitive Vision System (CVS) that explains human behaviour in monitored scenes using natural-language texts. This cognitive analysis of human movements recorded in image sequences is referred to here as Human Sequence Evaluation (HSE), which defines a set of transformation modules involved in the automatic generation of semantic descriptions from pixel values. In essence, the trajectories of human agents are extracted and used to generate textual interpretations of their motion and to infer the conceptual relationships of each agent with respect to its environment. For this purpose, a human behaviour model based on Situation Graph Trees (SGTs) is used, which permits both bottom-up (hypothesis generation) and top-down (hypothesis refinement) analysis of dynamic scenes. The resulting system prototype interprets different kinds of behaviour and reports textual descriptions in multiple languages.
