A cognitive vision system for action recognition in office environments

The emerging cognitive vision paradigm is concerned with vision systems that gather, evaluate and integrate contextual knowledge for visual analysis. In reasoning about events and structures, cognitive vision systems should rely on multiple computations in order to perform robustly even in noisy domains. Action recognition in an unconstrained office environment therefore provides an excellent testbed for research on cognitive computer vision. In this contribution, we present a system that consists of several computational modules for object and action recognition. It applies attention mechanisms, visual learning, and contextual as well as probabilistic reasoning to fuse the individual results and verify their consistency. Database technologies are used for information storage, and an XML-based communication framework integrates all modules into a consistent architecture.
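To make the idea of modules exchanging results over an XML-based framework more concrete, the following is a minimal sketch, not the system's actual protocol. The abstract specifies neither a message schema nor a fusion rule, so the element names (`hypothesis`, `label`, `confidence`), the module names, and the simple sum-rule fusion are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def encode_hypothesis(module, label, confidence):
    """Serialize one module's result as an XML string (hypothetical schema)."""
    root = ET.Element("hypothesis", attrib={"module": module})
    ET.SubElement(root, "label").text = label
    ET.SubElement(root, "confidence").text = f"{confidence:.3f}"
    return ET.tostring(root, encoding="unicode")


def decode_hypothesis(xml_text):
    """Parse a message produced by encode_hypothesis."""
    root = ET.fromstring(xml_text)
    return (
        root.get("module"),
        root.findtext("label"),
        float(root.findtext("confidence")),
    )


def fuse(messages):
    """Sum-rule fusion (an assumption): add confidences per label, then normalize."""
    scores = {}
    for message in messages:
        _, label, confidence = decode_hypothesis(message)
        scores[label] = scores.get(label, 0.0) + confidence
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Hypothetical module outputs for one observed scene.
messages = [
    encode_hypothesis("object_recognition", "typing", 0.6),
    encode_hypothesis("action_recognition", "typing", 0.7),
    encode_hypothesis("action_recognition", "phoning", 0.3),
]
fused = fuse(messages)
best_label = max(fused, key=fused.get)
```

In this sketch, each module only needs to agree on the message format, which is the property an XML-based integration layer is meant to provide; the fusion step could be replaced by any probabilistic model without changing the modules themselves.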
