Using Sparse Visual Data to Model Human Activities in Meetings

We have recently engaged in the challenging development of an agent that assists users in everyday office-related tasks. In particular, the agent needs to keep track of the state of its user so that it can anticipate the user’s needs and proactively address them. The state of the user may be readily available when the user interacts directly with the agent through a PC or PDA interface. However, when the user attends a meeting and interacts with other people, PC and PDA interfaces are not sufficient to give the agent a general view of the environment in which its user is interacting. In this paper, we introduce CAMEO, the Camera Assisted Meeting Event Observer, a physical awareness system designed for use by an agent-based electronic assistant. We then present a particular aspect of CAMEO that is the main contribution of the paper, namely how CAMEO addresses the problem of extracting and reasoning about high-level features from real-time, continuous observation of a meeting environment. Contextual information about meetings and the interactions that take place within them is used to define Dynamic Bayesian Network classifiers that effectively infer the state of the users as well as a higher-level state of the meeting. We present the state inference algorithm and show its results.
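
As a rough illustration of the kind of state inference described above, the following is a minimal sketch of forward filtering in the simplest Dynamic Bayesian Network, a two-state hidden Markov model. It is not CAMEO's actual classifier: the states ("sitting", "standing"), the coarse face-height observation ("low", "high"), and every probability below are assumptions chosen only to make the example concrete.

```python
# Hypothetical states, observations, and probabilities; none are from the paper.
STATES = ("sitting", "standing")

# P(state_t | state_{t-1}): a person tends to remain in the current state.
TRANSITION = {
    "sitting": {"sitting": 0.9, "standing": 0.1},
    "standing": {"sitting": 0.2, "standing": 0.8},
}

# P(observation | state): a coarse, noisy face-height feature from the camera.
EMISSION = {
    "sitting": {"low": 0.8, "high": 0.2},
    "standing": {"low": 0.3, "high": 0.7},
}

# Belief over the state before the first observation arrives.
PRIOR = {"sitting": 0.5, "standing": 0.5}


def forward_filter(observations):
    """Return the belief P(state_t | obs_1..t) after each observation."""
    belief = dict(PRIOR)
    history = []
    for obs in observations:
        # Predict: push the previous belief through the transition model.
        predicted = {
            s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES
        }
        # Update: weight by the observation likelihood, then normalize.
        unnormalized = {s: predicted[s] * EMISSION[s][obs] for s in STATES}
        total = sum(unnormalized.values())
        belief = {s: unnormalized[s] / total for s in STATES}
        history.append(belief)
    return history


if __name__ == "__main__":
    for step, belief in enumerate(forward_filter(["low", "low", "high", "high"]), 1):
        best = max(belief, key=belief.get)
        print(step, best, round(belief[best], 3))
```

Because the belief is updated one observation at a time, a filter of this shape suits continuous, real-time observation. A single contradictory observation shifts the belief without immediately flipping the inferred state, which is one reason temporal models are attractive for sparse, noisy visual features.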
