3D user-perspective, voxel-based estimation of visual focus of attention in dynamic meeting scenarios

In this paper, we present a new framework for the online estimation of people's visual focus of attention from their head poses in dynamic meeting scenarios. We describe a voxel-based approach that reconstructs the scene composition from an observer's perspective, which lets us integrate occlusion handling and visibility verification. The observer's perspective is simulated with live head pose tracking across four far-field views captured from the room's upper corners. We integrate motion and speech activity as additional scene observations in a Bayesian surprise framework to model prior attractors of attention within the context of the situation. Evaluations on a dedicated dataset of 10 meeting videos show that this approach predicts a meeting participant's focus of attention correctly in up to 72.2% of all frames.
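The abstract does not specify the surprise model itself. Bayesian surprise is commonly defined as the KL divergence between the posterior and the prior belief after a new observation, so a cue that suddenly deviates from what was expected (for example, a participant who starts moving) yields high surprise. The following is a minimal sketch under stated assumptions, not the paper's implementation: each potential attention target carries a Gaussian belief over one scalar cue (e.g. motion energy) with known observation noise, chosen here only because its KL divergence has a simple closed form. The names `GaussianBelief`, `surprise_step`, `attention_prior`, the `drift_var` forgetting term and the normalisation of surprise into a prior over targets are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianBelief:
    """Hypothetical belief over the mean level of one scalar cue of a target."""
    mean: float
    var: float


def kl_gaussian(p: GaussianBelief, q: GaussianBelief) -> float:
    """KL(p || q) between two univariate Gaussians, in nats."""
    return (0.5 * math.log(q.var / p.var)
            + (p.var + (p.mean - q.mean) ** 2) / (2.0 * q.var)
            - 0.5)


def update(prior: GaussianBelief, x: float, obs_var: float) -> GaussianBelief:
    """Conjugate Normal update for one observation x with known noise variance."""
    post_var = 1.0 / (1.0 / prior.var + 1.0 / obs_var)
    post_mean = post_var * (prior.mean / prior.var + x / obs_var)
    return GaussianBelief(post_mean, post_var)


def surprise_step(prior: GaussianBelief, x: float, obs_var: float,
                  drift_var: float) -> tuple[float, GaussianBelief]:
    """Return (surprise, next prior), where surprise = KL(posterior || prior).

    drift_var (an assumption) inflates the posterior variance before it becomes
    the next prior, so the belief keeps adapting instead of freezing.
    """
    post = update(prior, x, obs_var)
    surprise = kl_gaussian(post, prior)
    return surprise, GaussianBelief(post.mean, post.var + drift_var)


def attention_prior(surprises: dict[str, float], eps: float = 1e-3) -> dict[str, float]:
    """Hypothetical mapping: normalise per-target surprise into a prior over targets."""
    weights = {name: s + eps for name, s in surprises.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    beliefs = {"person_A": GaussianBelief(0.0, 1.0), "person_B": GaussianBelief(0.0, 1.0)}
    # Made-up motion-energy observations: person_B starts moving in the last frame.
    frames = [{"person_A": 0.1, "person_B": 0.1}] * 5 + [{"person_A": 0.1, "person_B": 3.0}]
    for t, obs in enumerate(frames):
        surprises = {}
        for name, x in obs.items():
            surprises[name], beliefs[name] = surprise_step(
                beliefs[name], x, obs_var=0.5, drift_var=0.05)
        prior = attention_prior(surprises)
        print(t, {k: round(v, 3) for k, v in prior.items()})
```

In the final frame of the demo, the sudden jump in person_B's cue produces a much larger KL divergence than person_A's steady cue, so most of the prior mass shifts to person_B. In a full system, such a context prior would be combined with a head-pose likelihood and the voxel-based visibility check; that combination is not sketched here.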
